Remove Vocals from Songs Perfectly with Artificial Intelligence (AI)

I started a small recording studio back in 2008. At the time, aside from working on music and audio recording, I also offered audio editing services.

One job I often got back then was removing vocals from music. My clients were people who wanted to turn their favorite songs into karaoke versions so that they could sing over them.

Back then, to remove vocals from music, I used either Audacity’s Vocal Reduction and Isolation effect or a phase cancellation technique. The end result (listen below) was often not that great. This was because those vocal remover effects depended on frequency filtering and on the vocals being center-panned (most tracks have the vocals mixed to the center).

These days, with the existence of AI technologies, it’s far more efficient to use an AI-based vocal remover like PhonicMind, which uses an AI audio engine to separate vocals and instruments from any song.

In this post, I’ll show you how to remove vocals from a song the dated, conventional way, with audio effects. Then we’ll compare the results with those from PhonicMind, an AI-based vocal remover.

Let’s get started!

Remove Vocals using Audio Effects (In Adobe Audition)

Many music producers and karaoke enthusiasts still use the old method I’m about to show you to remove or isolate vocals from songs. I believe that’s simply because they aren’t aware of the AI-based tools we have today.

Anyway, here’s how it works.

Using the Center-Channel Extractor (Vocal Remove) Effect

Most audio editors, such as Adobe Audition and Audacity, have an effect to remove vocals or to isolate them as acapellas. In Adobe Audition, this is called the Center-Channel Extractor effect, and in Audacity, it’s labeled the Vocal Reduction and Isolation effect.

Vocal Reduction and Isolation (actually a Center-Channel Extractor) effect in Audacity.

Generally, the effect works by removing center-panned audio, because that’s where vocals are usually mixed. Then, using the frequency controls, you can restrict the effect to a frequency range that matches the vocals as closely as possible.
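
To make the idea concrete, here’s a minimal Python sketch of the simplest form of center removal, the phase cancellation trick. It is not what Audition or Audacity actually run internally; it only shows why anything identical in both channels disappears when you subtract one channel from the other. The function name `remove_center` and the sine-wave "vocal" and "guitar" are made up for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

def remove_center(left, right):
    """Cancel center-panned audio by subtracting one channel from the other."""
    # Anything identical in both channels (the center) cancels out.
    # Only what differs between left and right (the sides) survives.
    return 0.5 * (left - right)

# Synthetic one-second signals at 8 kHz (stand-ins, not a real song).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
vocal = np.sin(2 * np.pi * 440 * t)    # center-panned "vocal"
guitar = np.sin(2 * np.pi * 330 * t)   # "guitar" panned hard left

left = vocal + guitar    # the vocal appears equally in both channels
right = vocal.copy()

karaoke = remove_center(left, right)   # the vocal is gone, the guitar remains
```

Note that the output of this basic version is mono, and anything else mixed to the center disappears along with the vocal.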

Center-channel extractor in Adobe Audition

In the example, I used Adobe Audition’s Center-Channel Extractor and chose the ‘Vocal Remove’ preset.

Notice that the preset attenuates the center channel by 40dB (a center channel level of -40dB). The frequency range controls which frequencies in the song are affected. By default, it runs from 120Hz to 20,000Hz, which means low-frequency content below 120Hz, like kick drums and bass, is not attenuated.
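
For the curious, the two settings above can be approximated in a few lines of Python. This is a simplified mid/side sketch under my own assumptions, not Adobe’s algorithm: it splits the song into a center (mid) and a side signal, turns the center down by 40dB (a factor of 0.01 in amplitude) only between 120Hz and 20,000Hz, and puts the channels back together. The function name `attenuate_center` and the test tones are hypothetical, and NumPy is assumed.

```python
import numpy as np

def attenuate_center(stereo, sample_rate, center_db=-40.0,
                     low_hz=120.0, high_hz=20000.0):
    """Turn down the center of a stereo signal inside a frequency range.

    stereo: float array of shape (n_samples, 2). Returns the same shape.
    """
    left, right = stereo[:, 0], stereo[:, 1]
    mid = 0.5 * (left + right)     # what both channels share (the center)
    side = 0.5 * (left - right)    # what differs between the channels

    gain = 10.0 ** (center_db / 20.0)   # -40 dB becomes a factor of 0.01

    # Apply the gain only to the chosen frequency range of the center.
    spectrum = np.fft.rfft(mid)
    freqs = np.fft.rfftfreq(len(mid), d=1.0 / sample_rate)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[in_band] *= gain
    mid_out = np.fft.irfft(spectrum, n=len(mid))

    # Rebuild left and right from the processed center and untouched sides.
    return np.column_stack([mid_out + side, mid_out - side])

# Synthetic example: a 440 Hz "vocal" and a 60 Hz "kick", both dead center.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
vocal = np.sin(2 * np.pi * 440 * t)
kick = np.sin(2 * np.pi * 60 * t)
center = vocal + kick
stereo = np.column_stack([center, center])

karaoke = attenuate_center(stereo, sample_rate)
```

In this toy example the 440Hz tone drops to 1% of its level while the 60Hz tone passes through untouched, which is the behavior the 120Hz lower limit is there for.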

You can preview as you adjust the parameters. And once you like what you hear, just click on Apply and the effect will process your song.

To make an acapella, you simply do the opposite: instead of attenuating the center channel, you attenuate the side channels. This leaves you with the vocals in the center of the track.

In Adobe Audition, launch the Center-Channel Extractor and choose the ‘Acapella’ preset. Notice that this preset attenuates the Side Channel Level instead of the Center Channel Level. This removes the side channels and keeps the vocals sitting in the center.
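
Here’s the same idea in reverse as a rough Python sketch: keep the center and turn the sides down. Again, this is a plain mid/side approximation of my own, not the preset’s real processing, and the -40dB side level is an assumed value rather than the preset’s actual setting. `isolate_center` and the sine-wave signals are hypothetical, and NumPy is assumed.

```python
import numpy as np

def isolate_center(stereo, side_db=-40.0):
    """Keep the center of a stereo signal and turn down the sides.

    stereo: float array of shape (n_samples, 2). Returns the same shape.
    """
    left, right = stereo[:, 0], stereo[:, 1]
    mid = 0.5 * (left + right)     # shared content, where the vocal sits
    side = 0.5 * (left - right)    # content that differs between channels

    side_gain = 10.0 ** (side_db / 20.0)   # assumed -40 dB, a factor of 0.01
    side = side * side_gain

    return np.column_stack([mid + side, mid - side])

# Synthetic example: a center "vocal" plus a "guitar" panned hard left.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
vocal = np.sin(2 * np.pi * 440 * t)
guitar = np.sin(2 * np.pi * 330 * t)
stereo = np.column_stack([vocal + guitar, vocal])

acapella = isolate_center(stereo)
```

In this toy example the guitar doesn’t vanish. A hard-panned sound still contributes half of its level to the center, so it ends up at roughly half volume in both output channels, right alongside the vocal.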

Watch the video below for a walkthrough of using Adobe Audition to remove vocals and create acapellas:

But I still hear some vocals!

And you’re right.

Unfortunately, this is a limitation of vocal remover effects that rely on frequency selection and center-channel targeting. The elements of a piece of music often overlap in frequency, and vocals are rarely the only thing mixed to the center.

So no matter how much EQ-ing and filtering you do, you’ll never get a clean output.

Removing Vocals from a Song with Artificial Intelligence (AI)

These days, with the democratization of AI and machine learning technologies, we have access to AI vocal removers and audio stem makers. This is where an AI vocal remover like PhonicMind comes in, giving you results that conventional audio effects can’t achieve.

PhonicMind is, in fact, more than just a vocal remover; it’s also an audio stems maker. The audio engine separates the song you upload into 4 audio stems: vocals, drums, bass, and other instruments.

Without going too deep into the details, the AI audio engine works by first listening to and understanding music. Using deep learning, it learns by listening to music every day at a speed of 20 minutes of music per second. This gave birth to an AI-based audio source separation technology that understands music and can therefore achieve stem separation at a never-‘heard’-before quality.

Amazingly, separating vocals, drums, and other instruments from a song takes less than a minute. Let’s look at using PhonicMind to remove vocals and create audio stems.

How to Remove Vocals with AI using PhonicMind

Head over to PhonicMind.com and sign up for a free account.

Click on ‘Upload’ and drop in a song in a high-quality audio format. For the best results, use a lossless audio format such as .WAV, .AIFF or .FLAC.

The AI audio engine will take less than a minute to process the song. Once done, you’ll be redirected to a page with a multitrack audio player to preview the audio stems.

If you’re happy with the results, just click to perform the full conversion. A conversion costs $1.99 per song when you buy the 10-song bundle, and as little as $1.49 per song with the 20-song bundle.

Once you’ve performed a song conversion, you can download it as many times as you like. PhonicMind provides you with several download options, including downloading a karaoke version, vocals only, or all the stems.

And that’s it! Can you believe it’s that quick and easy?

The sound quality you get with an AI-based vocal remover like PhonicMind is better by leaps and bounds. Watch a walkthrough video of using PhonicMind and listen to the sound quality.

Is this legal?

This is an important point that I want to mention in this post.

PhonicMind is simply an audio processor that allows you to perform audio source separation. Just because you can remove vocals from any song, using an AI-stems maker like PhonicMind, doesn’t make it legally right to do so.

Generally, you shouldn’t have anything to worry about if the tracks are for personal use. However, if you’re planning on republishing the acapellas or samples from a song you’ve processed, then you must get a mechanical license from the original song owner.

Which is the best AI vocal remover?

Having used many of the solutions available, such as Spleeter, LALAL.AI, and PhonicMind, I’ll be upfront: in my experience, PhonicMind beats every other AI vocal remover on the market.

Here are my thoughts on each of them:

1 – PhonicMind

As I mentioned earlier, PhonicMind isn’t just a vocal remover; it’s a stems maker (more on stems below) that can separate a song into 4 audio stems. This makes the tool much more useful than other solutions on the market.

Why do I say PhonicMind has the best audio engine?

If you watch the video above, you’ll see me use PhonicMind to separate a song into 4 audio stems before importing them into a DAW and playing them. What’s crazy is that the 4 audio stems, played back together as a multitrack, sound exactly like the original song!

It’s not the same with the other audio engines like Spleeter. You’ll hear that Spleeter silences musical elements that it doesn’t recognize during the audio separation process.
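
If you’d rather measure this than trust your ears, one rough approach is a null test: add the stems back together, subtract the sum from the original, and see how loud the leftover is. The Python sketch below is my own illustration, not something PhonicMind or Spleeter provides. `residual_db` is a hypothetical helper, the sine waves stand in for real stems (loading actual audio files is left out), and NumPy is assumed.

```python
import numpy as np

def rms(signal):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(signal)))

def residual_db(original, stems):
    """Level of (original minus the summed stems) relative to the original, in dB.

    Very negative values mean the stems add back up to the original.
    Values near 0 dB mean a lot of the music went missing.
    """
    remix = np.sum(stems, axis=0)
    leftover = original - remix
    eps = 1e-12   # avoids log10(0) when the stems cancel perfectly
    return 20.0 * np.log10((rms(leftover) + eps) / (rms(original) + eps))

# Synthetic stand-ins for four stems, one second at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
vocals = np.sin(2 * np.pi * 440 * t)
drums = np.sin(2 * np.pi * 60 * t)
bass = np.sin(2 * np.pi * 110 * t)
other = np.sin(2 * np.pi * 880 * t)
original = vocals + drums + bass + other

complete = residual_db(original, [vocals, drums, bass, other])
missing_other = residual_db(original, [vocals, drums, bass])
```

With real stems you would load the exported files instead of generating tones, and make sure all of them share the same sample rate and length before comparing.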

What are STEM Files?

PhonicMind also outputs STEM files, a unique format developed by Native Instruments that contains four individual stems of a track in one file. Using a .stem.mp4 file extension, the master version (full track) can be played in stereo with an audio player like iTunes as an mp4 file.

STEM files are usually used in compatible DJ software like Traktor Pro or on DJ controllers like the Traktor Kontrol S8, enabling DJs to mix individual audio stems on the fly.

Watch the video below to learn about STEM Files:

2 – Spleeter

Spleeter is not actually an app. Rather, it’s an audio source separation library that has been released as open-source code. To use it, you need some programming knowledge of Python and TensorFlow.
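
To give you a feel for what that involves, here’s a minimal sketch of a 4-stem separation with Spleeter’s Python API. Treat it as a starting point rather than a recipe: it assumes Spleeter is installed and that your version exposes `Separator` and `separate_to_file` the way the project’s documentation describes, and details can differ between releases. The helper names and the file paths are hypothetical, and the output layout is my understanding of Spleeter’s default naming.

```python
from pathlib import Path

# Stem names produced by the 4-stem model.
STEM_NAMES = ("vocals", "drums", "bass", "other")

def expected_stem_paths(song_path, output_dir):
    """Where the stems should land, assuming Spleeter's default layout:
    <output_dir>/<song name>/<stem>.wav
    """
    song_name = Path(song_path).stem
    return [Path(output_dir) / song_name / f"{name}.wav" for name in STEM_NAMES]

def separate_four_stems(song_path, output_dir):
    """Split a song into vocals, drums, bass and other using Spleeter."""
    # Imported here so the rest of this sketch runs without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator("spleeter:4stems")   # pre-trained 4-stem model
    separator.separate_to_file(str(song_path), str(output_dir))
    return expected_stem_paths(song_path, output_dir)

# Hypothetical file names, for illustration only.
stem_paths = expected_stem_paths("songs/demo.mp3", "stems")
```

Calling `separate_four_stems("songs/demo.mp3", "stems")` would run the actual separation. Expect the first run to take a while, since the pre-trained model has to be downloaded.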

Unless you know a bit of programming, I wouldn’t bother using Spleeter. When I tried to get it to work, I spent nearly half a day figuring things out, only to get audio stems that weren’t that great either.

To me, it simply isn’t worth the hassle.

Sure, there are many hobbyist projects built on top of Spleeter’s audio engine that you can conveniently use. However, with the lack of active development on its AI engine, the quality you get out of it is lacking compared to an AI audio engine under constant development, like PhonicMind.

3 – LALAL.AI

LALAL.AI is another AI-powered vocal remover app with a unique name. While it works alright for removing vocals, it’s not a stems maker, meaning you can only separate a song into two tracks: vocals and instrumentals.

This is a downside, given that LALAL.AI only functions as a vocal remover. You’re out of luck if you’re hoping to do resampling or some creative production work.

The company also claims to have the world’s best audio splitting engine, powered by the world’s #1 AI-powered technology. However, #1 by whose standards?

When doing research for this post, I came across many forum posts comparing LALAL.AI with other AI vocal removers. What ticks me off is that a lot of these posts were made by LALAL.AI’s own marketing team. I would rather see the team spend more time improving their audio engine than trying to influence people who are searching for reviews.

Is it worth using a paid vocal remover?

To me, using a paid vocal remover (I use PhonicMind) is worth every cent. Here’s why:

Saves you time.

How much do you value your time? Our time is limited, and therefore time is money.

Successful people spend money to buy back time – by outsourcing work, hiring people and using tools that get work done for them. They can then focus on working on the bigger things in their work and life.

Sure, you can spend the next 2-3 hours trying to remove vocals from a song using conventional audio effects. But are those 2-3 hours of your time worth only $1.99?

It is smarter to spend your limited time on other more meaningful work.

Unprecedented audio quality.

There’s no way for conventional audio effects and plugins to give you the kind of audio quality you get with AI vocal removers.

Whether you’re a karaoke enthusiast, music producer, or researcher – you probably want the best audio quality, assuming you’re working on anything meaningful.

Remember, it’s expensive to be cheap.

Conclusion

I’m truly excited about how AI technologies are paving the way for music production. With AI-based vocal removers and stem makers now a reality, we can only imagine what the future holds as more applications and solutions are introduced.

Have you tried removing vocals or creating acapellas with an AI vocal remover? Has the growth of new technologies disrupted your work, the way it did the little karaoke song conversion service I once ran?

Share your thoughts and experience in the comment section below!
