Vocaloid Wiki


A more human feel, using Layering

Voice Layering

Popular dance-music artists and rappers now use voice layering frequently. If you listen closely, in some new songs the singer is actually singing two or three times, and all the voices are layered together so that you can barely notice it if you are not paying attention. Nonetheless, it gives depth and richness to the voice, allowing it to stand out better in the mix.

I'm writing this article to answer two questions I have been asked since I published a track called "Hard as Diamond", so I will use that track as an example. The questions were: 1. Which Vocaloid voice are you using? 2. How did you get that sound?

First off, the male voice in this track is mine, not Vocaloid; that is obvious. But for the female voice, I used this layering technique, which is more and more popular in the commercial world. I also find it important for an English speaker to be able to understand the words of an English song after the first or second listen, without having to read the lyrics on a video at the same time. The clarity of the voice is very important in any serious production. While some of you get around well in a professional studio, most other readers are amateurs filled with a passion to learn the tricks of the trade so they can get better results with their musical productions. It is hence a pleasure to share a very detailed walkthrough of the vocal layering process. You might have to look up a few of the key concepts used here, but you'll get the hang of it.

First, listen to the sample with a mature voice: Hard as Diamond (sample).

And to this one, with a teen voice: Snoring Interlude.

In this track I layered 7 different Vocaloid voices to create the female voice, and treated them to get the best of each and eliminate as much of the robotic/computerized sound as I could. (I would be happy if anyone could get better results than I did and contact me to explain their method for a more realistic human voice.) So first, I made a single track in Vocaloid, using Avanna. All legatos and portamentos are turned off (or set to 0%). Every syllable is pitched by hand using the control layer, to get the legatos and variations I wanted, for a more realistic pitch variation. Software can't know what feeling I want to convey. However, in this case I did not touch any other control layer, since we're going to use post-filtering for that. I would tweak brightness, breathiness or dynamics only if really needed.

Once I had the melody and feel I wanted, I duplicated this track and had it played by every English female voice I had, at the same time. The only thing I might change is the "A" sound on some tracks, to get the vowel I want. If a vowel falls between two existing phoneme samples, like @ and V but not quite either, I would have some of the voices sing @ and the others sing V, so the sum gets closer to the realistic vowel. Then all voices are exported from Vocaloid and imported into my composition/audio (DAW) software. So far, it's awful and way too loud, noisy, poppy... that is where the filtering work starts.

In "Hard as Diamond", the Vocaloid voice in the pure center is Avanna, supported by Megpoid English 10% left and Sweet Ann 10% right, plus Prima a bit to the right but at lower volume so we don't hear the opera influence. It's her lower, mature frequencies we want in an electro track, without her opera feel. Then Sonika and Luka sit further apart, to give the feminine, breathy high frequencies, and Oliver (the boy) fills in vowels smoothly. All voices are pan-distributed in a spread from left 60% to right 60%, not mathematically even, but so that you have a balance of highs at the far ends and mature/rich tones closer to the center. The panning distribution is done with visualizers, but mostly by ear. I also added a copy of the Avanna track at pure left 100% and another copy at pure right 100%, both at very low volume, for a total of 9 audio tracks. (The teen voice in "Snoring Interlude" has 7 tracks, no duplicates, with Megpoid and Oliver in the center.)
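As a rough illustration of the panning layout described above, here is a minimal Python sketch that turns pan percentages into left/right gains. The constant-power (sin/cos) pan law is an assumption, since every DAW has its own pan law, and only the positions marked "from the text" come from this article; the others are guesses placed inside the 60% spread.

```python
import math

def pan_gains(pan_percent):
    """Return (left, right) gains for a pan position.

    pan_percent: -100 = hard left, 0 = center, +100 = hard right.
    Uses a constant-power (sin/cos) pan law, which is an assumption;
    your DAW may use a different law.
    """
    angle = (pan_percent + 100.0) / 200.0 * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)

# Pan positions in percent. Only the entries marked "from the text" are
# stated in the article; the rest are illustrative guesses.
LAYOUT = {
    "Avanna": 0,             # from the text: pure center
    "Megpoid English": -10,  # from the text
    "Sweet Ann": 10,         # from the text
    "Prima": 25,             # guess: "a bit to the right"
    "Oliver": -30,           # guess
    "Sonika": -60,           # guess: far end of the spread
    "Luka": 60,              # guess: far end of the spread
    "Avanna copy L": -100,   # from the text: pure left, very low volume
    "Avanna copy R": 100,    # from the text: pure right, very low volume
}

for name, pan in LAYOUT.items():
    left, right = pan_gains(pan)
    print(f"{name:16s} pan {pan:+4d}%  L={left:.3f}  R={right:.3f}")
```

In practice the final positions are still set by ear; the numbers only give a starting point.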

Each voice is then individually passed through a limiter/compressor to even out the volume differences. Then the volumes are adjusted to get the frequencies you want. The limit/compress step is essential to be able to treat all voices at once in the next step, while the volume is adjusted depending on which voice you want to bring forward. A more teenage voice would have Megpoid in the center, supported by Sonika and Luka, while Avanna, Sweet Ann and Prima would be further from the center and at lower volume.
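To show why the limit/compress step evens out the volume, here is a small sketch of a static hard-knee compressor curve working on levels in dB. The threshold and ratio values are hypothetical, not settings taken from this article, and a real plug-in also has attack and release times that are ignored here.

```python
def compress_db(level_db, threshold_db=-18.0, ratio=4.0):
    """Static hard-knee compressor curve, levels in dB.

    Below the threshold the level is unchanged; above it, every `ratio` dB
    of input only yields 1 dB of output. threshold_db and ratio are
    hypothetical example values.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

loud, quiet = -6.0, -20.0  # two syllables, 14 dB apart before compression
print(compress_db(loud) - compress_db(quiet))  # the gap after compression

# A limiter behaves like a compressor with an infinite ratio:
print(compress_db(loud, ratio=float("inf")))
```

With these example values the 14 dB gap between the two syllables shrinks to 5 dB, which is what makes the voices manageable as a group.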

All voices are then merged into a single stereo bus (group channel), so every subsequent plug-in is applied to the entire group of voices. When all voices play together, the impact of the consonants that start words, and of the S's and T's, makes it even uglier. On the group channel, I apply a transient effect to soften the attacks of consonants, and a dramatic de-esser to cut down the aggressive high frequencies of the S's and T's (applied on average from 4 to 11 kHz). An EQ gives richness to the voice, boosting 2 kHz and 6 kHz and above, but slightly lowering 4 kHz (this has to be adapted to each new composition; nothing is set in stone). Combined with a harmonic exciter, the EQ gives it a tube-amp-like feeling, or pop-rock microphone warmth. A dynamics adjustment (after the EQ) ensures volume without altering quality.
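The EQ moves and the de-esser range above can be written down as plain numbers. The sketch below is only a note-taking aid, not a filter implementation: the frequencies come from the text, while the dB amounts are assumed, since the article does not give them.

```python
def db_to_gain(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

# Center frequencies (Hz) from the text; the dB amounts are assumptions.
EQ_MOVES_DB = {
    2000: 2.0,   # boost around 2 kHz
    4000: -1.5,  # slight cut around 4 kHz
    6000: 2.0,   # boost from 6 kHz upward (a shelf, in practice)
}

# De-esser range from the text: roughly 4 to 11 kHz.
DE_ESSER_BAND_HZ = (4000, 11000)

def in_de_esser_band(freq_hz):
    """True if a frequency falls inside the de-esser's working range."""
    low, high = DE_ESSER_BAND_HZ
    return low <= freq_hz <= high

for freq, db in EQ_MOVES_DB.items():
    print(f"{freq} Hz: {db:+.1f} dB -> x{db_to_gain(db):.3f} amplitude")
```

Small dB moves translate to modest amplitude changes, which fits the "slightly" in the text; the exact amounts have to be found by ear for each composition.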

After that, a stereo delay (SuperTap in this case) is applied at low volume, full left/full right. Then the reverb is added as the final effect.
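A low-volume delay panned full left/full right can be sketched as follows. This is a simplified two-tap model with no feedback, taking a mono list of samples; it is not how SuperTap works internally, and the delay lengths and wet level are hypothetical.

```python
def stereo_delay(mono, left_delay, right_delay, wet=0.2):
    """Add one quiet echo hard left and one hard right to a centered signal.

    mono: list of float samples (the dry, centered voice).
    left_delay, right_delay: echo offsets in samples (hypothetical values).
    wet: echo level relative to the dry signal; kept low, as in the text.
    Returns (left, right) sample lists, long enough to hold the echo tails.
    """
    n = len(mono) + max(left_delay, right_delay)
    left = [0.0] * n
    right = [0.0] * n
    for i, sample in enumerate(mono):
        left[i] += sample                       # dry signal stays centered
        right[i] += sample
        left[i + left_delay] += wet * sample    # echo only in the left channel
        right[i + right_delay] += wet * sample  # echo only in the right channel
    return left, right

# A single click makes the two echoes easy to see.
left, right = stereo_delay([1.0, 0.0, 0.0], left_delay=2, right_delay=3, wet=0.25)
print(left)
print(right)
```

Using different delay times on each side is what widens the voice; with identical times the echo would simply sit in the center.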

All these effects are first inserted without tweaking, and then the humanizing process starts. Play with each setting on the group channel plug-ins, in the order they are inserted. Adjust your separate voice volumes again to get the feel you want. Play each voice separately once all plug-ins are in, just to hear what influence each one has on your final mix. Then play all voices together and keep adjusting. The EQ and reverb will need a final touch each time you change your individual volumes.

To get more samples of this layering technique, you can check my producer profile or visit my website, both with samples and full (legal) downloads of tracks where I used this layering technique. Sometimes I wanted the voco-feel, with a more teenage sound. Sometimes I wanted a voice as human as possible. You should try it out too, and adapt


In Vocaloid

- Create/edit 1 track in Vocaloid, using the main voice tone you want.

- Copy this track and have it also played by every other female voice you have at the same time, and Oliver if you have it.

In your DAW

- In this case, 7 Vocaloid voices (9 tracks, see above), all limited/compressed for an equal volume.

- Voices panning distribution: the main tone you want in the center, support tones toward the edges (left/right).

- Volumes are adjusted to get the tone we want, with everything else as support.

- All voices are routed to a stereo bus (group channel).

On group channel

- Equalizer

- Harmonic Exciter

- Transient attack reduction (transient shaper)

- De-esser

- Dynamics

- Stereo delay

- Reverb

Software, Plug-ins and CPU load

You can use pretty much any composition (DAW) software. I use Cubase 6.5. There are free VST plug-ins on the web for almost everything here, but I got better results using software from iZotope, Native Instruments and Waves. I own a legal/paid/commercial copy of every piece of software I use, including all Vocaloid voices.

Now, all these plug-ins can be demanding, so you'll have to limit yourself if your system can't supply the CPU power. For the curious, I use an Asus computer with an Intel Core i7 990X (12 threads at 3.47GHz), and get up to 60% load on my CPU when all other instruments are live. So I have no idea how taxing it can be on your system, but I advise you to keep an eye on your CPU monitor, and possibly to treat your Vocaloid voices in a separate project and import the rendered output into your music project.
