Japanese VOCALOIDs are VOCALOIDs designed to reproduce the Japanese language far more easily than VOCALOIDs of other languages. The following are lists of the phonemes needed to make a VOCALOID sing in Japanese.


Japanese vocals make up the largest selection of VOCALOID vocals available for purchase. The language is fairly straightforward to produce, as most sounds are more clearly defined than in languages such as English. In addition, development is significantly faster because fewer sounds are required.

Japanese Scripts

Japanese VOCALOIDs can use a standard YAMAHA script.

The script has been adjusted considerably since the early VOCALOID days, with adjustments made even during the production of the Hatsune Miku vocal in VOCALOID2 to improve the sounds needed for Japanese.[1] Thus, there was a slight leap in quality between VOCALOID and VOCALOID2 even without taking the engine itself into account. With the addition of triphones in VOCALOID3, Japanese Vocaloids also became much smoother than VOCALOID2 vocals.

Japanese Vocaloids

The following is a list of Vocaloids that use Japanese.

Phonetic System's Characteristics

The Japanese Vocaloid library is made up of 41 phonetic pronunciations. These phonetic inputs draw on an estimated 500 samples in total, the number needed to recreate Japanese at each pitch.[2]

Due to its moraic nature, Japanese has simple phonotactics and a simple syllable structure. The Japanese Phonetic System was therefore designed to encode [C V] syllables. As a result, the voicebanks may struggle to pronounce consonant clusters, diphthongs, or consonants in coda position.


The Japanese Phonetic System includes the 5 vowels of the Japanese Language.

Because of the palatalization found in Japanese, the system is designed so that the vowel [i] must be preceded by a palatalized consonant to produce sound. Otherwise the combination is silent, even if the two phonemes are placed in separate notes. The only exceptions are [s] and [dz], which do produce sound when followed by [i].

Note that some voicebanks may have problems with certain vowel combinations, which can end up sounding choppy; some techniques exist to help correct this. The problem was more common in the first generation of the software, and as more Japanese voicebanks were released it became much less apparent. This is due to improved recording and processing methods as well as the companies' growing experience with the synthesis engine. The problem was phased out completely by the third generation.


The Japanese Phonetic System includes 36 consonant phonetic pronunciations. Because Japanese has few or no consonant clusters, the system was not designed to handle standalone consonants. Consonants must therefore always be accompanied by a vowel. Otherwise, the synthesizer cannot reproduce the consonant and instead generates audio distortion, clicks, electronic buzzing or sound loops.

The exceptions are the nasal consonants associated with the Japanese N (ん). It is the only consonant in Japanese that is pronounced without a vowel, since this character counts as a mora on its own.

Palatalized Consonants

Because of yōon and the related palatalization found in Japanese, the system includes two versions of many consonants: the standard one and its palatalized counterpart.

Palatalization has two definitions, a phonetic one and a phonological one. Phonetically, it refers to a secondary articulation that adds a small y-like glide to the end of the consonant. Phonologically, it refers to a sound change or assimilation process that shifts a consonant toward a more palatal articulation. Both kinds of phenomena occur in Japanese.

In the Japanese Phonetic System, most palatalized consonants are distinguished from their standard version by an added apostrophe ('), the X-SAMPA equivalent of the IPA superscript ‹ʲ› that marks secondary palatal articulation. For example, [t'] is the palatalized [t], written [tʲ] in IPA.
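The pairing between standard and palatalized symbols can be expressed as a small lookup. The following Python sketch assumes the plain/palatalized pairs given in the Phonetics List below: the apostrophe rule, plus the irregular pairs such as [s]/[S] and [n]/[J]. `palatalize` and its two tables are hypothetical names used for illustration, not part of any VOCALOID API.

```python
# Hypothetical helper: maps a standard consonant symbol to its palatalized
# counterpart, using the pairs from the Phonetics List.

# Pairs where the palatalized form uses a different symbol.
IRREGULAR = {
    "s": "S",
    "z": "Z",
    "dz": "dZ",
    "ts": "tS",
    "n": "J",
    "h": "C",
}

# Consonants whose palatalized form simply adds an apostrophe (X-SAMPA ').
APOSTROPHE = {"k", "g", "N", "t", "d", "p\\", "b", "p", "m", "4"}


def palatalize(consonant: str) -> str:
    """Return the palatalized symbol for a standard consonant symbol."""
    if consonant in IRREGULAR:
        return IRREGULAR[consonant]
    if consonant in APOSTROPHE:
        return consonant + "'"
    raise ValueError(f"no palatalized counterpart listed for [{consonant}]")


print(palatalize("t"), palatalize("s"), palatalize("p\\"))  # t' S p\'
```

Consonants without a listed counterpart (for example [j], [w] or [N\]) raise an error instead of guessing a symbol.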

Nasal Consonants

In Japanese, the only consonant pronounced without a vowel is N (ん in hiragana, ン in katakana). It has many assimilation allophones, all of them nasal consonants. Because of this, all the nasal phonemes ([n], [J], [m], [m'], [N], [N'], [N\]) can be reproduced standalone, without an accompanying vowel.

Forbidden Combinations

Because of the way the Japanese voicebanks were recorded and the way the Vocaloid editor was built, some phoneme combinations are forbidden or not recognized by the synthesizer. If you enter one of these combinations, it will not produce any sound.

Some of these forbidden combinations are:

  • non-palatalized phoneme + [i] (exceptions: [s], [dz])
  • [w M], [j i] and [h M]: these do not exist in Japanese. [h M] is replaced by [p\ M]
  • some palatalized phonemes + a vowel other than [i] (see the Phonetics List)

Some consonant phonemes are also restricted to certain vowels. With any other vowel, the synthesizer produces no sound.

  • [h\]: Restricted to the vowels [e], [o]
  • [z] and [Z]: Restricted to the vowels [e], [o], and [M]
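The rules above can be collected into a single check. This is a minimal Python sketch under stated assumptions: `can_sound` and its tables are hypothetical, the palatal set is taken from the Phonetics List, and the "some palatalized phonemes + vowel other than [i]" rule is left out because the article does not say which phonemes it covers. The Phonetics List also shows [ts] (つぃ tsi) and [w] (うぃ wi) before [i], so the exception set may need extending for a given voicebank.

```python
# Hypothetical checker for the [C V] rules listed above. Real voicebanks may differ.

# Palatalized/palatal consonants (from the Phonetics List), which may precede [i].
PALATAL = {"k'", "g'", "N'", "S", "Z", "dZ", "t'", "tS", "d'", "J",
           "C", "p\\'", "b'", "p'", "m'", "4'"}

# Non-palatalized consonants that still sound before [i].
I_EXCEPTIONS = {"s", "dz"}

# Combinations that do not exist in Japanese.
NONEXISTENT = {("w", "M"), ("j", "i"), ("h", "M")}

# Consonants restricted to a subset of vowels.
RESTRICTED = {
    "h\\": {"e", "o"},
    "z": {"e", "o", "M"},
    "Z": {"e", "o", "M"},
}


def can_sound(consonant: str, vowel: str) -> bool:
    """Return False for the combinations described as silent above."""
    if (consonant, vowel) in NONEXISTENT:
        return False
    if consonant in RESTRICTED:
        return vowel in RESTRICTED[consonant]
    if vowel == "i":
        return consonant in PALATAL or consonant in I_EXCEPTIONS
    return True


print([v for v in ["a", "i", "M", "e", "o"] if can_sound("z", v)])  # ['M', 'e', 'o']
```

The restriction table is checked before the [i] rule, so a restricted consonant such as [Z] is judged only by its allowed vowels.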

Voiceless Phonemes

A new set of phonemes was added with the release of the Vocaloid3 software: voiceless versions of the vowels and of the sonorant consonants (liquids and nasal consonants, including their palatalized versions) found in the Japanese Phonetic System.[3]

In linguistics, voicelessness is the property of a sound being pronounced without the larynx vibrating. Sonorants (vowels and sonorant consonants) can sometimes be pronounced voicelessly. When this happens, you can actually see the speaker articulate the sonorant, but it is either barely audible or silent altogether.

Example: the Japanese word sukiyaki is pronounced [su̥kijaki]. This may sound like [skijaki] to an English speaker, but the lips can be seen compressing for the [u̥]. Something similar happens in English with words like peculiar [pʰə̥ˈkjuːliɚ] and potato [pʰə̥ˈteɪtoʊ].

To use them, add the suffix [◌_0] to the sonorant. This is the X-SAMPA diacritic corresponding to <◌̥>, the IPA diacritic for voiceless phonation.

Example: for a voiceless [o], type [o_0]; for a voiceless [4], type [4_0].

When a Vocaloid2 Japanese voicebank is imported into Vocaloid3, this new set of phonemes is generated from the voicebank's existing samples.

Sonorant type | Default | Devoiced
Vowels | [a]; [e]; [i]; [o]; [M] | [a_0]; [e_0]; [i_0]; [o_0]; [M_0]
Nasals | [n]; [J]; [m]; [m']; [N]; [N']; [N\] | [n_0]; [J_0]; [m_0]; [m'_0]; [N_0]; [N'_0]; [N\_0]
Liquids | [4]; [4'] | [4_0]; [4'_0]
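The devoicing rule is purely a naming convention, so it can be sketched as a suffix operation restricted to the sonorants in the table above. This is a minimal Python sketch; `devoice` and the symbol sets are hypothetical names for illustration, and the sukiyaki phoneme split is an assumed example.

```python
# Hypothetical helper: append the X-SAMPA "_0" suffix, but only to the
# sonorants listed in the devoicing table.

VOWELS = {"a", "e", "i", "o", "M"}
NASALS = {"n", "J", "m", "m'", "N", "N'", "N\\"}
LIQUIDS = {"4", "4'"}
SONORANTS = VOWELS | NASALS | LIQUIDS


def devoice(phoneme: str) -> str:
    """Return the devoiced symbol, e.g. [o] -> [o_0]."""
    if phoneme.endswith("_0"):
        return phoneme  # already devoiced
    if phoneme not in SONORANTS:
        raise ValueError(f"[{phoneme}] has no devoiced version")
    return phoneme + "_0"


# "sukiyaki" with the vowel of "su" devoiced.
word = ["s", "M", "k'", "i", "j", "a", "k'", "i"]
word[1] = devoice(word[1])
print(" ".join(word))  # s M_0 k' i j a k' i
```

Obstruents such as [k] or [s] are rejected, since the table lists no devoiced versions for them.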


Fixing choppy vowel combinations


Choppy-sounding vowel combinations can be corrected with the help of the phonemes [j], [w] and [h\].

The consonant phonemes [j] and [w] can act as semivowels (glides) for the vowels [i] and [M] respectively, so they can be used to fix vowel combinations involving those vowels.
These consonants can either replace their vowel:

  • The first Japanese Vocaloids (Meiko and Kaito) have some problems pronouncing [a i]. This can be fixed by replacing the [i] with [j]: [a i] → [a j]

or be inserted between the two vowels to join them (remember that the combinations [j i] and [w M] are forbidden):

  • If the combination [M e] sounds choppy, the note can be split in two: [M e] → [M w][e] or [M][w e] (you will probably need to decrease the accent or attack to get a smooth pronunciation)
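Both glide techniques can be sketched as a small rewrite on a vowel pair. This is a minimal Python sketch, assuming a note is a list of phoneme symbols; `fix_with_glide` and its `strategy` parameter are hypothetical. The "insert" branch uses the glide of the first vowel, which generalizes the [M e] example and is an assumption rather than a documented rule.

```python
from __future__ import annotations

# Hypothetical helper illustrating the glide technique. A note is a list of
# phoneme symbols; the result is the list of notes to enter.

GLIDE_FOR = {"i": "j", "M": "w"}      # glide that stands in for each vowel
FORBIDDEN = {("j", "i"), ("w", "M")}  # combinations the synthesizer rejects


def fix_with_glide(v1: str, v2: str, strategy: str = "replace") -> list[list[str]]:
    """Smooth the vowel pair [v1 v2] with [j] or [w].

    "replace": swap the second vowel for its glide, e.g. [a i] -> [a j].
    "insert":  split the note and put the first vowel's glide before the
               second vowel, e.g. [M e] -> [M][w e].
    Returns the pair unchanged if the strategy does not apply.
    """
    if strategy == "replace":
        if v2 in GLIDE_FOR and v1 != v2:
            return [[v1, GLIDE_FOR[v2]]]
    elif strategy == "insert":
        glide = GLIDE_FOR.get(v1)
        if glide is not None and (glide, v2) not in FORBIDDEN:
            return [[v1], [glide, v2]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [[v1, v2]]


print(fix_with_glide("a", "i"))                     # [['a', 'j']]
print(fix_with_glide("M", "e", strategy="insert"))  # [['M'], ['w', 'e']]
```

Pairs that would produce [j i] or [w M] are returned unchanged, mirroring the forbidden combinations.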

Blending Phonemes

(Audio examples: a choppy vowel combination, the same combination fixed with the phoneme [h\], and a waveform comparison of both samples.)

Miku sings the vowel combination [e a], first without a fix and then fixed with the help of the phoneme [h\]. The second version is noticeably smoother than the first.

Where you can't use these phonemes, you can always use the restricted phoneme [h\]. This phoneme only produces sound when followed by [e] or [o]; combined with any other vowel, it is mute. However, if you add a vowel on a separate note after the mute combination, the synthesizer skips the mute combination and immediately plays the following vowel, which lets you fix choppy vowel combinations.

  • Miku is known to struggle with vowel combinations involving [e] and [o]. For example: [o a] → [o h\ a][a] or [o][o h\ a][a]
  • The Kagamine Rin / Len ACT2 voicebanks are known to have various choppy vowel combinations. Because their [h\] is mute with every vowel, it can be used to fix any choppy vowel combination.

In Vocaloid2, the phoneme [Asp] has a similar effect to [h\] with any vowel combination, so it can also be used to fix choppy vowel combinations.
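The [h\] trick follows a fixed note pattern, which can be sketched as below. This is a minimal Python sketch; `blend_with_h`, its `audible_with` parameter and the `show` formatter are hypothetical. It assumes, as described above, that [h\] is audible only before [e] and [o] on voicebanks like Miku's and mute with every vowel on Rin/Len ACT2.

```python
from __future__ import annotations

# Hypothetical helper illustrating the [h\] trick: the mute [h\ V] combination
# ends the first note, and the vowel is repeated on the next note.


def blend_with_h(v1: str, v2: str,
                 audible_with: frozenset[str] = frozenset({"e", "o"})) -> list[list[str]]:
    """Return notes following the pattern [o a] -> [o h\\ a][a]."""
    if v2 in audible_with:
        raise ValueError(f"[h\\ {v2}] is audible on this voicebank, so it cannot act as a mute bridge")
    return [[v1, "h\\", v2], [v2]]


def show(notes: list[list[str]]) -> str:
    """Format notes in the article's bracket notation."""
    return "".join("[" + " ".join(note) + "]" for note in notes)


print(show(blend_with_h("o", "a")))                            # [o h\ a][a]
# Rin/Len ACT2: [h\] is mute with every vowel.
print(show(blend_with_h("e", "o", audible_with=frozenset())))  # [e h\ o][o]
```

Making the audible vowels a parameter keeps the voicebank-specific behavior out of the note pattern itself.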

Gemination and Consonant Length

Gemination (consonant length) occurs when a spoken consonant is pronounced for an audibly longer time than a comparable short consonant. It is an important distinctive phonetic feature of Japanese.

Example: two words can differ in meaning only by the length of a consonant:
河川 kasen IPA:[ka.sẽɴ] 'Rivers'
合戦 kassen IPA:[kas.sẽɴ] or [kasːẽɴ] 'Battle'

Different techniques exist for different versions of the software.

For Vocaloid2

As mentioned before, the Japanese Phonetic System wasn't designed to reproduce a consonant on its own: if the user encodes a consonant without a vowel, it generates a barely audible loop that sounds like an electronic buzz. However, if the consonant sits between two reproducible notes or syllables, the system handles it better, so it can be encoded alone. This makes it possible to lengthen some consonants.

To increase the length of a consonant, create a gap between the preceding syllable and the following one that contains the consonant to extend. Then fill the gap with a short note containing only that consonant phoneme, without a vowel.[4]


The note preceding the standalone consonant must end in a vowel; otherwise the synthesizer can't handle it and produces an unwanted chop. Also note that although this method lets you extend consonants, the system still struggles with consonants encoded alone, especially long ones. This can cause sound loops or distortion of the phoneme, so the method should not be overused.
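The Vocaloid2 workaround amounts to a small edit on two adjacent notes. This is a minimal Python sketch under stated assumptions: the `Note` class, its tick-based `start`/`length` fields, the 480-tick note values and `lengthen_consonant` are all hypothetical, not the editor's actual data model. It enforces the rules above: the preceding note must end in a vowel, and the gap stays shorter than that note.

```python
from __future__ import annotations

from dataclasses import dataclass

VOWELS = {"a", "i", "M", "e", "o"}


@dataclass
class Note:
    """Hypothetical note model: phoneme symbols plus start/length in ticks."""
    phonemes: list[str]
    start: int
    length: int


def lengthen_consonant(prev: Note, nxt: Note, gap: int) -> list[Note]:
    """Vocaloid2 workaround: open a gap before `nxt` and fill it with a
    vowel-less note holding the consonant that starts `nxt`."""
    if prev.start + prev.length != nxt.start:
        raise ValueError("the two notes must be adjacent")
    if prev.phonemes[-1] not in VOWELS:
        raise ValueError("the preceding note must end in a vowel")
    consonant = nxt.phonemes[0]
    if consonant in VOWELS:
        raise ValueError("the following note must start with a consonant")
    if not 0 < gap < prev.length:
        raise ValueError("the gap must be shorter than the preceding note")
    shortened = Note(prev.phonemes, prev.start, prev.length - gap)
    filler = Note([consonant], shortened.start + shortened.length, gap)
    return [shortened, filler, nxt]


# kasen -> kassen: [k a][s e N\] becomes [k a][s][s e N\]
for note in lengthen_consonant(Note(["k", "a"], 0, 480),
                               Note(["s", "e", "N\\"], 480, 480), gap=120):
    print(note.start, note.length, " ".join(note.phonemes))
# 0 360 k a
# 360 120 s
# 480 480 s e N\
```

Keeping `gap` small relative to the surrounding notes follows the advice above about not overusing standalone consonants.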

For Vocaloid3

In the third version of the software, the Velocity (VEL) parameter was fixed so that changing it now actually affects consonant length. Together with the addition of the devoiced phonemes, this makes it possible to adjust consonant length directly, without the complicated techniques or post-editing steps required in Vocaloid2.

Phonetics List

Symbol | Classification | IPA symbol | Sample (hiragana / Kunrei-shiki romaji) | Notes | Related phonemes
[a] | vowel | ä (open central unrounded vowel) | あ a | |
[i] | vowel | i (close front unrounded vowel) | い i | | [j] (glide)
[M] | vowel | ɯᵝ or ɯ͡β (close back compressed vowel) | う u | The Japanese "u" is neither rounded [u] nor unrounded [ɯ], but compressed. | [w] (glide)
[e] | vowel | mid front unrounded vowel | え e | |
[o] | vowel | mid back rounded vowel | お o, を | |
[k] | consonant | k (voiceless velar plosive) | か ka, く ku, け ke, こ ko | | [g] (voiced), [k'] (palatalized)
[k'] | palatalized consonant | palatalized voiceless velar plosive | き ki, きゃ kya, きゅ kyu, きぇ kye, きょ kyo | Palatalized /k/. | [g'] (voiced), [k] (depalatalized)
[g] | consonant | g (voiced velar plosive) | が ga, ぐ gu, げ ge, ご go | | [k] (voiceless), [g'] (palatalized), [N] (nasal)
[g'] | palatalized consonant | | ぎ gi, ぎゃ gya, ぎゅ gyu, ぎぇ gye, ぎょ gyo | Palatalized /g/. | [k'] (voiceless), [g] (non-palatalized), [N'] (nasal)
[N] | consonant | ŋ (velar nasal) | ga, ぐ gu, げ ge, ご go, ん n-n' | Nasalized /g/. Also an allophone of /n/ before a velar consonant. | [N'] (palatalized), [g] (plosive), [n] (develarized)
[N'] | palatalized consonant | ŋʲ | き゜gi, き゜ゃ gya, き゜ゅ gyu, き゜ぇ gye, き゜ょ gyo, ん n-n' | Palatalized nasal /g/. | [N] (depalatalized), [g'] (plosive), [J] (develarized)
[s] | consonant | s (voiceless alveolar sibilant) | さ sa, す su, せ se, そ so, すぃ si | | [z] (voiced), [S] (palatalized), [ts] (affricated)
[S] | palatal consonant | ɕ or ʃʲ (voiceless alveolo-palatal sibilant) | し shi, しゃ sha, しゅ shu, しぇ she, しょ sho | Palatalized /s/. The X-SAMPA symbol incorrectly suggests /ʃ/; although the two sound similar, they are not the same phoneme. | [Z] (voiced), [tS] (affricated), [s] (depalatalized)
[z] | consonant | z (voiced alveolar sibilant) | ず zu, ぜ ze, ぞ zo | Often used between vowels, although not all Japanese speakers use this sound. | [s] (voiceless), [Z] (palatalized), [dz] (affricated)
[Z] | palatal consonant | ʑ or ʒʲ (voiced alveolo-palatal sibilant) | じゅ ju, じぇ je, じょ jo, じゃ ja, じ ji | Palatalized /z/. Often used between vowels, although not all Japanese speakers use this sound. The X-SAMPA symbol incorrectly suggests /ʒ/; although the two sound similar, they are not the same phoneme. | [S] (voiceless), [z] (depalatalized), [dZ] (affricated)
[dz] | consonant | ʣ (voiced alveolar affricate) | ざ za, ず zu, づ zu, ぜ ze, ぞ zo, じゃ ja, じ ji, じゅ ju, じぇ je, じょ jo | Often used at the beginning of a word or after ん n; some Japanese speakers also use this sound instead of [z] or [Z]. | [ts] (voiceless), [dZ] (palatalized), [z] (spirantized), [d] (deaffricated)
[dZ] | palatal consonant | ʥ (voiced alveolo-palatal affricate) | じ ji, ぢ ji, じゃ ja, じゅ ju, ぢぇ je, じょ jo | Palatalized /dz/ or /d/; some Japanese speakers use this sound instead of [z] or [Z]. The X-SAMPA symbol incorrectly suggests /ʤ/; although the two sound similar, they are not the same phoneme. | [tS] (voiceless), [dz] (depalatalized), [Z] (spirantized), [d] (deaffricated)
[t] | consonant | t (voiceless alveolar plosive) | た ta, て te, と to, とぅ tu | | [t'] (palatalized), [tS] (affricated)
[t'] | palatalized consonant | | てぃ ti, てゅ tyu | Palatalized /t/, mostly used in loanwords from other languages. | [d'] (voiced), [t] (depalatalized)
[ts] | consonant | ʦ (voiceless alveolar affricate) | つ tsu, つぁ tsa, つぃ tsi, つぇ tse, つぉ tso | | [dz] (voiced), [t] (deaffricated), [s] (spirantized)
[tS] | palatal consonant | ʨ (voiceless alveolo-palatal affricate) | ち chi, ちゃ cha, ちゅ chu, ちぇ che, ちょ cho | Palatalized /t/. The X-SAMPA symbol incorrectly suggests /ʧ/; although the two sound similar, they are not the same phoneme. | [dZ] (voiced), [ts] (depalatalized), [t] (deaffricated), [S] (spirantized)
[d] | consonant | d (voiced alveolar plosive) | だ da, どぅ du, で de, ど do | | [t] (voiceless), [d'] (palatalized), [dz] (affricated)
[d'] | palatalized consonant | | でぃ di, でゅ dyu | Palatalized /d/, mostly used in loanwords from other languages. | [t'] (voiceless), [d] (depalatalized)
[n] | consonant | n (alveolar nasal) | な na, ぬ nu, ね ne, の no, ん n | This consonant can be articulated without a vowel. | [J] (palatalized), [N] (velarized), [m] (labialized)
[J] | consonant | ɲ or nʲ (palatal nasal) | に ni, にゃ nya, にゅ nyu, にぇ nye, にょ nyo | Palatalized /n/; this phoneme also appears as an allophone of /n/ before palatal consonants. | [n] (depalatalized), [N'] (velarized), [m'] (labialized)
[h] | consonant | h (voiceless glottal fricative) | は ha, へ he, ほ ho | | [C] (palatalized), [p\] (labialized), [h\] (voiced)
[h\] | consonant | ɦ (voiced glottal fricative) | ぁ xa, ぃ xi, ぅ xu, ぇ xe, ぉ xo | Intervocalic /h/. Only works with [e] and [o]. | [h] (voiceless)
[C] | palatal consonant | ç (voiceless palatal fricative) | ひ hi, ひゃ hya, ひゅ hyu, ひぇ hye, ひょ hyo | Perceived in Japanese as a palatalized /h/. | [h] (depalatalized)
[p\] | consonant | ɸ (voiceless bilabial fricative) | ふ fu, ふぁ fwa, ふぇ fe, ふぉ fo | | [h] (debuccalized), [p] (spirantized)
[p\'] | palatalized consonant | ɸʲ | ふぃ fi, ふゃ fya, ふゅ fyu, ふぇ fye, ふょ fyo | Palatalized /ɸ/. | [p\] (depalatalized), [h] (delabialized)
[b] | consonant | b (voiced bilabial plosive) | ば ba, ぶ bu, べ be, ぼ bo | | [p] (voiceless), [b'] (palatalized)
[b'] | palatalized consonant | | び bi, びゃ bya, びゅ byu, びぇ bye, びょ byo | Palatalized /b/. | [p'] (voiceless), [b] (depalatalized)
[p] | consonant | p (voiceless bilabial plosive) | ぱ pa, ぷ pu, ぺ pe, ぽ po | | [b] (voiced), [p'] (palatalized)
[p'] | palatalized consonant | | ぴ pi, ぴゃ pya, ぴゅ pyu, ぴぇ pye, ぴょ pyo | Palatalized /p/. | [b'] (voiced), [p] (depalatalized)
[m] | consonant | m (bilabial nasal) | ま ma, む mu, め me, も mo | Also an allophone of /n/ before labial consonants. This consonant can be articulated without a vowel. | [m'] (palatalized), [n] (delabialized)
[m'] | palatalized consonant | | み mi, みゃ mya, みゅ myu, みぇ mye, みょ myo | Palatalized /m/. | [m] (depalatalized), [J] (delabialized)
[j] | consonant | j (palatal approximant) | や ya, ゆ yu, よ yo, いぇ ye | | [i] (syllabic)
[4] | consonant | ɾ (alveolar flap) | ら ra, る ru, れ re, ろ ro | Although the X-SAMPA symbol suggests an alveolar tap, this phoneme is technically an apical postalveolar flap undefined for laterality, so the Japanese /r/ tends to sound somewhere between [ɽ] and [ɺ]. Whether it sounds more R-like or L-like depends on its context. | [4'] (palatalized)
[4'] | palatalized consonant | ɾʲ | り ri, りゃ rya, りゅ ryu, りょ ryo | | [4] (depalatalized)
[w] | consonant | w͍ or wᵝ (compressed labio-velar approximant) | わ wa, うぃ wi, うぇ we, うぉ wo | Like the Japanese /u/, the Japanese /w/ is compressed. | [M] (syllabic)
[N\] | consonant | ɴ (uvular nasal) | ん n | /n/ at the end of a word. | [n]

Additional notes

  • The Japanese Phonetic System actually uses the symbol <¥> instead of <\>. However, since typing <\> on most keyboards is input as <¥> in the synthesizer, and to make comparison with X-SAMPA easier, the wikia uses <\> throughout its articles.
  • Crypton’s Vocaloids, including Kaito and Meiko, have almost the same Japanese phonetic system.[5] To use [z], [Z], [h\], [N] and [N'], users need to edit the phonemes directly instead of entering kana characters.
  • Rin/Len Kagamine Act 1 can pronounce [h\] while their Act 2 cannot (comparison of consonant sounds Act 1, Act 2).
  • Vocaloids of Internet Co. Ltd., such as Gackpoid or Megpoid, mostly share the same system as Crypton’s, but they do not have [z] and [Z] sounds. As is often the case with the Japanese language, they are replaced by [dz] and [dZ].[6][7]
  • Japanese VOCALOID2 voicebanks can combine the [a] and [i] phonemes (e.g. [w a i]), but the original VOCALOID voicebanks cannot. The workaround is to use the [j] consonant instead (e.g. [w a j]).
  • [N\], [N] or [n] alone tends to be pronounced as "ng". This is the basis for Japanese vocaloids being used for South-East Asian languages.
  • [N'] followed by a vowel other than [i] may produce odd results; however, given how this phoneme is used in Japanese, there is no real need for it to be followed by a vowel other than [i].



  1. link
  2. [1]
  5. Japanese Phonetic System of VOCALOID KAITO
  6. Japanese Phonetic System of Megpoid
  7. Japanese Phonetic System of Gackpoid

