Japanese Vocaloids are Vocaloids that can reproduce the Japanese language much more easily than Vocaloids designed for other languages. The lists below give the phonemes needed to make a Vocaloid sing in Japanese.
The following is a list of Vocaloids that use Japanese.
- Hatsune Miku
- Kagamine Rin
- Kagamine Len
- Kamui Gakupo
- Megurine Luka
- SF-A2 miki
- Kaai Yuki
- Kiyoteru Hiyama
- Nekomura Iroha
- Utatane Piko
- Tone Rion
- Yuzuki Yukari
- Aoki Lapis
Phonetic System's Characteristics
There are 41 phonetic symbols that make up the Japanese Vocaloid library. These phonetic inputs draw on the estimated 500 diphonetic samples needed to recreate Japanese (Vocaloid uses approximately 1,300 samples for Japanese overall).
Japanese has the syllable structure (C)(S)(V), or Consonant-Semivowel-Vowel, where the semivowel is optional and in most cases corresponds to the palatal approximant [j]. For this reason the Japanese Phonetic System was designed to be encoded as [C V] syllables. As a result, some Vocaloids struggle to pronounce consonant clusters or consonants in coda position (at the end of the syllable), and these require some tricks or editing.
The Japanese Phonetic System includes the 5 vowels of the Japanese Language.
Because of the palatalization found in the Japanese language, the system is designed so that the vowel [i] needs a palatalized consonant in front of it to produce sound. Otherwise the combination produces no sound, even if the two phonemes are placed on different notes.
Some Vocaloids have problems with certain vowel combinations, which sound choppy. There are techniques that can help to correct this. The problem is more common in the earliest voicebanks and was eventually corrected in later releases.
The Japanese Phonetic System includes 36 consonant phonetic symbols. Because Japanese has few or no consonant clusters, the system was designed without allowing for consonants to be encoded standalone. Consonants therefore always need to be accompanied by a vowel; otherwise the synthesizer cannot reproduce the consonant and generates audio distortion, clicks, electronic buzzing or sound loops.
For consonants, because of the palatalization mentioned previously, the system includes two versions of the same phoneme: the standard one and its palatalized version.
Palatalization is a phonetic term for a secondary articulation of consonants in which the body of the tongue is raised toward the hard palate and the alveolar ridge during the articulation of the consonant. Such consonants are phonetically palatalized, and in the International Phonetic Alphabet they are indicated by a small superscript ‹ʲ›, as with [tʲ] for a palatalized [t].
Put simply, when a consonant is palatalized its sound shifts toward that of the palatal approximant /j/ (the sound of the letter y in English) at its end. In Japanese, palatalization normally occurs before the palatals /i/ and /j/, affecting the preceding consonant.
In most cases the palatalized phoneme is distinguished from its standard version by the addition of an apostrophe ('), which is the X-SAMPA equivalent of the IPA's small superscript ‹ʲ›. The exceptions are the phonemes [n], [h], [s], [z], [dz] (or [d]) and [t], whose respective palatalized phonemes are [J], [C], [S], [Z], [dZ] and [tS]. In these cases the phonemes appear to be palatal consonants rather than simply palatalized consonants. [t] and [d] are a special case, each having two palatalized phonemes: [tS] and [t'] for [t], and [dZ] and [d'] for [d]. The phonemes [d'] and [t'] are generally used for non-Japanese words incorporated into the language.
For this reason, the system is designed so that the vowel [i] needs to be preceded by the respective palatalized phoneme. The only exceptions are the phonemes [s] and [dz], which produce sound when followed by [i] (not all Japanese speakers palatalize the phonemes /s/, /z/ and /dz/).
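The pairing between plain and palatalized symbols described above can be written as a small lookup. The following Python sketch is illustrative only: the names `PALATALIZED`, `sounds_before_i` and `fix_before_i` are hypothetical and not part of any Vocaloid tool, and for [t] and [d] it assumes the native forms [tS] and [dZ] rather than the loanword forms [t'] and [d'].

```python
# Plain consonant -> palatalized (or palatal) counterpart, as described in the text.
# Assumption: [t] and [d] map to the native forms [tS] and [dZ], not [t'] / [d'].
PALATALIZED = {
    "k": "k'", "g": "g'", "N": "N'", "b": "b'", "p": "p'", "m": "m'",
    "p\\": "p\\'", "4": "4'",
    "n": "J", "h": "C", "s": "S", "z": "Z", "dz": "dZ", "t": "tS", "d": "dZ",
}

# Every symbol that counts as palatalized, including the loanword forms.
PALATAL_SYMBOLS = set(PALATALIZED.values()) | {"t'", "d'"}

# Plain consonants that the text says still sound before [i].
SOUNDS_ANYWAY = {"s", "dz"}


def sounds_before_i(consonant: str) -> bool:
    """True if `consonant` + [i] is expected to produce sound."""
    return consonant in PALATAL_SYMBOLS or consonant in SOUNDS_ANYWAY


def fix_before_i(consonant: str) -> str:
    """Return a symbol that can precede [i], swapping in the palatalized form if needed."""
    if sounds_before_i(consonant):
        return consonant
    if consonant not in PALATALIZED:
        raise ValueError(f"no palatalized counterpart known for [{consonant}]")
    return PALATALIZED[consonant]
```

For example, `fix_before_i("k")` gives `k'`, while `fix_before_i("s")` leaves `s` unchanged because [s] is one of the two exceptions.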
The palatalized phonemes usually have a small /j/ glide at the end, which is only produced when the palatalized phoneme is followed by a vowel. They also tend to have a more marked pronunciation than their non-palatalized versions because of the /j/ distortion.
The palatalized phonemes can be used with vowels other than [i], although not all combinations will produce sound.
In the Japanese language, one of the few consonants that can be pronounced on its own is the N (ん in hiragana, ン in katakana). This letter has many assimilation allophones, all of which are nasal consonants. Because of this, all the nasal phonemes ([n], [J], [m], [m'], [N], [N'], [N\]) can be reproduced standalone, without an accompanying vowel.
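As a rough illustration of the standalone rule, the check below decides whether a single note's phoneme list can be expected to sound. It is a sketch under the assumptions stated in this article only; `note_can_sound` is a hypothetical helper name, and devoiced symbols are not considered.

```python
VOWELS = {"a", "i", "M", "e", "o"}

# Nasal phonemes that the text says can be reproduced without a vowel.
STANDALONE_NASALS = {"n", "J", "m", "m'", "N", "N'", "N\\"}


def note_can_sound(phonemes: list) -> bool:
    """True if the note contains a vowel, or is a single standalone nasal."""
    if any(p in VOWELS for p in phonemes):
        return True
    return len(phonemes) == 1 and phonemes[0] in STANDALONE_NASALS
```

Under these rules `["k", "a"]` and `["N\\"]` pass, while a lone `["k"]` does not.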
Because of the way the Japanese voicebanks were recorded and the way the Vocaloid editor was made, some phoneme combinations are forbidden or aren't recognized by the synthesizer. If you attempt to enter these combinations they won't produce sound, because the synthesizer does not allow them.
Some of these forbidden combinations are:
- non-palatalized phoneme + [i] (Exceptions: [s], [dz])
- [w M], [j i] and [h M]: nonexistent in the Japanese language. The [h M] combination is replaced by [p\ M]
- Some palatalized phonemes + a vowel other than [i] (check the phoneme chart)
Also there are some consonant phonemes that are restricted to certain vowels. If the combination isn't the correct one, the combination won't produce sound.
- [h\]: Restricted to the vowels [e], [o]
- [z] and [Z]: Restricted to the vowels [e], [o], and [M]
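The rules listed above can be collected into a small validator. This is a minimal sketch that encodes only the combinations named in this section; the function name `is_allowed` and the set names are hypothetical, the list of plain consonants is an assumption drawn from the phoneme chart, and individual voicebanks may behave differently.

```python
# Plain consonants that have a palatalized counterpart and so are mute before [i].
# [s] and [dz] are left out because the text lists them as exceptions.
PLAIN_MUTE_BEFORE_I = {"k", "g", "N", "t", "d", "n", "h", "p\\", "b", "p", "m", "4"}

# Combinations the text lists as nonexistent in Japanese.
FORBIDDEN_PAIRS = {("w", "M"), ("j", "i"), ("h", "M")}

# Consonants restricted to certain vowels.
RESTRICTED = {
    "h\\": {"e", "o"},
    "z": {"e", "o", "M"},
    "Z": {"e", "o", "M"},
}


def is_allowed(consonant: str, vowel: str) -> bool:
    """Apply the forbidden and restricted combinations listed in the text."""
    if (consonant, vowel) in FORBIDDEN_PAIRS:
        return False
    if consonant in RESTRICTED:
        return vowel in RESTRICTED[consonant]
    if vowel == "i" and consonant in PLAIN_MUTE_BEFORE_I:
        return False
    return True
```

The third bullet (some palatalized phonemes with vowels other than [i]) is not modelled, because the text does not enumerate those combinations.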
In linguistics, voicelessness is the property of sounds being pronounced without the larynx vibrating. Sometimes sonorants (vowels and sonorant consonants) can become voiceless. When this occurs you can see the person articulate the sonorant, but you can't hear it (or can barely hear it).
- Example: the Japanese word sukiyaki is pronounced [su̥.ki.ja.ki]. This may sound like [s.ki.ja.ki] to an English speaker, but the lips can be seen compressing for the [u̥]. Something similar happens in English with words like peculiar [pʰə̥ˈkjuːliɚ] and potato [pʰə̥ˈteɪtoʊ].
To use them, the user must add the suffix [◌_0] to the sonorant, which corresponds to the X-SAMPA diacritic for <◌̥>, the IPA diacritic for voiceless phonation.
- Example: For a voiceless [o] the user must type [o_0]. For a voiceless [4] the user must type [4_0]
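The suffix rule is simple enough to express in a few lines. The sketch below is only an illustration: `devoice` is a hypothetical helper, and the set of sonorants it accepts is taken from the vowels, nasals and liquids named in this article.

```python
# Sonorants that have a devoiced form according to this article.
DEVOICEABLE = {
    "a", "e", "i", "o", "M",                  # vowels
    "n", "J", "m", "m'", "N", "N'", "N\\",    # nasals
    "4", "4'",                                # liquids
}


def devoice(phoneme: str) -> str:
    """Append the X-SAMPA voiceless diacritic `_0` to a sonorant symbol."""
    if phoneme not in DEVOICEABLE:
        raise ValueError(f"[{phoneme}] has no devoiced form in this system")
    return phoneme + "_0"
```

With this helper `devoice("o")` returns `o_0`, and a non-sonorant such as `k` is rejected.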
When a Vocaloid2 Japanese voicebank is imported into Vocaloid3, this new set of phonemes is generated from its existing samples.
|Vowels||[a]; [e]; [i]; [o]; [M]||[a_0]; [e_0]; [i_0]; [o_0]; [M_0]|
|Nasals||[n]; [J]; [m]; [m']; [N]; [N']; [N\]||[n_0]; [J_0]; [m_0]; [m'_0]; [N_0]; [N'_0]; [N\_0]|
|Liquids||[4]; [4']||[4_0]; [4'_0]|
Fixing choppy vowel combinations
Certain vowel combinations that sound choppy can be corrected with the aid of the phonemes [j], [w] and [h\].
The consonant phonemes [j] and [w] can be used as semivowels or glides for the vowels [i] and [M] respectively, which makes them useful for fixing combinations involving those vowels.
These consonants can be used either in place of their vowel:
- The first Japanese Vocaloids (Meiko and Kaito) have some problems pronouncing [a i]. This can be fixed by replacing the [i] with a [j]. [a i] → [a j]
or to join the two vowels by inserting the glide between them (don't forget that the combinations [j i] and [w M] are forbidden).
- If the combination [M e] sounds choppy, the note can be split in two. [M e] → [M w][e] or [M][w e] (you will probably need to decrease the accent or attack to get a smooth pronunciation)
In cases where you can't use these phonemes, you can always use the restricted phoneme [h\]. This phoneme only produces sound if it is followed by [e] or [o]; combined with the other vowels it produces no sound. However, if you add a vowel on a different note after the mute combination, the synthesizer skips the mute combination and immediately reproduces the following vowel, allowing you to fix choppy vowel combinations.
- Miku is known to struggle with the [e] and [o] vowel combinations. [o a] → [o h\ a][a] or [o][o h\ a][a]
- The Kagamine Rin / Len ACT2 voicebanks are known to have various choppy vowel combinations. Because their [h\] is mute with any vowel, it can be used to fix any choppy vowel combination.
In Vocaloid2, the phoneme [Asp] produces an effect similar to the phoneme [h\] with any vowel combination, so it can be used with choppy vowel combinations.
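The two glide techniques described above (replacing a vowel with its glide, or splitting the note and leading with the glide) can be sketched as follows. The function names are hypothetical, each inner list stands for the phonemes of one note, and only the [j]/[w] glides are covered, not the [h\] or [Asp] workaround.

```python
# Glide that corresponds to each vowel, as described in the text.
GLIDE_FOR = {"i": "j", "M": "w"}

# Glide + vowel pairs that the system forbids.
FORBIDDEN = {("j", "i"), ("w", "M")}


def replace_with_glide(first: str, second: str) -> list:
    """Replace the second vowel with its glide: [a i] -> [a j]."""
    if second not in GLIDE_FOR:
        raise ValueError(f"[{second}] has no glide counterpart")
    return [first, GLIDE_FOR[second]]


def split_with_glide(first: str, second: str) -> list:
    """Split into two notes, the second led by the first vowel's glide: [M e] -> [M][w e]."""
    glide = GLIDE_FOR.get(first)
    if glide is None:
        raise ValueError(f"[{first}] has no glide counterpart")
    if (glide, second) in FORBIDDEN:
        raise ValueError(f"[{glide} {second}] is a forbidden combination")
    return [[first], [glide, second]]
```

Here `replace_with_glide("a", "i")` yields `["a", "j"]` and `split_with_glide("M", "e")` yields `[["M"], ["w", "e"]]`; the accent or attack adjustment mentioned in the text still has to be done by hand.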
Gemination and Consonant Length
Gemination (consonant length) is when a spoken consonant is pronounced for an audibly longer period of time than a comparable short consonant. This is an important distinctive phonetic process in the Japanese language.
- Example: Two words can differ in meaning solely through the length of a consonant
- 河川 kasen IPA:[ka.sẽɴ] 'Rivers'
- 合戦 kassen IPA:[kas.sẽɴ] or [kasːẽɴ] 'Battle'
Different techniques exist for the different versions of the software.
As mentioned before, the Japanese Phonetic System wasn't designed to allow a consonant to be reproduced alone; if the user tries to encode one without a vowel, it generates an almost inaudible loop that sounds like an electronic buzz. However, if the consonant sits between two reproducible notes or syllables, the system handles it better, making it possible to encode it alone. This can be used to extend some consonants.
To increase the length of a consonant, the user must create a gap between the preceding syllable and the next one, which contains the consonant to extend, and then fill the gap with a short note containing only the consonant phoneme to extend, without a vowel.
It's important that the note preceding the lone consonant ends in a vowel; otherwise the synthesizer won't be able to handle it and will produce an undesired chop. It's also important to emphasize that although this method allows consonants to be extended, the system still struggles with consonants encoded alone, especially if they are too long. This can generate sound loops or distortion of the phoneme, so it's important not to overuse the method.
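The gap-and-filler technique described above can be sketched on a toy note model. Everything here is an assumption made for illustration: the `Note` class, the tick unit and the `lengthen_consonant` helper are hypothetical and do not correspond to the editor's real file format.

```python
from dataclasses import dataclass

VOWELS = {"a", "i", "M", "e", "o"}


@dataclass
class Note:
    start: int       # position in ticks (hypothetical unit)
    length: int      # duration in ticks
    phonemes: list   # e.g. ["k", "a"]


def lengthen_consonant(prev: Note, nxt: Note, gap: int) -> list:
    """Shorten `prev` by `gap` ticks and fill the gap with the first consonant of `nxt`."""
    if prev.start + prev.length != nxt.start:
        raise ValueError("notes must be adjacent")
    if prev.phonemes[-1] not in VOWELS:
        raise ValueError("the preceding note must end in a vowel")
    if nxt.phonemes[0] in VOWELS:
        raise ValueError("the next note must start with a consonant")
    if not 0 < gap < prev.length:
        raise ValueError("gap must be shorter than the preceding note")
    shortened = Note(prev.start, prev.length - gap, list(prev.phonemes))
    filler = Note(nxt.start - gap, gap, [nxt.phonemes[0]])
    return [shortened, filler, nxt]
```

For kassen, calling it with `Note(0, 480, ["k", "a"])`, `Note(480, 480, ["s", "e"])` and a gap of 60 ticks shortens the first note to 420 ticks and inserts a 60-tick note holding only [s]. The gap should be kept short, since the text warns that long lone consonants cause loops or distortion.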
In the third version of the software, the Velocity (VEL) parameter was corrected and now effectively changes the length of consonants when it is modified. Together with the addition of the devoiced phonemes, this allows the length of consonants to be modified without the complicated techniques or post-editing steps required in Vocaloid2.
|Symbol||Classification||IPA Symbol||Sample Hiragana/ Kunrei-shiki Romaji||Notes||Related Phonemes|
|[a]||vowel||ä open central unrounded vowel||あ a||[a_0] (devoiced)|
|[i]||vowel||i close front unrounded vowel||い i||
|[M]||vowel||ɯᵝ close back compressed vowel||う u||The Japanese "u" is neither rounded [u] nor unrounded [ɯ], but compressed.||
|[e]||vowel||e̞ mid front unrounded vowel||え e||[e_0] (devoiced)|
|[o]||vowel||o̞ mid back rounded vowel||お o, を||[o_0] (devoiced)|
|[a_0]||devoiced vowel||ḁ̈ devoiced open central unrounded vowel||あ a||Only available for Vocaloid3.||[a] (voiced)|
|[i_0]||devoiced vowel||i̥ devoiced close front unrounded vowel||い i||Only available for Vocaloid3||[i] (voiced)|
|[M_0]||devoiced vowel||u͍̥ devoiced close back compressed vowel||う u||The Japanese "u" is neither rounded [u] nor unrounded [ɯ], but compressed. Only available for Vocaloid3.||
|[e_0]||devoiced vowel||e̞̥ devoiced mid front unrounded vowel||え e||Only available for Vocaloid3.||[e] (voiced)|
|[o_0]||devoiced vowel||o̞̥ devoiced mid back rounded vowel||お o, を||Only available for Vocaloid3.||[o] (voiced)|
|[k]||consonant||k voiceless velar plosive||か ka, く ku, け ke, こ ko||
|[k']||palatalized consonant||kʲ palatalized voiceless velar plosive||き ki, きゃ kya, きゅ kyu, きぇ kye, きょ kyo||Palatalized /k/.||
|[g]||consonant||g voiced velar plosive||が ga, ぐ gu, げ ge, ご go||
|[g']||palatalized consonant||gʲ||ぎ gi , ぎゃ gya, ぎゅ gyu, ぎぇ gye, ぎょ gyo||Palatalized /g/.||
|[N]||consonant||ŋ velar nasal||が ga, ぐ gu, げ ge, ご go, ん n-n'||Nasalized /g/. Also an allophone of /n/ before a velar consonant.||
|[N']||palatalized consonant||ŋʲ||き゜gi , き゜ゃ gya, き゜ゅ gyu, き゜ぇ gye, き゜ょ gyo, ん n-n'||Palatalized nasal /g/.||
|[s]||consonant||s voiceless alveolar sibilant||さ sa, す su, せ se, そ so, すぃ si||
|[S]||palatal consonant||ɕ or ʃʲ voiceless alveolo-palatal sibilant||し shi, しゃ sha, しゅ shu, しぇ she, しょ sho||Palatalized /s/. The X-SAMPA symbol incorrectly suggests it is /ʃ/; although the two phonemes sound similar, they are not the same.||
|[z]||consonant||z voiced alveolar sibilant||ず zu, ぜ ze, ぞ zo||Often used between vowels, however not all the Japanese speakers use this sound.||
|[Z]||palatal consonant||ʑ or ʒʲ voiced alveolo-palatal sibilant||じゅ ju, じぇ je, じょ jo, じゃ ja, じ ji||Palatalized /z/, often used between vowels; however, not all Japanese speakers use this sound. The X-SAMPA symbol incorrectly suggests it is /ʒ/; although the two phonemes sound similar, they are not the same.||
|[dz]||consonant||ʣ voiced alveolar affricate||ざ za, ず zu, づ zu, ぜ ze, ぞ zo, じゃja, じ ji, じゅ ju, じぇ je, じょ jo||Often used at the beginning of word or after んn, however some Japanese speakers also use this sound instead of z or Z.||
|[dZ]||palatal consonant||ʥ voiced alveolo-palatal affricate||じ ji, ぢ ji, じゃja, じゅ ju, ぢぇ je, じょ jo||Palatalized /dz/ or /d/; some Japanese speakers use this sound instead of z or Z. The X-SAMPA symbol incorrectly suggests it is /ʤ/; although the two phonemes sound similar, they are not the same.||
|[t]||consonant||t voiceless alveolar plosive||た ta, て te, と to, とぅ tu||
|[t']||palatalized consonant||tʲ||てぃ ti, てゅ tyu||Palatalized /t/, usually used in non-Japanese words incorporated into the language.||
|[ts]||consonant||ʦ voiceless alveolar affricate||つ tsu, つぁ tsa, つぃ tsi, つぇ tse, つぉ tso||
|[tS]||palatal consonant||ʨ voiceless alveolo-palatal affricate||ち chi, ちゃ cha, ちゅ chu, ちぇ che, ちょ cho||Palatalized /t/. The X-SAMPA symbol incorrectly suggests it is /ʧ/; although the two phonemes sound similar, they are not the same.||
|[d]||consonant||d voiced alveolar plosive||だ da, どぅ du, で de, ど do||
|[d']||palatalized consonant||dʲ||でぃ di, でゅ dyu||Palatalized /d/, usually used in non-Japanese words incorporated into the language.||
|[n]||consonant||n alveolar nasal||な na, ぬ nu, ね ne, の no, ん n||This consonant can be articulated without a vowel.||
|[J]||consonant||ɲ or nʲ palatal nasal||に ni, にゃ nya, にゅ nyu, にぇ nye, にょ nyo||Palatalized /n/. This phoneme also appears as an allophone of /n/ before palatals.||
|[h]||consonant||h voiceless glottal fricative||は ha, へ he, ほ ho||
|[h\]||consonant||ɦ voiced glottal fricative||ぁ xa, ぃ xi, ぅ xu, ぇ xe, ぉ xo||Intervowel /h/. Only works for [e] and [o].||
|[C]||palatal consonant||ç voiceless palatal fricative||ひ hi, ひゃ hya, ひゅ hyu, ひぇ hye, ひょ hyo||In Japanese it is perceived as a palatalized h.||[h] (depalatalized)|
|[p\]||consonant||ɸ voiceless bilabial fricative||ふ fu, ふ fwa, ふ fe, ふ fo||
|[p\']||palatalized consonant||ɸʲ||ふぃ fi, ふゃ fya, ふゅ fyu, ふぇ fye, ふょ fyo,||Palatalized /ɸ/.||
|[b]||consonant||b voiced bilabial plosive||ば ba, ぶ bu, べ be, ぼ bo||
|[b']||palatalized consonant||bʲ||び bi, びゃ bya, びゅ byu, びぇ bye, びょ byo||Palatalized /b/.||
|[p]||consonant||p voiceless bilabial plosive||ぱ pa, ぷ pu, ぺ pe, ぽ po||
|[p']||palatalized consonant||pʲ||ぴ pi, ぴゃ pya, ぴゅ pyu, ぴぇ pye, ぴょ pyo||Palatalized /p/.||
|[m]||consonant||m bilabial nasal||ま ma, む mu, め me, も mo||Also an allophone of /n/ before labial consonants. This consonant can be articulated without a vowel.||
|[m']||palatalized consonant||mʲ||み mi, みゃ mya, みゅ myu, みぇ mye, みょ myo||Palatalized /m/.||
|[j]||consonant||j palatal approximant||や ya, ゆ yu, よ yo, いぇ ye||
|[4]||consonant||ɽ retroflex flap||ら ra, る ru, れ re, ろ ro||Although the X-SAMPA symbol suggests that this phoneme is an alveolar tap, the Japanese /r/ actually varies between ɽ and ɺ.||[4'] (palatalized)|
|[4']||palatalized consonant||ɾʲ||り ri, りゃ rya, りゅ ryu, りょ ryo||[4] (depalatalized)|
|[w]||consonant||w͍ or wᵝ compressed labio-velar approximant||わ wa, うぃ wi, うぇ we, うぉ wo||Similar to its /u/, the Japanese /w/ is compressed.||
|[N\]||consonant||ɴ uvular nasal||ん n||/n/ at the end of a word.||[n]|
- The Japanese Phonetic System actually uses the symbol <¥> instead of <\>. However, to make comparison with X-SAMPA easier, and because on most keyboards typing <\> is input as <¥> in the synthesizer, the wikia prefers the <\> notation across its articles.
- Crypton’s Vocaloids, including Kaito and Meiko, have almost the same Japanese phonetic system. To use [z], [Z], [h\], [N] and [N'], users need to edit the phonemes directly rather than entering kana characters.
- Rin/Len Kagamine Act 1 can pronounce [h\] while their Act 2 cannot (comparison of consonant sounds Act 1, Act 2).
- Vocaloids of Internet Co. Ltd., such as Gackpoid or Megpoid, mostly share the same system as Crypton’s, but they do not have [z] and [Z] sounds. As is often the case with the Japanese language, they are replaced by [dz] and [dZ]. 
- Japanese VOCALOID2 voicebanks can combine the a and i phonemes (e.g. [w a i]), but the original VOCALOID voicebanks cannot. The workaround is simply to use the y consonant (e.g. [w a j]).
- [N\], [N] or [n] alone tends to be pronounced as "ng". This is the basis for Japanese vocaloids being used for South-East Asian languages.
- [N'] followed by a vowel different to [i] may produce odd results, however, due to its use within the Japanese language there is no actual call for this phonetic to be followed by a vowel different to [i].