Phoneme List

The phonetic system forms the basis of speech playback in the Vocaloid software. The symbols used in the phoneme system are based on X-SAMPA.

Using the Phonetic System

Note: The following applies to Vocaloid 2 and later. Both programs work in a similar fashion, but some details may not apply to the original Vocaloid or may work differently there.

The Recording Process

The samples are gathered by recording the voice provider reading out a script in various keys. The recording is then transferred into a library from which the Vocaloid pulls its results. Each library consists of various sounds recorded and separated for use with the software.

For Japanese, the script is much simpler, and each phonetic sample can be divided across the notes with little trouble. This renders each note fairly precisely.

However, for English Vocaloids, the phonetic data has to be separated by cutting sections out of the recorded samples, because some sounds simply cannot be gathered unless they are spoken as part of a word. This makes separating sounds for the English Vocaloids much harder. As such, Japanese Vocaloids are often more precise than English ones in their diphone sounds.

Constructing Words

Vocaloid uses a method called Frequency-domain Singing Articulation Splicing and Shaping, a kind of concatenative synthesis. It takes a series of diphone and triphone samples, specified through the phonetic system, from a sample library and reassembles them according to how the word is pronounced phonetically. For example, the word "sing" (IPA: sɪŋ, written as [s I N] in the Vocaloid phonetic system) can be synthesized by concatenating the sequence of diphones "#-s, s~ɪ, ɪ~ŋ, ŋ-#".[1] Using the phonetic system, the user inputs the phonemes that make up the word, which allows the synthesizer to pick the correct sequence of diphones to reconstruct it. Because the vowel [ɪ] (Vocaloid: [I]) sounds different in the diphones s~ɪ (s-I) and ɪ~ŋ (I-N), the software needs to apply "smoothing processing" in the frequency domain, which blends both diphone samples into a coherent syllable fragment; without it, the results would be unnatural and glitchy.[2]
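The phoneme-to-diphone step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual Vocaloid implementation: the function name `to_diphones` is hypothetical, `#` is assumed to mark the silence at the word boundary as in the "sing" example, and a plain hyphen is used for every transition for simplicity.

```python
def to_diphones(phonemes, boundary="#"):
    """List the diphone transitions needed to voice a phoneme sequence.

    Hypothetical helper: `boundary` stands for the silence before and
    after the word, as "#" does in the "sing" example.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    # Pair every symbol with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


print(to_diphones("s I N".split()))
# ['#-s', 's-I', 'I-N', 'N-#']
```

Under this model a sequence of n phonemes needs n + 1 transitions, each of which has to exist as a sample in the library.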

This method of reconstructing words is the same for every language in which Vocaloid is available, and the phonetic library is arranged in the same way. The fundamental difference between languages is the number of samples required to reconstruct each one, which is determined by its complexity. For example, English, a language with numerous consonant clusters, numerous vowels (including diphthongs) and a complex syllable structure, requires more diphone and triphone samples than Japanese, which has a simple syllable structure with practically no consonant clusters and a five-vowel system.

Dictionary

The user word registration interface

The Vocaloid's dictionary attempts to match the correct phonemes to the word the user enters, so that they do not have to be input manually. If the user lets the program find phonemes automatically and it meets a word that it cannot identify or that is not registered in the dictionary, it automatically writes a default phoneme ([u:] for English or [a] for Japanese). In that case the user needs to input the phonemes manually or add the word to the dictionary; both require knowing how the word is written phonetically.

If a user knows how words are articulated, they can infer how to write a word that isn't in the dictionary. For example, knowing that "bung" is represented as [bh V N] and "bangle" as [bh { N g V l], one can infer that "bungle" has to be written as [bh V N g V l].

In addition, the user cannot use phonemes that don't exist in the current voicebank. If the user manually enters a phoneme that the Vocaloid does not have in its voicebank, there will be no sound at all when the Vocaloid is played back.
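The fallback and voicebank behaviour described above can be pictured with the following sketch. Everything in it is an assumption made for illustration: the `DICTIONARY`, `DEFAULT_PHONEME` and `VOICEBANK` tables and both function names are hypothetical and do not reflect how Vocaloid stores its data. Only the example transcriptions and the default phonemes come from the text.

```python
# Hypothetical tables; the transcriptions are the ones quoted in the text.
DICTIONARY = {"bung": "bh V N", "bangle": "bh { N g V l"}
DEFAULT_PHONEME = {"english": "u:", "japanese": "a"}
VOICEBANK = {"bh", "V", "N", "{", "g", "l", "u:"}  # assumed phoneme set


def lookup(word, language="english"):
    """Return the dictionary entry, or the default phoneme if unregistered."""
    return DICTIONARY.get(word.lower(), DEFAULT_PHONEME[language])


def unsupported(phonemes, voicebank=VOICEBANK):
    """List the symbols that the voicebank lacks (these would be silent)."""
    return [p for p in phonemes.split() if p not in voicebank]


print(lookup("bungle"))             # u:  (not registered, default used)
print(unsupported("bh V N g V l"))  # []  (the inferred spelling is usable)
print(unsupported("bh V N br1"))    # ['br1']
```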

Due to the way the sound is articulated by the synthesizer, phonetic phenomena such as coarticulation and assimilation, where a phoneme's sound is affected by the adjacent phonemes, are present in the synthesized words. For that reason, phonemes do not always produce the same results; they may sound different, weaker or stronger depending on the preceding or following phoneme.[3] To make a consonant sound stronger relative to the following vowel, raising Brightness, the consonant's Breathiness or Dynamics will often work to some degree.[4][5] Another alternative is to replace the phoneme (the affected one or the one adjacent to it) with an allophone, an approximant or simply a similar-sounding phoneme.

Editing the phonemes

Phoneme editor

The "Note Properties" window allows you to manually pick the suitable phonemes for a word

To create and edit phonemes, the user must right-click a note and choose "Note Properties". Here they can edit a phoneme and add additional effects through the "Note Expression Property" and "Vibrato Property" windows. As a shortcut, the user can double-click a note to edit its lyric; pressing the Alt key and the down arrow key at the same time (Alt + Down Arrow) then allows the user to edit the phonetic data directly. While doing so, the Tab key skips to the next note and Shift + Tab skips back to the previous one. In Vocaloid 3, from v3.030 onward, it is possible to swap the phoneme input with the lyric input, allowing phonemes to be edited directly with a simple double-click.[6]

Because some phonemes are written with more than one character, such as [u:] (for English) or [ts] (for Japanese), phonemes need to be separated from one another by a space. If the user does not take care of this, the synthesizer interprets all the characters as a single symbol, which goes unrecognized and produces no sound. Capitalization also matters, because some symbols are distinguished only by case (for example, [Z] and [z] are different phonemes, so they don't produce the same result).
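A small sketch shows why spacing and capitalization matter when phonemes are typed by hand. The `KNOWN` set and the `parse_phonemes` function are hypothetical, and the check is only a model of the behaviour described above, not the editor's actual parser.

```python
KNOWN = {"s", "u:", "n", "z", "Z"}  # assumed subset of a voicebank's symbols


def parse_phonemes(text, known=KNOWN):
    """Split on whitespace and report the symbols that are not recognized."""
    symbols = text.split()
    unrecognized = [s for s in symbols if s not in known]
    return symbols, unrecognized


print(parse_phonemes("s u: n"))  # (['s', 'u:', 'n'], [])
print(parse_phonemes("su:n"))    # (['su:n'], ['su:n'])
print("Z" == "z")                # False
```

Without the spaces, the whole string is treated as one unknown symbol, which corresponds to the silent result described above.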

Additional notes

Due to the software's musical nature, monophonetics and polyphonetics may also need to be considered for closer vocal pitching and pronunciation.[7] The user, however, has access to pronunciation only at the phonetic level; finer levels of vocal speech adjustment cannot currently be accessed.

Please note that not all Vocaloids have the same phonemes; the breathing phonemes [br1]-[br5] are one example.[8][9] Some phonemes are also found in only one language, so Japanese and English Vocaloids do not share all of their phonemes.[10] Also, while a Vocaloid's help guide will list the alphabet of the language, it may not include additional notes.

Using One Language To Create Another

A user can use the phoneme system to create languages from scratch, so long as it is within the Vocaloid's capabilities. Due to the differences between the phonetic systems and between individual voicebanks, there are some details the user must take into consideration when attempting to make a Vocaloid sing a language it isn't intended for.

Regardless, if the user is aware of the phonology of both languages, the original one for that voice and the target language, the task becomes easier. A user may also be creative, even going so far as to invent languages of their own.[11] Essentially, the more time a user spends getting familiar with the phoneme system, the more they can get out of the Vocaloid program.

However, some voicebanks are easier to work with than others, offering advantages others may not have. Clear examples are Sonika, which is regarded as one of the Vocaloids with the most potential to "sing in any language" due to her unique setup, and Luka, who allows the user to switch between her English and Japanese voicebanks as needed. Still, results are greatly influenced by both the user's technique and how much a Vocaloid's phonetic system has in common phonologically with that of the target language, when no other music or audio software is used as an aid. For example, due to phonetic similarities, Japanese Vocaloids can achieve a good level of Spanish. At the introduction of SeeU it was confirmed that her Korean voicebank is capable of mimicking a decent amount of English due to phonetic similarities between the two languages.

Differences and Considerations

It's very important for the user to take note of the properties they may or may not be looking for in a voicebank. Certain advantages or disadvantages can make or break the song they're working on, as can details such as the available phonemes or the voice clarity of a particular Vocaloid.


  • The Vocaloid or voicebank utilized: Each Vocaloid has its own characteristics, advantages and flaws, and requires its own tricks and considerations. Among the things the user must be aware of is how the Vocaloid pronounces phonemes; some voicebanks pronounce consonants more markedly, or pronounce consonant clusters differently from others, which may make it difficult to achieve a pronunciation close to the intended language.
  • The tempo utilized in the song: Important when short notes are used for some tricks or techniques. The tempo can affect them, requiring readjustment of the length of those notes.
  • The pitch range of the current song: Voicebanks are recorded in at least two registers: one for the higher pitches and one for the lower ones. The software then creates the transition between them to generate the full scale of notes. Depending on how they were recorded, the pronunciation or quality of some phonemes will vary from voicebank to voicebank when the pitch is changed.
  • The influence of adjacent phonemes (assimilation and coarticulation): Assimilation and coarticulation are present in the synthesizer, so a phoneme can affect its neighbors.

Due to the individual differences between voicebanks, taking a different approach may obtain more desirable results. Phonemes that are not equivalent may work better than equivalent ones in the target language; for example, when Miriam sings in Japanese, [v V] /vʌ/ sounds closer to the actual pronunciation of [w a] /wa/, the Japanese particle は, than [w V] /wʌ/ does.[12][13][14]

For more explanations on the differences and comparisons between English and Japanese Vocaloids, see the conversion list: English - Japanese

Techniques

Due to the way sounds are articulated by the synthesizer to simulate human speech, some phonological phenomena (such as coarticulation) also appear in the software. The user can exploit them to extend the capabilities of the voicebanks.

Auxiliary Phonemes

An array of auxiliary phonemes exists within the voicebanks. These phonemes are used to obtain certain effects (like breaths) or to alter the default pronunciation (like [Sil], which is used to break the diphone transition between two phonemes). It's important to consider that different auxiliary phonemes are present in different versions of the software, and not all are available in every voicebank. As such, their effect or function may differ between voicebanks and versions of Vocaloid.

Coarticulation, Assimilation and Phoneme Combinations

One application of coarticulation is combining phonemes to achieve new articulations that are closer to the desired ones.

Examples:
  • Inducing palatalization when an English Vocaloid sings in another language, such as Japanese or Korean (for the palatalized consonants) or the Romance languages (for the palatal nasal).
  • Generating a sound similar to TH (voiceless dental fricative).

Glides or Semivowels

The glides or semivowels are sounds that share traits of a vowel, being produced with little or no obstruction of the airstream, but that are non-syllabic; in other words, they are not the main element of a syllable. If the user is aware of a glide and its respective vowel counterpart, they can use it in place of or alongside that vowel, producing interesting results.

Some possible uses of the glides are:

  • Fix choppy vowel combinations
  • Facilitate some diphthongs or diphones
  • Replace the vowels when required.

Use of short notes

An additional technique is the use of short notes (around 1/64 or 1/32 in length). When a note is too short, its articulation is incomplete and the sound blends with the next note. This technique is heavily affected by the tempo, however, and may not produce results as effective at lower tempos as at higher tempos.

This technique can be used to:

  • Improve the pronunciation of some consonant clusters.
  • Generate colored consonants.
  • Blend some phonemes.
  • Achieve new articulations.
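
To see why tempo matters so much for this technique, the real duration of a short note can be worked out directly. The sketch below assumes that the tempo is given in quarter-note beats per minute, so that a whole note lasts four beats; the function name `note_ms` is hypothetical.

```python
def note_ms(fraction, bpm):
    """Duration in milliseconds of a note that is `fraction` of a whole note.

    Assumes `bpm` counts quarter notes, so a whole note lasts four beats.
    """
    whole_note_ms = 4 * 60_000 / bpm
    return whole_note_ms * fraction


for bpm in (60, 120, 180):
    print(bpm, note_ms(1 / 64, bpm), note_ms(1 / 32, bpm))
```

Under that assumption a 1/64 note lasts 31.25 ms at 120 BPM but 62.5 ms at 60 BPM, so a note that is short enough to blend into its neighbour at one tempo may be long enough to be fully articulated at a slower one.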

Second Voice Support

It is also possible to use a Vocaloid with a similar voice type to hide the phonetic mispronunciations of another by having the two Vocaloids sing a duet. One such example, widely acknowledged by fans, is that of Sonika and Luka.

Another use of this technique is when a Vocaloid sings in a language other than the one it is intended for. If it sings in a duet or chorus with a Vocaloid of the intended language, the latter will complement the pronunciation of the former.[15]

A third technique, albeit somewhat pointless considering the nature of Vocaloid and what it's generally intended for, is to use a human singer to take the place of the second Vocaloid.

Post-Editing and Phoneme Slicing

Besides all the tricks available in the editor, it's possible to improve the pronunciation further during post-editing. After rendering and exporting the WAV file, the user can edit it in any DAW or sound editor. If the pronunciation of a consonant is too soft or too strong, the user can correct its volume.

Another technique that can be used on Vocaloids is phoneme slicing. It can be applied to Japanese phonemes for Japanese Vocaloids, either in the Vocaloid software itself or in the user's DAW. The note is shortened or cut down until only the half of the pronunciation that is needed is heard (for example, "su" becomes "s"). However, this affects the singing capabilities of the Vocaloid, and the notes being cut have to be much longer than normal. Although this technique may be hard for new users and results in a lack of singing smoothness, it increases the chances of getting a closer match to the intended sound. It can also be applied to English-capable Vocaloids. Additionally, software such as a vocoder can be used to artificially create or transform Japanese or English phonetics into those of another language.

Flaws in the Phonetic System

There are some flaws that can limit a Vocaloid's ability to recreate a language. Many of these issues are found in all languages and are not limited to one specifically.

The Vocaloid Engine's Habits

The Vocaloid system will attempt to sound out all data assigned to the phonemes used, even if a particular sound is not needed.

Yet a natural speaker may not sound out every sound when singing, for reasons such as naturally slurred vocals, a local accent, vocal disorders like stuttering, or speech impediments such as a lisp. This restriction may limit a Vocaloid's ability to mimic the language it is intended for. For example, American English accents often drop the schwa vowel entirely from words where it is featured. This sound is normally a prominent feature of the English language itself and is present in British English accents.

The hidden phoneme [Sil] can prevent this and can be used with any Vocaloid language, but it does not resolve all issues or scenarios.

Language Structuring

Languages have their own sets of rules that are difficult to break.

For example, in English and Japanese Vocaloids:

  • Japanese: Because Japanese Vocaloids do not have to blend their words as English ones do, and have just 500 diphone sounds to use, they can produce choppier results than English Vocaloids when used for non-Japanese words, especially in languages that sound very different, such as English. Often, after slicing ("su" becoming "s"), a small fragment of the removed sound remains (in the case of "su", the "u"), leaving behind awkward vocal sounds that lower the quality of the Vocaloid's results. The voiceless sounds introduced with Vocaloid 3 make this much easier to attempt, but they are still not a perfect solution to the problem. N' followed by a vowel may produce odd results; because Japanese itself never calls for this phoneme to be followed by a vowel sound, Vocaloid possesses very limited data related to it. Japanese Vocaloids also have a very limited number of vowels, which in many cases are entirely the wrong vowel sounds for non-Japanese words.
  • English: Because English Vocaloids always attempt to blend their sounds and have 2,500+ diphone sounds, the position of the stress accent results in sounds closer to or more distant from the intended target language. This can make non-English words complex to construct. The resulting reliance on [Sil] to prevent unwanted combinations can leave behind choppy, robotic results that mix smooth passages with sudden stops. Vocaloid 3 has capabilities that make this easier to resolve and softens such hard pronunciations anyway, but the sounds remain even though they are less apparent. Even with their large selection of diphone data, they cannot be relied on to use the right data when needed, and basic control of the diphones may produce seemingly random errors.

In both cases, the construction of the language is the reason for the issue; when used for their own languages, the results sound much more natural and flow much more easily. As noted in this section, due to the sheer number of things to take into account, English-capable Vocaloids can be far more complex to use (because of the problems presented by the English language) than Japanese Vocaloids. Liberally interpreted, English Vocaloids have a greater language capacity than their Japanese cousins, having more vowels and clearly separated consonant sounds, and are therefore easier to make sing in other languages, although both will only be using the equivalent or quasi-equivalent phonemes available in the phonetic system of either language. Japanese Vocaloids can often be far simpler to use, despite their more limited array of phonemes.

Sample Database

Although all Vocaloids are made to produce a certain language, there are differences between them that affect performance. For example, Sonika has every vowel combination needed for English, while Megurine Luka has missing diphone data. The Vocaloid will still sound out the words "I love you" in both cases, but the missing data can affect the smoothness of the transitions. Bad transitions give a fairly broken or robotic result that does not sound as natural as it could.

Another issue is the clarity of some Vocaloids. This is a common issue for Vocaloids overall when singing high notes (or, in some cases, low ones). The natural softness of their vocals dampens the strength of the voice in the high notes. This normally occurs when a Vocaloid sings outside its optimum range (see Optimum Recommendations), but some Vocaloids have an overall softness in their vocals. There are also certain Vocaloids, such as the original Kagamine Rin/Len package or Sonika, that are said to be very difficult to get clear results from. Note that equalising the singing results in a DAW or sound editing package can improve Vocaloids that lack clarity.

The Vocaloid Dictionary

There are also a number of known words used by English-capable Vocaloids that have more than one pronunciation due to stress accents. However, Vocaloid can currently store only one pronunciation of a word in its dictionary, so the user often cannot tell the correct result from what the software gave them. Without knowing how to sound out the alternative pronunciation, these words can be a problem for non-native English speakers:

  • Wind
    • The wind blew (IPA: [ˈwɪnd]; Vocaloid: [w I n d])
    • You wind me up (IPA: [waɪnd]; Vocaloid: [w aI n d])
  • Read
    • I will read the book (IPA: [riːd]; Vocaloid: [r i: d])
    • I read the book (IPA: [rɛd]; Vocaloid: [r e d])
  • Tear
    • You have a tear in your eye (IPA: [tɪə]; Vocaloid: [t I@])
    • The paper has a tear in it (IPA: [tɛə]; Vocaloid: [t E@])
  • Bow
    • You must bow before royalty (IPA: [baʊ]; Vocaloid: [b aU])
    • I tie a bow in my hair (IPA: [bəʊ] or [boʊ]; Vocaloid: [b @U])
  • Live
    • The show was broadcast on TV live (IPA: [laɪv]; Vocaloid: [l0 aI v])
    • I know where you live (IPA: [lɪv]; Vocaloid: [l0 I v])
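
One way to keep track of such words while working is a small lookup table of alternatives kept outside the software. The sketch below is purely illustrative: the `HETERONYMS` table and the `alternatives` function are hypothetical and are not part of Vocaloid, and the transcriptions are simply the ones listed above.

```python
# Hypothetical helper table; transcriptions are copied from the list above.
HETERONYMS = {
    "wind": ["w I n d", "w aI n d"],
    "read": ["r i: d", "r e d"],
    "tear": ["t I@", "t E@"],
    "bow": ["b aU", "b @U"],
    "live": ["l0 aI v", "l0 I v"],
}


def alternatives(word, dictionary_result):
    """Return the other known transcriptions for a word, if any."""
    options = HETERONYMS.get(word.lower(), [])
    return [p for p in options if p != dictionary_result]


print(alternatives("wind", "w I n d"))  # ['w aI n d']
print(alternatives("sing", "s I N"))    # []
```

If the dictionary returns the wrong reading for the lyric, the alternative can then be typed into the note's phoneme field by hand.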

Spanish Vocaloids also use stressing for some of their data, so this feature is not unique to English Vocaloids, but it is absent from many Asian languages, including Japanese.

Note Length

Vocaloids sometimes have difficulty pronouncing words. For example, Prima and Tonio struggle with the middle section of the word "together" if it is too short when the word is spread over several notes ("to-geth-er" becomes "to-g'-er" if "geth" has no room). A Vocaloid's singing results may be impacted if the user does not consider this. When a Vocaloid fails to pronounce a phoneme that it should be able to, there are ways around it: move the phonetic data onto another track, increase the "accent" (attack) in the note properties, or change the length of the note to give the voice room to pronounce the word. VY2 has a similar weakness: the phonetics "a" followed by "re" become a "ge" sound, but this can be fixed by dividing the tracks, breaking the transition with [Sil] or modifying the tone of the voice.

Optimum Recommendations

Many Vocaloids also come with a recommended optimum range. The recommendations help direct producers to the best range for the Vocaloid, and describe what vocal range the Vocaloid has (soprano, mezzo-soprano, tenor, alto, etc.). Notes above the Vocaloid's capabilities may become muffled and lack clarity, while many low notes can be soft and quiet. Working within the optimum range increases the chance of clearer and more stable pronunciation from the Vocaloid.

Likewise, the optimum tempo tells the producer what range will leave the Vocaloid sounding most natural. Too fast a tempo may not give the Vocaloid time to sound out the sounds correctly, resulting in missing sounds or digital noise in place of natural, smooth pronunciation. In the opposite direction, too slow a tempo can make any digital defects more apparent by allowing them to be heard much more clearly. The engine version also affects the results in different ways, with Vocaloid being more criticized for its heavy digital sound than Vocaloid 3.

User related concerns

One issue related specifically to the user is that they may not be able to use a Vocaloid well in a language they don't know particularly well. What sounds flawless and realistic to a person with little knowledge of a language may actually be full of bugs and glitches. A speaker of that language can hear the Vocaloid's flaws much better than someone who knows little of the language. This issue can easily occur in even the most well-tuned Vocaloid songs and can add a kink to an otherwise perfect example of a Vocaloid's best singing results.

Even if one takes a VSQ or VSQX file that has been tweaked by another user, even a native speaker, not all Vocaloids have the same strengths and flaws. Therefore, it is vital that users take time to study at least the basics of the structure of the language they are working with and, furthermore, spend time comparing results for every song they produce, even if the VSQ or VSQX file has already been tweaked.


Additional Help

Also note that both Zero-G and PowerFX have tutorials of their own.

  • How To Make a Vocaloid Breathe Using VOCALOID: Explanation of how some of the Japanese Vocaloids sound when you use the breathing effects
  • Comparative Table of the English and Japanese Phonetic Systems of Japanese and English Vocaloids, including notes on whether each Vocaloid has a given phoneme. The list also includes information on how to transform the quasi-equivalent phonemes of Japanese and English into the opposite language effectively.
  • Vocaphonetic: A Japanese community site for creating and distributing Japanese dictionary data that helps English Vocaloids sing better in Japanese. Dictionary data is available for both Vocaloid and Vocaloid2.
  • Vocaloid Phonetic Library - a quick look-up guide for the phonetics of all Vocaloids.
  • From English to Japanese - Using Tonio, these are instructions for how Japanese users can make Tonio sing in Japanese. Also shown is how closely and how much of the Japanese language Tonio can reproduce.
  • Tutorial - a tutorial showing a user making Miku sing in "English" using Japanese phonemes.
  • Making Big-Al sing Japanese

Trivia

  • One of the reasons for the long time between releases of English Vocaloids is the time consumed in recording the phonetic samples (an estimated 2,500 samples needed for English versus 500 for Japanese for each pitch). It took 25 hours (4 hours a day) to record all the Kagamine "Appends".[16] Camui Gackpo's Vocaloid 2 voicebank was confirmed to have been completed within 4 hours, and an additional voicebank was later recorded for alternative samples.[17] In contrast, according to Anders, it takes anywhere from 1–3 weeks upward to record a single English voicebank.[18]
  • The more samples involved in making a synthesized voice, the harder it is to maintain quality; the lack of smoothness of older synthesizer voicebanks often reflects this difficulty.
    • More complex languages such as English struggle much more to maintain quality while singing due to the sheer number of samples involved.
    • This is also why older voicebanks, such as those for the original Vocaloid, may be harder to use. For instance, "now" is often pronounced as "no-ow" by the English Vocaloid voicebanks. In contrast, Vocaloid 2 voicebanks have no problems with this word.
  • Some fans struggle to understand how synthesized vocals have developed over a single decade and do not understand why Vocaloid results are as they are. Here are Microsoft Mike, Mary, Sam and Ann speaking (mature content), showing the various stages and progression of the vocals for the Microsoft text-to-speech voices. Vocaloid was released soon after this software was developed and is a much more advanced software package, but there are common problems shared between all synthesizing software packages.
  • Studies of the brain suggest that if spoken words are close enough to the words they are supposed to be, the mind is capable of working out, or attempting to work out, what they actually are, even if the actual words spoken are gibberish. This plays a role in the matching of phonetics from one language to another, and can make the mind believe that a word sounds closer to the intended word than it is.

References

  1. link
  2. [1] - Vocaloid.com - Vocaloid1#Characteristics of VOCALOID (2004)
  3. http://www29.atwiki.jp/vocalo-gojokai/pages/105.html Vocaloid Gojokai
  4. http://doku.bimyo.jp/miku/page03/index.html VOCALOID Introductory: Control Track
  5. http://www39.atwiki.jp/vocaloid/pages/32.html VOCALOID@wiki How to Edit Rin/Len Kagamine
  6. [2] Vocaloidism - Vocaloid 3 Update, v3.030
  7. Vocaloid document
  8. [3] Vocaloid Non Sense - How To Make a Vocaloid Breathe Using VOCALOID
  9. [4] Nicovideo - Big Al’s breathing phonemes
  10. [5] Wikipedia: Phoneme
  11. NND: nm7051391 - Jutenija sung by Kagamine Rin / Len
  12. http://ww3.enjoy.ne.jp/~koti/kaito/miriam.html Making Miriam Sing in Japanese
  13. NND: sm10379602 - Lost Sheep sung by Miriam
  14. NND: sm4916135 - Lost Sheep sung by KAITO
  15. NND: sm10037931 - Unbalance sung by Kagamine Rin
  16. link
  17. link
  18. [6] VocaloidOtaku - Somebodyrandom's Questions

Please note we are waiting for more information on some languages
