Vocaloid Wiki

English Phonetics in use on the VOCALOID piano roll system

English VOCALOIDs are VOCALOIDs that can mimic the English language much more easily than VOCALOIDs of other languages. The following is a list of the phonemes needed to make an English VOCALOID sing in English.

About

The English language has one of the greatest variations of dialect in the world. Thus, there is much more variety of pronunciation among English VOCALOIDs than among VOCALOIDs that sing in other languages.

The English language itself is made up of about 20 vowel sounds and 24 consonant sounds. English also lacks a systematic orthography, so there is no one-to-one or near one-to-one match between letters and sounds as there is in languages like Spanish or Japanese.

For example, "W" can sound like /w/ in "what" and /u:/ in "few", and "Y" can sound like /j/ in "yes" and /i/ in "play".[1] Spellings also differ between varieties, as in the British and American spellings "colour/color".

VOCALOID and VOCALOID2 use American spelling for the lyrics. VOCALOID3 is confirmed to be capable of localisation, but it is unknown whether this will make it possible to choose between American and British spelling.

However, the phonetic notation doesn't follow this, and instead uses Received Pronunciation written in X-SAMPA, with some minor modifications where required, as is the case with the allophones.

During the VOCALOID2 era it was also confirmed that, in contrast to Japanese voicebanks, English voicebanks needed their samples cut at a length of more than 0.5 seconds on many sounds. This was a longer sample length than their Japanese cousins were being cut at for the software. If not done so, their vocals had a habit of cutting out when used for short notes.[2]

English Scripts

The reclist scripts used for English VOCALOIDs (shortened to "enlist") have also been confirmed to have an impact on the way an English VOCALOID sounds. The scripts are the list of sounds a studio has to record in order to obtain all the sounds essential for successful English creation. A bad script results in more errors being present in the VOCALOID's recordings.

For those not familiar with the English reclist script at the time of recording, it can be a challenge regardless of the version used. This was noted when Saki Fujita (a veteran of Japanese Vocaloid script reading) faced it for the first time, as she thought she would have to relearn script reading from scratch. The reason is that the script is very different from the ones used for other languages, such as Japanese.[3]

According to the developer's notes, in regards to CYBER DIVA, the VOCALOID engine itself uses a combination of both British and American phonetic sounds. The result is that sometimes certain sounds may sound off because that particular combination would not typically be used together by either a British or American accented speaker.

Original Yamaha Script

The VOCALOID English script used prior to VOCALOID4 was confirmed to have contained errors, and thus VOCALOIDs such as YOHIOloid have incorrect pronunciations. It is important to note that, for pre-VOCALOID4 voicebanks, unexpected sounds commonly occur when certain combinations of symbols are entered into the editor.

Due to their differences, the majority of the pre-VOCALOID4 Vocaloids will not produce the same results as post-VOCALOID4 because of the issues with this script alone.

Many also lack the schwa sound despite the phonetic symbol still being registered by the engine and there was also an "Aspiration Problem".

Cyber Diva Script

During the development of the CYBER DIVA vocal, a number of long-standing issues were noted and finally addressed, resulting in an improved version of the base Yamaha script.

A subtle difference between the old Yamaha English Dev Kit script and the Cyber Diva script is that the newer script produces less expressive tones than the older script, as it focuses on obtaining more clarity per sound.

The Cyber Diva script also fixes the "Aspiration Problem" and includes the Schwa sound recording.

Ruby Script

Ruby also uses a new script that was created by Syo. Once again, the creation of the new script was owed to the errors contained within the previous YAMAHA script and Ruby shows improvements over VOCALOIDs like YOHIOloid. This script is written focusing on the American accent.

Part of the reason for Ruby having a different script than CYBER DIVA is that the improved script used for CYBER DIVA hadn't been shared at that time.[4]

The aspiration issue is also fixed in this script.

Dex/Daina Script

In June 2015, Syo also revealed he had created another English script for Zero-G Limited's two American Vocaloids Dex and Daina, which was similar to Ruby's but had different lyrics.[5]

This script also focuses on the American accent.

Cyber Songman Script

CYBER SONGMAN was recorded with a brand new phonetic script developed personally by the lead developer of the project, Michael Wilson. This new script is an update of CYBER DIVA's and, according to Wilson, tests proved that it was easier to read and pronounce, which increases the clarity of the pronunciation while maintaining a natural, expressive sound.

Notes on Accents

Despite the general belief that singers lose their accents when they sing, this is not the case, and an accent can be heard even in singing vocals. The reason many are led to believe this is that there are several methods of training singers to disguise or otherwise hide their natural accents - they may even adopt an accent that isn't their own for singing.[6]

Though English is not alone in having problems with accents, English VOCALOIDs have proven difficult to produce without accent issues. Even the first two VOCALOIDs in English, LEON and LOLA, were noted for their distinctly "British" accent. An accent is known to either aid or add difficulty to the use of synthesizing software, and VOCALOID is no stranger to this effect. English VOCALOIDs have ended up with the most variation in how they sound out of all the languages so far offered for the VOCALOID software.

The impact of dialect or accent on English VOCALOIDs can result in a noticeable variation of certain sounds, most notably the diphthongs and rhotic vowels. Users who are not aware of the potential difficulty of accents may overlook odd pronunciations that need to be adjusted for better results. This is even more true for voicebanks with non-native accents, because the voice provider may have pronunciation issues with a non-native language.

In some instances, Producers may be found to have adjusted VSQ and VSQx files so heavily to make them work for 1 particular English VOCALOID that they become "VOCALOID specific" and are unable to work particularly well without further adjustments on other English VOCALOIDs. Cases like this are often rare in languages such as Japanese, though not foreign to them and many VSQ and VSQx files will work without too much adjusting.

Native Accented

British-English Accented

British-English accented VOCALOIDs were VOCALOIDs whose provider was known to have been of "British" nationality. As Great Britain is the main origin of English, British-English VOCALOIDs sing in a native English accent. Originally, they were the standard English accent type used to develop the English engine. British accented VOCALOIDs mostly came originally from Zero-G who worked solely with British artists to collect their vocal samples from.

Note: The term 'British' applies to anyone from England, Scotland, Wales and Northern Ireland and therefore the variation of the accent can differ greatly overall. The British Isles have the greatest variation of accents for English in the world per sq. mile of land. (For more information see Wikipedia.)

  • LEON
  • LOLA - (Note: though she is regarded as having a "British" accent, Lola's accent reverts to her provider's natural Caribbean accent when not singing in ideal Soul music conditions.)
  • SONiKA
  • OLIVER
  • AVANNA

American-English Accented

American accented VOCALOIDs have providers that came from the United States of America, and are therefore native speakers of the English language. The most noticeable difference from the British accented voicebanks is in the rhotic vowels.[7] This is because British dialects are usually non-rhotic, whereas in North America rhotic dialects of English are predominant.[8] (For more information see Wikipedia.)

Due to the user base's preference for this accent, PowerFX have since confirmed that YOHIOloid's vocal was made to have an American-sounding ring to it. Hatsune Miku English was also made to match the American way of speaking by Crypton Future Media.

Australian Accented

Australian Accents are the normal English accent for individuals from Australia. This particular accent is normally very distinct compared to all other English accents, with features unique from all other English dialects. (For more information see Wikipedia.)

  • Sweet ANN - Her provider "Jody" supposedly came from Australia.

South-African Accented

South African accents are accents belonging to individuals from South Africa. English was not a native language of Africa and was introduced during the colonisation of African countries by English colonists, resulting in the English language becoming widely used in South Africa as the general lingua franca between regions. Variation in the impact of native languages on English results in a large variation in the strength and tone of the accent, though in general most South African accents closely resemble South England accents. (For more information see Wikipedia.)

Non-native Accented

Japanese-English Accented

Japanese-English accented VOCALOIDs are produced by those who came from Japan. Their voice providers have Japanese as their native language, but were used to produce English voicebanks. The Japanese-English accent is therefore a non-native English accent, showing significant and noticeable differences in comparison to the native English accents. As more releases of such voicebanks have been produced by studios, common traits have become clearly identifiable among these vocals.

The major issue seen with Japanese accents is that they often struggle to distinguish some sounds. This usually happens because the providers and producing studios/companies aren't familiar with these foreign sounds. Among the most common issues are:

  • Lack of distinction and stress in vowel sounds. These are usually either too tense or too lax, as the speaker tends to approximate each vowel sound to the Japanese 5-vowel system.
  • Lack of distinction between the liquid consonants (R & L). The most famous case is Luka's English pronunciation of the words "Road Roller", which risks coming out sounding like "roe rorora".
  • Distortion of some sounds toward similar Japanese sounds. For example, the [f] phoneme is pronounced as a voiceless bilabial fricative instead of a voiceless labiodental fricative, as it should be.

These traits depend on the provider's proficiency in English and the studio's or company's experience with the language. Despite this, Japanese-accented English VOCALOIDs are still a better option for mimicking the English language than a purely Japanese voicebank, thanks to the wide array of phonemes and work-arounds available in the English phonetic system.

Korean-English accented

Korean-English accented VOCALOIDs are produced by those who come from South Korea. As there is only one unreleased VOCALOID voicebank with this accent, details cannot be released.

SeeU's Korean voicebank is a special case, as it was given English phonemes to mimic the language to a certain degree. However, this feature was left largely incomplete due to deadline issues, and again it does not produce results of sufficient quality to comment on.

  • SeeU - An English Voicebank was set for production but is currently on hiatus as of Feb 2013.

Misc.

  • Prima - Accent unconfirmed
  • Tonio - Accent unconfirmed

Custom Dictionaries

More information on dictionaries can be found on Phoneme List.

English VOCALOIDs rely greatly on the VOCALOID editor dictionary due to the language's lack of a systematic orthography. Custom dictionaries can take advantage of the large array of English sounds found within VOCALOID to improve the way voicebanks sound, by using different combinations of sounds or by making an accent/dialect appear by default. This is not isolated to English vocals, but has been known to impact them greatly at times.

Be aware that the language is full of homonyms that take the form of homographs (a word that has the same spelling as another word but has a different sound and a different meaning, such as "bow", "minute" and "tear"), homophones (a word that has the same sound as another word but is spelled differently and has a different meaning, such as "pair"/"pear" or "bare"/"bear"), or both. The VOCALOID dictionary has limitations that make such words difficult to record within it; at times users may simply have little choice but to write the word phonetically rather than lyrically.

Note that if a user creates lyrics via phonetic entry rather than written text, they will not have to consider dictionaries at all.
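The homograph limitation can be illustrated with a minimal Python sketch. The dictionary layout, the `phonemes_for` helper and the transcriptions are assumptions made for illustration only; they are built from the symbols in the phonetic list on this page and do not reflect the editor's real dictionary format.

```python
# Hypothetical word -> phoneme table; NOT the real VOCALOID dictionary format.
# A plain mapping stores one pronunciation per spelling, which is why
# homographs such as "tear" are awkward to record.
USER_DICT = {
    "pair": "ph e@",
    "pear": "ph e@",  # homophone: different spelling, same phonemes
    "tear": "th I@",  # only the "crying" reading fits in a one-entry table
}


def phonemes_for(word, override=None):
    """Return the phonemes for a lyric, preferring a manual phonetic entry."""
    if override is not None:
        return override
    return USER_DICT[word.lower()]


print(phonemes_for("tear"))                    # dictionary reading
print(phonemes_for("tear", override="th e@"))  # "to rip", entered phonetically
```

Entering the phonemes directly, as in the second call, bypasses the dictionary entirely.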

Megurine Luka

With the initial release of Megurine Luka, Crypton released a custom dictionary for Luka which could be downloaded from their site. This dictionary included support of Japanese characters and the names of other Crypton VOCALOIDs.[9]

Post VOCALOID3

VOCALOID3 English vocals were given a new dictionary. This was said to "improve" the way English Vocaloids sounded.[10]

Megpoid English

Internet Co., Ltd. provided a custom dictionary for Gumi's Megpoid English vocal. This was done to avoid certain problematic combinations that were known to the vocal. Without this script, Gumi naturally has errors that will be encountered, such as skipping of sounds or incorrect sound combinations.[11]

Avanna

NeutrinoP made a note that Avanna has her own dictionary. This was created to make room for large arrays of accents.[12]

CYBER DIVA

CYBER DIVA was created with a new script for VOCALOIDs. With this script, YAMAHA created a new custom dictionary for the voicebank with new words that weren't available before and more natural pronunciations.

Ruby

Including the 300 most common words, Syo confirmed that Ruby knew over 5,900 words.[13] 100 of these words were randomly chosen.[14] Ruby was also set up to pronounce some words such as "fire" and "hour" in one syllable.[15][16][17]

Syo's twitter account lists many of Ruby's dictionary word adaptations and added words.

CYBER SONGMAN

CYBER SONGMAN's dictionary was an update of his counterpart's. It also makes use of his extra phonemes [4] and [@l]. While [4] was given to various third-party VOCALOIDs, the latter is currently exclusive to SONGMAN.

Phonetic System's Characteristics

There are 52 phonetic pronunciations which make up the English VOCALOID library; these phonetic inputs will use any of the estimated 2,500 samples per pitch.[18] According to development notes on Megpoid English, there were over 4,000 phonetic connections for that particular vocal alone;[19] a similar number is therefore likely for all English VOCALOIDs.

Vowels

The English phonetic system includes 3 types of vowels: monophthongs, diphthongs and R-colored vowels. Being the nucleus of the syllable, vowels can be encoded alone.

The English phonetic system includes 10 vowels of the 11 monophthongs or pure vowels of the English Language, missing the phoneme /ɑː/ or open unrounded vowel.

The pronunciation of some vowels may change slightly, depending on the dialect or the way the VOCALOID was recorded.

  • Example: OLIVER's [{] phoneme has been reported to sound more like an /a/ than an /æ/.

Also the target or optimal musical genre of the VOCALOID can affect the pronunciation of the vowels.

  • Example: Tonio & Prima have been reported to have an "opera"-like pronunciation of the vowels, more fitting for Romance languages than standard English. This is probably because they are opera-specialized voicebanks.

The English phonetic system also includes an array of 5 diphthongs or gliding vowels: 3 y-colored diphthongs and 2 w-colored diphthongs. The diphthongs behave as a single vowel, despite the glide at the end of them.

It is important to consider that the diphthongs, like the monophthongs and the rhotic vowels, can vary in pronunciation depending on the dialect, the recording and the stress of the word.

  • Example: The diphthong [eI] can be pronounced with different degrees of stress, being realized either as [eː] (unstressed monophthong), [eɪ] (diphthongized [e], lax glide), [ei] (diphthongized [e], tense glide) or [ej] (diphthongized [e], short tense glide). BIG AL is known to noticeably vary the pronunciation of this phoneme according to context.[20]

The English phonetic system also includes 6 r-colored or rhotacized vowels. These are mainly used for vowel + R combinations. These vowels are modified by the R that follows them, which merges with them to form a single unit, as in the case of the diphthongs.

Like the diphthongs, these tend to vary in their pronunciation, especially depending on whether the voice provider has a rhotic accent.

  • Example: Depending on the speaker's dialect and the context of the sound, the VOCALOID phoneme [I@] may be realized as [ɪː] (non-rhotic, long vowel), [ɪə] (non-rhotic, schwa diphthong), [ɪɚ] (rhotic; r-colored schwa diphthong), [ɪɹ] (rhotic; vowel-consonant), etc.

The diphthongs and rhotic vowels tend to cause problems when they need to be extended across 2 or more notes and the user attempts to do it manually.[21]

To work around this, the English voicebanks allow words to be split into syllables across the notes using the hyphen symbol "-" within the lyrics.

  • Example:
    Remember split


Extending a syllable across several notes requires a combination of hyphen '-' and slash '/' within the lyrics to state how many notes it will last.

  • Example:
    Sound extend V2


In VOCALOID2's case, the hyphen/slash must be used to effectively divide words across the notes, unless the user prefers to take the risk of working around this manually using phoneme replacement.

In the case of VOCALOID3, the task is easier, as the [-] phoneme extends any kind of vowel it follows. The hyphen/slash still works; however, it simply adds the [-] phoneme where required.

  • Example:
    Sound extend V3
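The VOCALOID3 behaviour described above can be sketched in Python as a list of per-note phoneme strings. The function name `extend_across_notes` and the list representation are hypothetical and only model the idea that the first note carries the syllable while each following note carries the [-] phoneme; they are not the editor's file format.

```python
def extend_across_notes(phonemes, note_count):
    """Spread one syllable over note_count notes, VOCALOID3 style.

    The first note carries the phonemes; every later note carries "-",
    which extends the vowel of the note before it.
    """
    if note_count < 1:
        raise ValueError("note_count must be at least 1")
    return [phonemes] + ["-"] * (note_count - 1)


# "sound" ([s aU n d]) held across three notes
print(extend_across_notes("s aU n d", 3))  # ['s aU n d', '-', '-']
```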


Consonants

The phonetic system also includes 31 consonant phonemes. Of the English consonants, only the plosives and the liquids have their allophones as separate phonemes; these are required to achieve correct stress and pronunciation of words.

Allophones

Plosives and aspirated allophones

Because it is an important element of consonant stress within the language, the English phonetic system makes a distinction between normal plosives and their aspirated allophones.

Aspiration is the strong burst of air that accompanies the release of some obstruents.

In the English language, the plosives [b], [d], [g], [p], [t], [k] become aspirated at the beginning of a word or at the beginning of a stressed syllable.

  • Example: The word 'potato' has two aspirated consonants: the initial P, because it is at the beginning of the word; and the middle T, because it begins the stressed syllable.

In the International Phonetic Alphabet, aspirated phonemes are indicated by a small superscript ‹ʰ›, as with /kʰ/ for an aspirated /k/, while in VOCALOID's English phonetic system the aspirated phonemes are distinguished from their standard versions by the addition of an h, which represents the IPA's small superscript ‹ʰ›.
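The aspiration rule above can be expressed as a small Python sketch. The function `plosive_symbol` and its two flags are hypothetical names introduced for illustration; the rule itself (append h to a plosive at the start of a word or of a stressed syllable) is the one stated in this section.

```python
PLOSIVES = {"b", "d", "g", "p", "t", "k"}


def plosive_symbol(plosive, word_initial=False, stressed_onset=False):
    """Pick the plain or aspirated VOCALOID symbol for a plosive."""
    if plosive not in PLOSIVES:
        raise ValueError(f"{plosive!r} is not a plosive symbol")
    # Aspirated allophones are written as the plain symbol plus "h".
    if word_initial or stressed_onset:
        return plosive + "h"
    return plosive


# 'potato': initial P (word-initial), middle T (stressed), final T (neither)
print(plosive_symbol("p", word_initial=True))    # ph
print(plosive_symbol("t", stressed_onset=True))  # th
print(plosive_symbol("t"))                       # t
```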

The English phonetic system includes an array of 3 to 4 liquid consonants. These include both of English's allophones of the L. The English R is usually used at the beginning of syllables, as an R after a vowel is included in the R-colored vowels.

Additionally, it can include a non-native English phoneme, the Rolling R. This is mainly used for loan words, for singing in other languages, or for particular genres such as opera.

Dark L and Clear L

The system includes both allophones of the English L: [l0], the alveolar lateral approximant, also known as Clear L (used at the beginning of syllables); and [l], the velarized alveolar lateral approximant, also known as Dark L (used at the end of syllables).

These phonemes aren't designed to be encoded alone; however, [l0] seems to cope better with being reproduced without a vowel than the [l] phoneme. The former results in an audio loop, while the latter generates electronic buzzing or produces no sound at all without a vowel. The only exception to this was Megurine Luka, whose [l] phoneme behaves as a syllabic consonant, so it can be used alone and extended without suffering distortion.[22] The lack of a proper syllabic Dark L was a minor issue that was finally addressed with the release of CYBER SONGMAN, which included its own phonetic symbol [@l] for this allophone, allowing a more colloquial pronunciation if the user requires it.
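As a rough illustration of the Clear L / Dark L distinction, the following Python sketch picks a symbol from the syllable position. The function `l_symbol` and the "onset"/"coda" labels are assumptions for this example, not part of the editor.

```python
def l_symbol(position):
    """Return the VOCALOID symbol for L at a given syllable position."""
    if position == "onset":  # beginning of a syllable: Clear L
        return "l0"
    if position == "coda":   # end of a syllable: Dark L
        return "l"
    raise ValueError("position must be 'onset' or 'coda'")


print(l_symbol("onset"))  # l0, as in "list"
print(l_symbol("coda"))   # l, as in "feel"
```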

Rolling R

Although it isn't a native phoneme of the English language, the alveolar trill or rolling R was added to the English phonetic system to increase the opera singing capabilities of Prima. After this, it became a common phoneme in the VOCALOID2 English voicebanks released after Prima (with the exception of Luka).[23] However, its inclusion in VOCALOID3 English voicebanks seems to have been deprecated.

Nonetheless, the performance of this phoneme may vary between different English VOCALOIDs. For example, BIG AL is known to be capable of using it only at the end of words, and requires some techniques and further editing to use it at the beginning or middle of a word.

The symbol which represents it in the English Phonetic System is the phoneme [R].

Phonetic List

Special note: This list is based on Big Al's help file, complemented with the chart from VOCALOID-User.Net[24] and expanded to include the IPA symbols and names. However, there were some incorrect entries in the released list: entering some of the words provided here as examples of phoneme usage will not result in the phonemes expected from the list. In addition, the list did not indicate which particular letters each phoneme applied to; this section has underlined the relevant letters for the benefit of readers.

Symbol | Classification | IPA symbol / name | Sample | Notes | Related phonemes
[@] | vowel | ə schwa | aware, synthesis, harmony, the | In the VOCALOID program, it is not normally used by itself but rather with other phonemes. However, Luka can use this phoneme to make the "a" sound in "aline" | [V] (stressed), [@r] (r-colored), [Q@]
[V] | vowel | ʌ open-mid back unrounded vowel | strut, unclean, cut, duck | Actually an /ɐ/ in most dialects. The notation /ʌ/ is still used by tradition and because some dialects retain the old pronunciation. | [@] (unstressed), [{] (fronted), [Q@] (r-colored)
[e] | vowel | ɛ open-mid front unrounded vowel | them, egg | Usually transcribed as /e/ by the AHD | [e@] (r-colored), [eI] (diphthongized)
[I] | vowel | ɪ near-close near-front unrounded vowel | kit, it, synthesis | - | [i:] (tense), [I@] (r-colored)
[i:] | vowel | iː close front unrounded vowel | beef, eat, harmony | - | [I] (lax), [I@] (r-colored)
[{] | vowel | æ near-open front unrounded vowel | trap, axe | In some dialects, it may be diphthongized into /eə/ or similar due to æ-tensing. | [aI] (diphthongized), [aU] (diphthongized)
[O:] | vowel | ɔː open-mid back rounded vowel | taught, ought, ball | This vowel varies considerably depending on the dialect. In US dialects it varies between /ɑ/ for speakers with the cot–caught merger and /ɒ~ɔ/ for the rest. | [Q] (lax), [O@] (r-colored)
[Q] | vowel | ɒ open back rounded vowel | lot, off | - | [O:], [OI] (diphthongized)
[U] | vowel | ʊ near-close near-back rounded vowel | put, look | - | [u:] (tense), [U@] (r-colored)
[u:] | vowel | uː close back rounded vowel | boot, view | - | [w] (semivowel), [U] (lax), [U@] (r-colored)
[@r] | rhotic vowel | əɹ, ɚ or ɝ (US); ɜː (UK) | urge, bird, marker | r-colored schwa | [@] (non-rhotic), [V]
[eI] | diphthong | eɪ̯ | pay, age, date | j-colored /e/ | [e] (monophthong)
[aI] | diphthong | aɪ̯ | buy, eye, died | j-colored /a/ | [@], [V], [{]
[OI] | diphthong | ɔɪ̯ | boy, oil, choice | j-colored /ɔ/ | [Q], [O:], [O@]
[@U] | diphthong | oʊ̯ (UK); oʊ̯~o (US) | oat, soak, show | w-colored /o/. Usually transcribed as /əʊ̯/ or /oː/ | [@]
[aU] | diphthong | aʊ̯ | loud, out, cow | w-colored /a/ | [{], [Q]
[I@] | rhotic vowel | ɪə (UK); i(ə)ɹ (US) | beer, ear | r-colored /ɪ/ | [I] (uppercase i), [i:]
[e@] | rhotic vowel | ɛə~ɛː (UK); ɛɹ (US) | bear, air, aware | r-colored /ɛ/ | [e] (non-rhotic)
[U@] | rhotic vowel | ʊə (UK); ʊɹ (US) | poor, surely | r-colored /ʊ/ | [U] (non-rhotic), [u:] (non-rhotic), [O:] (non-rhotic), [O@]
[O@] | rhotic vowel | ɔː(ɹ) (UK); ɔɹ~oɹ (US) | pour, sort | r-colored /ɔ/ | [O:] (non-rhotic), [Q] (non-rhotic)
[Q@] | rhotic vowel | ɑː(ɹ) (UK); ɑɹ (US) | star, are, harmony | r-colored /ɑ/ | [@], [V]
[w] | consonant | w labio-velar approximant | way | - | [u:] (syllabic), [U]
[j] | consonant | j palatal approximant | yellow | - | [i:] (syllabic), [I] (uppercase i)
[b] | consonant | b voiced bilabial plosive | cab | - | [p] (voiceless), [bh] (aspirated)
[bh] | consonant | aspirated voiced bilabial plosive | big | At the beginning of a syllable, /b/ with aspiration | [ph] (voiceless), [b] (deaspirated)
[d] | consonant | d voiced alveolar plosive | bad | - | [t] (voiceless), [dh] (aspirated), [D] (lenited, lowered)
[dh] | consonant | aspirated voiced alveolar plosive | dog | At the beginning of a syllable, /d/ with aspiration | [th] (voiceless), [d] (deaspirated), [D] (lenited, lowered)
[g] | consonant | g voiced velar plosive | bag | - | [k] (voiceless), [gh] (aspirated), [N] (nasalized)
[gh] | consonant | aspirated voiced velar plosive | god | At the beginning of a syllable, /g/ with aspiration | [kh] (voiceless), [g] (deaspirated)
[dZ] | consonant | ʤ voiced postalveolar affricate | jeans | - | [tS] (voiceless), [Z] (spirantized), [d] (deaffricated)
[v] | consonant | v voiced labiodental fricative | vote | - | [f] (voiceless)
[D] | consonant | ð voiced dental fricative | their | - | [T] (voiceless), [d] (fortition), [dh] (aspirated), [v] (Th-fronting)
[z] | consonant | z voiced alveolar fricative | resort | - | [s] (voiceless), [Z] (palatalized)
[Z] | consonant | ʒ voiced postalveolar fricative | Asia | - | [S] (voiceless), [z] (depalatalized), [dZ] (affricated)
[m] | consonant | m bilabial nasal | mind | - | [n] (alveolarized)
[n] | consonant | n alveolar nasal | night | - | [N] (velarized), [m] (labialized)
[N] | consonant | ŋ velar nasal | long | - | [n] (develarized)
[r] | consonant | ɹ alveolar approximant | red | In the IPA and X-SAMPA, /r/ is the symbol for the alveolar trill or rolling R; the symbol in this case seems to be based on the AHD | [R] (rolled), [w] (gliding)
[l] | consonant | ɫ velarized alveolar lateral approximant | feel | Dark L, at the syllable coda position | [l0] (develarized), [w] (L-vocalized), [u] (L-vocalized), [U] (L-vocalized)
[l0] | consonant | l alveolar lateral approximant | list | Clear L, at the beginning of a syllable | [l] (velarized)
[p] | consonant | p voiceless bilabial plosive | dip | - | [b] (voiced), [ph] (aspirated)
[ph] | consonant | aspirated voiceless bilabial plosive | peace | At the beginning of a syllable, /p/ with aspiration | [bh] (voiced), [p] (deaspirated)
[t] | consonant | t voiceless alveolar plosive | sit | - | [d] (voiced), [th] (aspirated)
[th] | consonant | aspirated voiceless alveolar plosive | top | At the beginning of a syllable, /t/ with aspiration | [dh] (voiced), [t] (deaspirated)
[k] | consonant | k voiceless velar plosive | rock | - | [g] (voiced), [kh] (aspirated)
[kh] | consonant | aspirated voiceless velar plosive | kiss | At the beginning of a syllable, /k/ with aspiration | [gh] (voiced), [k] (deaspirated)
[tS] | consonant | ʧ voiceless postalveolar affricate | touch | - | [dZ] (voiced), [S] (spirantized), [t] (deaffricated)
[f] | consonant | f voiceless labiodental fricative | feel | - | [v] (voiced)
[T] | consonant | θ voiceless dental fricative | think | - | [D] (voiced), [s] (Th-alveolarization), [f] (Th-fronting)
[s] | consonant | s voiceless alveolar fricative | sea | - | [z] (voiced), [S] (palatalized)
[S] | consonant | ʃ voiceless postalveolar fricative | share | - | [Z] (voiced), [tS] (affricated), [s] (depalatalized)
[h] | consonant | h voiceless glottal fricative | hat | - | -
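The phonetic list above can be treated as a lookup table. The Python sketch below covers only a small subset of its rows, and the `to_ipa` helper is a hypothetical name introduced for illustration; the aspirated entries use the superscript ‹ʰ› convention described in the allophones section, and symbols missing from the subset are passed through unchanged.

```python
# Subset of the VOCALOID symbol -> IPA pairs from the phonetic list.
VOCALOID_TO_IPA = {
    "@": "ə", "V": "ʌ", "e": "ɛ", "I": "ɪ", "{": "æ",
    "Q": "ɒ", "U": "ʊ", "N": "ŋ", "T": "θ", "D": "ð",
    "S": "ʃ", "Z": "ʒ", "r": "ɹ", "l": "ɫ", "l0": "l",
    "ph": "pʰ", "th": "tʰ", "kh": "kʰ",
}


def to_ipa(phonemes):
    """Convert a space-separated VOCALOID phoneme string to IPA symbols."""
    # Symbols not in the subset (e.g. "k", "s") are left as they are.
    return " ".join(VOCALOID_TO_IPA.get(p, p) for p in phonemes.split())


print(to_ipa("T I N k"))  # θ ɪ ŋ k  ("think")
```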

Additional phonetics

The following is a list of additional complementary phonemes available within some of the English VOCALOIDs. Most of them are allophones, and it is possible to use the voicebank without ever touching this set of data. However, using them within a song can improve the pronunciation and the Vocaloid's ability to sound more colloquial. In most cases, the data has to be entered manually through the note properties selection.

Symbol | Classification | IPA symbol / name | Sample | Notes | Related phonemes | Applies to
[e@0] | diphthong | [ɛə~eə~æ] | man, land | Tense allophone of /æ/, often diphthongized (/æ/-tensing) | [{] (allophone), [e] | RUBY, DEX, DAINA
[4] | consonant | ɾ alveolar flap | better | Unstressed allophone of /t/ or /d/ phonemes (Alveolar Flapping) | [t], [d] (allophone), [R] (trill), [r] (approximant) | RUBY, DEX, DAINA, CYBER SONGMAN
[R] | consonant | r alveolar trill | tierra (earth) | Rolling R. Generally used in non-English words | [4] (tap), [r] (approximant) | Prima, SONiKA, Big Al, Tonio, RUBY
[@l] | consonant | ɫ̩ syllabic alveolar approximant | apple, awful | Syllabic allophone of the Dark L | [l] (non syllabic) | CYBER SONGMAN
[h\] | consonant | ɦ voiced glottal fricative | behind | Possible allophone of /h/ between voiced sounds | [h] (voiced) | RUBY, DEX, DAINA
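Because these extra phonemes exist only in certain voicebanks, a replacement has to be conditional on the voicebank. The Python sketch below uses the "Applies to" information for [4] from the list above; the function `flap_if_supported` and the list-of-symbols representation are hypothetical and chosen purely for illustration.

```python
# Voicebanks listed above as having the alveolar flap [4].
FLAP_VOICEBANKS = {"RUBY", "DEX", "DAINA", "CYBER SONGMAN"}


def flap_if_supported(phonemes, index, voicebank):
    """Return a copy of phonemes with the [t]/[d] at index replaced by [4],
    but only when the voicebank is listed as having the flap."""
    result = list(phonemes)
    if voicebank.upper() in FLAP_VOICEBANKS and result[index] in ("t", "d"):
        result[index] = "4"
    return result


better = ["bh", "e", "t", "@r"]
print(flap_if_supported(better, 2, "Ruby"))    # ['bh', 'e', '4', '@r']
print(flap_if_supported(better, 2, "OLIVER"))  # ['bh', 'e', 't', '@r']
```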

Techniques

Phoneme Replacement

Due to the large array of allophones and similar-sounding phonemes available in the English language, there is great flexibility for replacing phonemes. This has many applications, such as altering the emphasis or stress of a word, correcting a strange pronunciation found in a voicebank,[25] or altering the accent or general pronunciation of a particular VOCALOID.[26]

This, added to some auxiliary phonemes, allows a great diversity of combinations and possibilities to experiment with. However, the user must consider that the results may vary between voicebanks due to individual differences such as accent, pronunciation and sample quality. The best approach is to take these tips as a guide and experiment for yourself.

For the consonants is possible:

  • Replace the plosives for the respective aspirated allophones. If a consonant sounds too strident or too weak, it's possible to replace it with the corresponding allophone. However is important it may affect the stress, as the aspiration is related to it.
  • Swap a consonant for its respective (un)voiced counterpart.
    • This applies specially well for the end of the syllables, where the coda consonant is prone to assimilate the voicing of the neighbor phonemes.
    • This is harder for a onset consonant (beginning of a syllable), as the voicing can alter the meaning of the word; however still is possible if the user is careful with the consonant length. As the consonant lenght  becomes shorter, it's harder distinguish it's voicing.
  • Replace the alveolar plosives, [t] & [d] by their respective postalveolar affricates, [tS] & [dZ].
    • This often occurs when the alveolar plosive is palatalized by a nearby phoneme.
      Example: 'Don't you' /doʊnt.juː/ → 'Don't ya' /doʊn.jə/ → 'Don't cha' /doʊnə/
    • Similar to the voicing swap, this replacement also is possible when the consonant length and stress somewhat neutralizes the differences between both phonemes.
  • The Dark L is prone to a series of phonological processes and sound changes. Taking these into account, it's possible to replace the [l] accordingly.
    • In the L-vocalization process, where the Dark L tends to be warped into a close back vocoid, it's possible to replace the [l] phoneme with the [O:], [@U], [U], [u:] or [w] phonemes.
    • Similarly, the vowels before the Dark L can be coloured by the velarized lateral consonant. Put simply, front vowels tend to become more centralized, while central and back vowels tend to shift toward a close back vowel. Knowing this sound change, it is possible to replace or insert another vowel before the Dark L in case the phoneme combination sounds awkward.
      Example: [i: l] → [I l] → [@ l]; [V l] → [@ l] → [U l]
    • Also, an unstressed vowel before the Dark L can be completely omitted, leaving a bare syllabic L.
      Example: The word 'Twinkle' is actually pronounced as /ˈtwɪŋkl̩/ instead of /ˈtwɪŋkəl/.
      Previously there was no way to imitate this in the synthesizer, as few voicebanks had a syllabic [l] phoneme that could be produced on its own, without a vowel. In general, this was patched by adding a vowel like [V] or [U] before the Dark L (Example: Twinkle [th w I N k][U l]); however, in some cases this sounded overpronounced. This was fixed with the release of Cyber Songman, which included its own [@l] phoneme for the syllabic L, allowing a more colloquial pronunciation if the user requires it. Example: Twinkle → [th w I N k][@l l].
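
The consonant tips above can be summarized as a small lookup, shown here as a minimal Python sketch rather than anything built into the VOCALOID editor. The table names and the function `consonant_candidates` are hypothetical. The voicing pairs other than [t]/[d] and [tS]/[dZ], and the aspirated symbols [ph] and [kh], are assumptions based on the X-SAMPA-style notation and may not match every voicebank.

```python
# Hypothetical lookup tables for the consonant replacement tips.
# Symbols use VOCALOID's X-SAMPA-style notation; pairs not named in the
# article are assumptions and should be checked against the voicebank.
VOICING_PAIRS = [("p", "b"), ("t", "d"), ("k", "g"), ("f", "v"),
                 ("T", "D"), ("s", "z"), ("S", "Z"), ("tS", "dZ")]
ASPIRATED = {"p": "ph", "t": "th", "k": "kh"}   # plain -> aspirated allophone
PALATALIZED = {"t": "tS", "d": "dZ"}            # alveolar plosive -> affricate
L_VOCALIZED = ["O:", "@U", "U", "u:", "w"]      # stand-ins for a vocalized Dark L


def consonant_candidates(phoneme):
    """Return replacement candidates for one consonant, in the order the tips appear."""
    candidates = []
    # Plain plosive <-> aspirated allophone.
    if phoneme in ASPIRATED:
        candidates.append(ASPIRATED[phoneme])
    for plain, aspirated in ASPIRATED.items():
        if phoneme == aspirated:
            candidates.append(plain)
    # Voiced <-> unvoiced counterpart.
    for unvoiced, voiced in VOICING_PAIRS:
        if phoneme == unvoiced:
            candidates.append(voiced)
        elif phoneme == voiced:
            candidates.append(unvoiced)
    # Alveolar plosive -> postalveolar affricate.
    if phoneme in PALATALIZED:
        candidates.append(PALATALIZED[phoneme])
    # L-vocalization.
    if phoneme == "l":
        candidates.extend(L_VOCALIZED)
    return candidates


print(consonant_candidates("t"))
print(consonant_candidates("l"))
```

The function only lists candidates; which one actually sounds right still depends on the voicebank, the note length and the stress, so each option needs to be auditioned.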

Monophthong Replacement

The English phonetic system has one of the largest numbers of available vowels among the 5 languages currently available for VOCALOID (including monophthongs, diphthongs and rhotic vowels).

To replace a vowel, you need to have an idea of which vowels are closest in terms of sound quality. For this reason, it's a good idea to know where each vowel sits on the vowel chart.

Vowel chart for English, showing the rough position of the monophthongs along with their respective IPA symbols (black) and VOCALOID symbols (gray). The relative position/pronunciation of the vowels may vary according to the regional accent/dialect.

  • Open Vowels:
    ←unrounded [{] [V] [@] [Q@] [Q] rounded→
    • Also known as low vowels, they are pronounced with the mouth open and the tongue in a low position relative to the roof of the mouth. They are characterized by their 'ah' to 'uh' sound quality and are positioned at the bottom of the IPA vowel chart.
  • Front unrounded vowels:
    ←open (lax) [{] [e] [e@] [eI] [I] [i:] close (tense)→
    • Also known as bright vowels, they're placed at the left side of the chart and are pronounced with the tongue positioned as far forward as possible and with the lips unrounded. Their sound tends to vary from a lax 'eh'-like sound toward a tenser 'ee'-like or y-like sound as the mouth progressively closes.
  • Back rounded vowels:
    ←open (lax) [Q@] [Q] [O:] [O@] [@U] [U] [u:] close (tense)→
    • Also known as dark vowels, they're placed at the right side of the chart and are pronounced with the tongue positioned as far back as possible and with the lips rounded. Their sound tends to vary from a lax 'oh'-like sound toward a tenser 'oo'-like or w-like sound as the mouth progressively closes.
  • Central vowels:
    ←unstressed [@] [@r] [V] stressed→
    • Located at the center of the chart, these vowels tend to have an undefined 'uh'-like sound. When a vowel is reduced, it tends to shift toward a central vowel.

Knowing this, it is relatively easy to work out how to replace a phoneme.

Example: The vowel [e] may be replaced by [{] if a more open pronunciation is needed, or by [I] if a more closed one is needed.
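
The vowel scales listed above can be turned into a small neighbor lookup, shown here as a minimal Python sketch. The names `SCALES`, `MONOPHTHONGS` and `nearest_monophthongs` are hypothetical. Classifying [e@], [eI], [Q@], [O@], [@U] and [@r] as non-monophthongs is an assumption of this sketch, made so that the lookup skips diphthongs and rhotic vowels.

```python
# Vowel scales copied from the lists above, each ordered as in the article.
SCALES = {
    "open": ["{", "V", "@", "Q@", "Q"],                     # unrounded -> rounded
    "front": ["{", "e", "e@", "eI", "I", "i:"],             # open -> close
    "back": ["Q@", "Q", "O:", "O@", "@U", "U", "u:"],       # open -> close
    "central": ["@", "@r", "V"],                            # unstressed -> stressed
}

# Assumption: these are the pure vowels; everything else is skipped.
MONOPHTHONGS = {"{", "e", "I", "i:", "V", "@", "Q", "O:", "U", "u:"}


def nearest_monophthongs(vowel):
    """For each scale containing `vowel`, return the closest monophthong on either side."""
    result = {}
    for name, scale in SCALES.items():
        if vowel not in scale:
            continue
        i = scale.index(vowel)
        before = [v for v in scale[:i] if v in MONOPHTHONGS]
        after = [v for v in scale[i + 1:] if v in MONOPHTHONGS]
        result[name] = (before[-1] if before else None,
                        after[0] if after else None)
    return result


print(nearest_monophthongs("e"))
print(nearest_monophthongs("V"))
```

For [e] this gives [{] on the open side and [I] on the close side of the front scale, which matches the example above. A vowel that belongs to more than one scale, such as [V], gets one pair of neighbors per scale.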

In some instances, diphthongs and rhotic vowels may be used as replacements for monophthongs if their pronunciation is close to that of a pure vowel. In the case of diphthongs, this is possible for the mid vowels [eI] and [@U] if their diphthongization isn't too marked:

Example: The phoneme [eI] tends to sound like a tense [e] in some dialects.

In the case of the R-colored vowels, if the pronunciation is non-rhotic, these can also be used as replacements:

Example: The phoneme [Q@] in non-rhotic pronunciation is /ɑː/, which allows it to be used as a replacement for other open vowels like [V], [Q] and [@].

Diphone Replacement/Splitting

[27][28]

Original Diphone | Type | IPA notation | Replacement for First Phoneme | Replacement for Second Phoneme
[aI] | Diphthong | aɪ̯ | [V], [{] or [Q] | [e], [I], [i:] or [j]
[eI] | Diphthong | eɪ̯ | [e] | [I], [i:] or [j]
[OI] | Diphthong | ɔɪ̯ | [Q] or [O:] | [I], [i:] or [j]
[aU] | Diphthong | aʊ̯ | [V], [{] or [Q] | [O:], [U], [u:] or [w]
[@U] | Diphthong | oʊ̯ | [Q] or [O:] | [O:], [U], [u:] or [w]
[@r] | Rhotic Vowel | əɹ or ɚ | [@] | [r]
[Q@] | Rhotic Vowel | ɑː(ɹ) (UK), ɑɹ (US) | [V], [{] or [Q] | [@]; [@r] or [r]
[e@] | Rhotic Vowel | ɜː (UK), ɝ (US) | [e] | [@]; [@r] or [r]
[I@] | Rhotic Vowel | ɪə (UK), i(ə)ɹ (US) | [I] or [i:] | [@]; [@r] or [r]
[O@] | Rhotic Vowel | ɔː(ɹ) (UK), ɔɹ~oɹ (US) | [Q] or [O:] | [@]; [@r] or [r]
[U@] | Rhotic Vowel | ʊə (UK), ʊɹ (US) | [U] or [u:] | [@]; [@r] or [r]

[29]
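
Splitting a diphone according to the table can be sketched as a dictionary lookup in Python. This is only an illustration: `SPLITS` and `split_diphone` are hypothetical names, and only the diphthong rows plus [@r] are included, since the remaining rhotic rows list their second-phoneme options in two groups.

```python
# Replacement candidates copied from the table:
# diphone -> (options for the first phoneme, options for the second phoneme)
SPLITS = {
    "aI": (["V", "{", "Q"], ["e", "I", "i:", "j"]),
    "eI": (["e"], ["I", "i:", "j"]),
    "OI": (["Q", "O:"], ["I", "i:", "j"]),
    "aU": (["V", "{", "Q"], ["O:", "U", "u:", "w"]),
    "@U": (["Q", "O:"], ["O:", "U", "u:", "w"]),
    "@r": (["@"], ["r"]),
}


def split_diphone(symbol, first=0, second=0):
    """Split one diphone into two phonemes, picking options by index."""
    firsts, seconds = SPLITS[symbol]
    return [firsts[first], seconds[second]]


def all_splits(symbol):
    """List every first/second combination the table allows for a diphone."""
    firsts, seconds = SPLITS[symbol]
    return [[f, s] for f in firsts for s in seconds]


print(split_diphone("aI"))          # default picks the first option of each column
print(split_diphone("@U", 1, 2))    # picks [O:] followed by [u:]
print(len(all_splits("aI")))        # number of combinations to audition
```

Listing every combination is useful mainly as a checklist, since the best-sounding split differs between voicebanks.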

References
