Vocaloid Wiki

Phonology and Phonetics for Vocaloid users/Consonants


A consonant is a speech sound characterized by a constriction or closure at one or more points along the vocal tract. Its name comes from Latin and means "sounding together" or "sounding with". This reflects the old definition, in which consonants were the sounds that accompanied the vowels and could not form a syllable without one. In modern linguistics, however, this definition no longer applies.

A consonant usually forms the border or margin of a syllable, whereas a vowel usually forms its peak or nucleus. This does not rule out syllabic consonants, or consonants whose phonetic behavior is closer to a vowel than to a consonant, as is the case with the approximants (especially the glides).


Stricture Degree

Stricture describes the degree of constriction, that is, how narrow the gap is between the active articulator and the passive articulator. Three major degrees of stricture can be identified:[1][2]

  1. Complete closure
    • Stops: Stop stricture is a complete closure followed by a release. In oral stops (plosives), this produces a brief burst of air, or plosion. In nasal stops, the air is redirected and escapes through the nose.
    • Flaps/Taps: As with stops, a flap produces a complete closure of the vocal tract, but the closure is much briefer.
    • Trills: A trill consists of a series of taps interspersed with narrow openings whose cross-sectional area is similar to that of a fricative. This gives trills their particular vibratory properties.
  2. Close approximation
    • Fricatives: Fricative stricture is a very narrow opening, not narrow enough to stop the airflow but narrow enough to partially block it and make it turbulent. Turbulent airflow generates the random (aperiodic) sound that characterizes fricatives.
  3. Open approximation
    • Approximants: Approximant stricture is an opening with a greater cross-sectional area than that of a fricative, but narrower than that of a vowel. The opening is too wide to block the airflow or make it turbulent.
    • Resonants: This is the stricture of vowels and semivowels. There is no narrowing or blocking of the airflow.
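The list above orders the manners of articulation by how narrow the gap between the articulators is. The following Python sketch simply encodes that list as a lookup table so manners can be compared; the manner labels are plain descriptive strings chosen for this example, not identifiers used by any Vocaloid software.

```python
from enum import IntEnum


class Stricture(IntEnum):
    """Major stricture degrees; a lower value means a narrower gap."""
    COMPLETE_CLOSURE = 1
    CLOSE_APPROXIMATION = 2
    OPEN_APPROXIMATION = 3


# Manner of articulation -> stricture degree, following the list above.
STRICTURE_BY_MANNER = {
    "stop": Stricture.COMPLETE_CLOSURE,
    "flap": Stricture.COMPLETE_CLOSURE,
    "tap": Stricture.COMPLETE_CLOSURE,
    "trill": Stricture.COMPLETE_CLOSURE,
    "fricative": Stricture.CLOSE_APPROXIMATION,
    "approximant": Stricture.OPEN_APPROXIMATION,
    "resonant": Stricture.OPEN_APPROXIMATION,
}


def stricture_of(manner: str) -> Stricture:
    """Return the stricture degree of a manner of articulation."""
    try:
        return STRICTURE_BY_MANNER[manner.lower()]
    except KeyError:
        raise ValueError(f"unknown manner of articulation: {manner!r}") from None


def narrower_than(a: str, b: str) -> bool:
    """True if manner `a` is produced with a narrower stricture than manner `b`."""
    return stricture_of(a) < stricture_of(b)


print(stricture_of("Trill").name)                 # COMPLETE_CLOSURE
print(narrower_than("fricative", "approximant"))  # True
```

Using an `IntEnum` keeps the three degrees ordered, which mirrors how the article places fricatives between stops and approximants.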

Voicing, Manner of Articulation, Place of Articulation

Consonants can be classified by three main features:

  • Manner of articulation: how the vocal tract is constricted and how the air escapes from it.
  • Place of articulation: where in the vocal tract the obstruction occurs, and which speech organs are involved.
  • Voicing or phonation: whether the sound is produced with or without vibration of the vocal folds (vocal cords).
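The usual names of consonants, such as "voiceless alveolar plosive" in the tables below, are built from exactly these three features. A minimal Python sketch, assuming plain strings for each feature (the `Consonant` class is a hypothetical illustration, not part of any Vocaloid tool):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    """A consonant described by voicing, place and manner of articulation."""
    voicing: str  # "voiced" or "voiceless"
    place: str    # e.g. "bilabial", "alveolar", "velar"
    manner: str   # e.g. "plosive", "fricative", "nasal"

    def __post_init__(self) -> None:
        if self.voicing not in ("voiced", "voiceless"):
            raise ValueError(f"voicing must be 'voiced' or 'voiceless', got {self.voicing!r}")

    @property
    def name(self) -> str:
        """Conventional label: voicing + place + manner."""
        return f"{self.voicing} {self.place} {self.manner}"


# Two consonants that differ only in voicing.
t = Consonant("voiceless", "alveolar", "plosive")
d = Consonant("voiced", "alveolar", "plosive")
print(t.name)  # voiceless alveolar plosive
print(d.name)  # voiced alveolar plosive
```

Pairs like [t] and [d] share place and manner and differ only in voicing, which is how most rows of the tables in this article pair up.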


The phonations refers to the

Manner of Articulation

Manner of articulation refers to how air escapes from the vocal tract when a consonant is produced, and to the degree to which the airflow is obstructed. The obstruction can be total (causing a burst, or plosion, of air when the obstruction is released), partial (causing turbulence, or frication) or almost nonexistent (in the approximants, which include the glides).

On this basis, consonants fall into two major groups: obstruents, which are produced with a total or partial obstruction of the airflow, and sonorants, which are produced without such an obstruction.
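The split into obstruents and sonorants can be written as a small lookup. This Python sketch uses the groups named in this article; note that the article counts taps and trills among the rhotics (liquids) and therefore among the sonorants, even though the stricture list places them under complete closure. The manner labels are assumptions made for the example.

```python
# Manner groups as described in this article. "lateral approximant" is used
# rather than plain "lateral", because laterals can also be fricatives or
# affricates, which are obstruents.
OBSTRUENT_MANNERS = frozenset({"plosive", "affricate", "fricative"})
SONORANT_MANNERS = frozenset({
    "nasal", "approximant", "semivowel", "lateral approximant",
    "tap", "flap", "trill", "vowel",
})


def major_class(manner: str) -> str:
    """Return "obstruent" or "sonorant" for a manner of articulation."""
    m = manner.lower()
    if m in OBSTRUENT_MANNERS:
        return "obstruent"
    if m in SONORANT_MANNERS:
        return "sonorant"
    raise ValueError(f"unclassified manner of articulation: {manner!r}")


for manner in ("plosive", "fricative", "nasal", "trill"):
    print(manner, "->", major_class(manner))
```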


Obstruents are consonants produced by a total or partial obstruction of the airflow in the vocal tract.

Obstruents can be voiced or voiceless, that is, produced with or without vibration of the vocal cords, although they tend to be voiceless. Because of the vibration of the vocal cords, voiced obstruents have a characteristic buzzing quality compared with their voiceless counterparts (for example, [d] versus [t], or [b] versus [p]).

This group includes the plosives (also known as stops or occlusives, because the airflow is stopped completely), the affricates (which combine the articulation of a plosive and a fricative) and the fricatives (characterized by a partial obstruction of the airflow). The sibilants form a subgroup within the affricates and fricatives.

Plosives, also known as oral stops or occlusives, are consonants produced with a total obstruction of the airflow. This causes a "bursting" release of air known as plosion, which makes plosives sound "harsher" than the corresponding affricates or fricatives.

IPA symbol | Name | Vocaloid symbol | Place of articulation | Voicing
t | voiceless alveolar plosive | [t] (EN, JP) | alveolar | voiceless
d | voiced alveolar plosive | [d] (EN, JP) | alveolar | voiced
t̪ | voiceless dental plosive | [t] (SP) | dental | voiceless
d̪ | voiced dental plosive | [d] (SP) | dental | voiced
p | voiceless bilabial plosive | [p] (EN, JP, SP) | bilabial | voiceless
b | voiced bilabial plosive | [b] (EN, JP, SP) | bilabial | voiced
k | voiceless velar plosive | [k] (EN, JP, SP) | velar | voiceless
g | voiced velar plosive | [g] (EN, JP, SP) | velar | voiced
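The table shows that one Vocaloid symbol can stand for different sounds depending on the voicebank's language: [t] and [d] are alveolar in English and Japanese but dental in Spanish. A hypothetical Python helper that encodes only the rows above (the function name and language codes are assumptions for illustration):

```python
# Hypothetical helper: map a Vocaloid plosive symbol to IPA for a voicebank
# language, using only the rows of the plosive table above.
DENTAL = "\u032A"  # IPA combining bridge below (dental diacritic), as in t̪

SAME_EVERYWHERE = {"p", "b", "k", "g"}  # same place in EN, JP and SP
CORONAL = {"t", "d"}                    # alveolar in EN/JP, dental in SP
LANGUAGES = {"EN", "JP", "SP"}


def plosive_ipa(symbol: str, language: str) -> str:
    """Return the IPA transcription of a Vocaloid plosive symbol."""
    lang = language.upper()
    if lang not in LANGUAGES:
        raise KeyError(f"the table has no plosives for language {language!r}")
    if symbol in SAME_EVERYWHERE:
        return symbol
    if symbol in CORONAL:
        return symbol + DENTAL if lang == "SP" else symbol
    raise KeyError(f"[{symbol}] is not a plosive in the table")


print(plosive_ipa("t", "EN"), plosive_ipa("t", "SP"))
```

A real phoneme converter would need a full table per language; this sketch only shows why the language has to be part of the lookup key.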

Affricates are consonants that begin like a plosive (total obstruction of the airflow) but are released like a fricative (turbulent release), so they can be considered intermediate between the two. A typical example in many languages is CH, which usually corresponds to [tʃ].

IPA symbol | Name | Vocaloid symbol | Place of articulation | Voicing
ʦ | voiceless alveolar affricate | [ts] (JP) | alveolar | voiceless
ʣ | voiced alveolar affricate | [dz] (JP) | alveolar | voiced
ʧ | voiceless palato-alveolar affricate | [tS] (EN, SP) | palato-alveolar | voiceless
ʤ | voiced palato-alveolar affricate | [dZ] (EN) | palato-alveolar | voiced
ʨ | voiceless alveolo-palatal affricate | [tS] (JP) | alveolo-palatal | voiceless
ʥ | voiced alveolo-palatal affricate | [dZ] (JP) | alveolo-palatal | voiced
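Since an affricate is a stop onset followed by a fricative release, the two-letter Vocaloid symbols in the table can be split into those two parts. A minimal Python sketch, assuming only the four symbols listed above; both halves of each affricate share the same voicing, so checking the stop is enough:

```python
from __future__ import annotations

# Vocaloid affricate symbols from the table above, split into the stop onset
# and the fricative release. Vocaloid writes both the palato-alveolar (EN, SP)
# and the alveolo-palatal (JP) affricates as tS / dZ.
AFFRICATE_PARTS = {
    "ts": ("t", "s"),
    "dz": ("d", "z"),
    "tS": ("t", "S"),
    "dZ": ("d", "Z"),
}
VOICED_PARTS = {"d", "z", "Z"}


def split_affricate(symbol: str) -> tuple[str, str]:
    """Return (stop, fricative) for a Vocaloid affricate symbol."""
    if symbol not in AFFRICATE_PARTS:
        raise KeyError(f"[{symbol}] is not an affricate in the table")
    return AFFRICATE_PARTS[symbol]


def affricate_voicing(symbol: str) -> str:
    """Both halves agree in voicing in every entry, so the stop decides."""
    stop, _fricative = split_affricate(symbol)
    return "voiced" if stop in VOICED_PARTS else "voiceless"


print(split_affricate("tS"), affricate_voicing("dZ"))
```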

Fricatives are consonants produced by forcing air through a narrow channel formed by placing two articulators (generally the tongue and another one) close together. This partial blockage makes the airflow turbulent, a phenomenon known as frication.

Fricatives sound softer than the corresponding plosives, but stronger than the corresponding approximants.

IPA symbol | Name | Vocaloid symbol | Place of articulation | Voicing
ɸ | voiceless bilabial fricative | [p\] (JP) | bilabial | voiceless
β | voiced bilabial fricative | [B] (SP¹) | bilabial | voiced
f | voiceless labiodental fricative | [f] (EN, SP, KR) | labiodental | voiceless
v | voiced labiodental fricative | [v] (EN, KR) | labiodental | voiced
θ | voiceless dental fricative | [T] (EN, SP) | dental | voiceless
ð | voiced dental fricative | [D] (EN, SP²) | dental | voiced
s | voiceless alveolar sibilant | [s] (EN, JP, SP³) | alveolar | voiceless
z | voiced alveolar sibilant | [z] (EN) | alveolar | voiced
ʃ | voiceless palato-alveolar fricative | [S] (EN) | palato-alveolar | voiceless
ʒ | voiced palato-alveolar fricative | [Z] (EN) | palato-alveolar | voiced
ɕ | voiceless alveolo-palatal fricative | [S] (JP) | alveolo-palatal | voiceless
ʑ | voiced alveolo-palatal fricative | [Z] (JP) | alveolo-palatal | voiced
ç | voiceless palatal fricative | [C] (JP) | palatal | voiceless
ʝ | voiced palatal fricative | [j\] (SP) | palatal | voiced
x | voiceless velar fricative | [x] (SP) | velar | voiceless
ɣ | voiced velar fricative | [G] (SP⁴) | velar | voiced
h | voiceless glottal fricative | [h] (EN, JP) | glottal | voiceless

1^ The Spanish phoneme is actually a bilabial approximant /β̞/, which has a different pronunciation.
The voiced bilabial fricative /β/ has a harsher pronunciation, more similar to the voiced labiodental fricative /v/. In fact, this phoneme is quite unstable and tends to shift to /v/.

2^ The Spanish phoneme is actually a dental approximant /ð̞/, which has a different pronunciation from the voiced dental fricative /ð/.

3^ The Castilian Spanish S is actually a voiceless apico-alveolar fricative, or retracted S. It is said to have a "whistling" quality and to sound similar to the palato-alveolar [ʃ].
In Latin America, it is the common voiceless alveolar fricative.

4^ The phoneme is actually a velar spirant approximant /ɣ˕/. It must not be confused with the velar semivowel approximant /ɰ/ found in Korean; the Spanish sound is closer to the velar fricative than to the velar semivowel approximant.
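Most rows of the fricative table come in voiceless/voiced pairs with the same place of articulation. A small Python sketch that reads those pairs off the table (IPA symbols only; as the notes above explain, the Spanish [B], [D] and [G] are really approximants, so the pairing is phonological rather than a claim about Vocaloid's actual sounds):

```python
# Voiceless -> voiced fricative pairs (IPA), read off the fricative table above.
# [h] is left out because the table lists no voiced glottal fricative.
VOICED_COUNTERPART = {
    "ɸ": "β", "f": "v", "θ": "ð", "s": "z",
    "ʃ": "ʒ", "ɕ": "ʑ", "ç": "ʝ", "x": "ɣ",
}
VOICELESS_COUNTERPART = {v: k for k, v in VOICED_COUNTERPART.items()}


def toggle_voicing(fricative: str) -> str:
    """Return the fricative with the same place but the opposite voicing."""
    if fricative in VOICED_COUNTERPART:
        return VOICED_COUNTERPART[fricative]
    if fricative in VOICELESS_COUNTERPART:
        return VOICELESS_COUNTERPART[fricative]
    raise KeyError(f"{fricative!r} has no voicing partner in the table")


print(toggle_voicing("θ"), toggle_voicing("ʑ"))
```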

A sibilant is a fricative or affricate consonant made by directing a stream of air with the tongue towards the sharp edge of the teeth, which are held close together. Sibilants are usually associated with the letters S, Z and C, and are characterized by their intense sound (which is why they are often used to call for attention, as when quieting someone with "shhhh!").

By their perceived sound, sibilants can be classified into two categories: hissing sibilants and hushing sibilants. Hissing sibilants are related to the phonemes /s/ and /z/ and their variants. Most hushing sibilants are postalveolar consonants with different degrees of palatalization; they differ mainly in the shape of the tongue during articulation. Voiceless sibilants, especially the hissing ones, have a soothing sound, and in several languages it is not unusual to find them in words with a calming meaning or connotation.

IPA symbol | Name | Vocaloid symbol | Consonant type | Voicing | Classification | Palatalization
s | voiceless alveolar sibilant | [s] (EN, JP, SP¹) | fricative | voiceless | hissing | none
z | voiced alveolar sibilant | [z] (EN) | fricative | voiced | hissing | none
ts | voiceless alveolar affricate | [ts] (JP) | affricate | voiceless | hissing | none
dz | voiced alveolar affricate | [dz] (JP) | affricate | voiced | hissing | none
ʃ | voiceless palato-alveolar fricative | [S] (EN) | fricative, palato-alveolar | voiceless | hushing | slight
ʒ | voiced palato-alveolar fricative | [Z] (EN) | fricative, palato-alveolar | voiced | hushing | slight
tʃ | voiceless palato-alveolar affricate | [tS] (EN, SP) | affricate, postalveolar | voiceless | hushing | slight
dʒ | voiced palato-alveolar affricate | [dZ] (EN) | affricate, postalveolar | voiced | hushing | slight
ɕ | voiceless alveolo-palatal fricative | [S] (JP) | fricative, postalveolar | voiceless | hushing | strong
ʑ | voiced alveolo-palatal fricative | [Z] (JP) | fricative, postalveolar | voiced | hushing | strong
tɕ | voiceless alveolo-palatal affricate | [tS] (JP) | affricate, postalveolar | voiceless | hushing | strong
dʑ | voiced alveolo-palatal affricate | [dZ] (JP) | affricate, postalveolar | voiced | hushing | strong

1^ The Castilian Spanish S is actually a voiceless apico-alveolar fricative, or retracted S. It is said to have a "whistling" quality and to sound similar to the palato-alveolar [ʃ].
In Latin America, it is the common voiceless alveolar fricative.
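The sibilant table can be used as a lookup: for example, English [S] (ʃ) and Japanese [S] (ɕ) share a Vocaloid symbol but differ in degree of palatalization. A Python sketch keyed by IPA, transcribing only the rows above (the helper name is hypothetical):

```python
from typing import Optional, Tuple

# (classification, palatalization) for each sibilant in the table above,
# keyed by IPA; affricates are written as two letters (e.g. "tɕ").
SIBILANTS = {
    "s": ("hissing", "none"), "z": ("hissing", "none"),
    "ts": ("hissing", "none"), "dz": ("hissing", "none"),
    "ʃ": ("hushing", "slight"), "ʒ": ("hushing", "slight"),
    "tʃ": ("hushing", "slight"), "dʒ": ("hushing", "slight"),
    "ɕ": ("hushing", "strong"), "ʑ": ("hushing", "strong"),
    "tɕ": ("hushing", "strong"), "dʑ": ("hushing", "strong"),
}


def sibilant_class(ipa: str) -> Optional[Tuple[str, str]]:
    """Return (classification, palatalization), or None if not a sibilant."""
    return SIBILANTS.get(ipa)


# EN [S] and JP [S] share a Vocaloid symbol but differ in palatalization.
print(sibilant_class("ʃ"), sibilant_class("ɕ"))
print(sibilant_class("f"))  # None: [f] is a fricative but not a sibilant
```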


Sonorants are speech sounds produced without turbulence or obstruction of the airflow. The group is diverse and includes vowels, semivowels, approximants, liquids (rhotics and laterals) and nasals. Although definitions vary between authors and sources, sonorants share a number of traits, such as being able to act as a syllable nucleus and being modally voiced (they are rarely voiceless).

Vocaloid3 added devoiced variants of the sonorants. These are marked by appending the suffix _0 to the phoneme symbol; _0 is the X-SAMPA notation for the voiceless diacritic. Because the sonorant group is so diverse, the devoiced variants available differ for each Vocaloid language.
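A minimal Python sketch of the _0 convention, assuming nothing beyond X-SAMPA itself: the suffix corresponds to IPA's combining ring below (U+0325). Which Vocaloid3 phonemes actually have a _0 variant is not listed here, so "n_0" below is only illustrative. It also shows why English [l0], listed later as a lateral approximant, is not a devoiced [l]: it has no underscore.

```python
# X-SAMPA "_0" is the voiceless diacritic; IPA writes it as the combining
# ring below (U+0325), so X-SAMPA "n_0" corresponds to IPA "n̥".
XSAMPA_VOICELESS = "_0"
IPA_RING_BELOW = "\u0325"


def devoice(xsampa: str) -> str:
    """Append the voiceless diacritic to an X-SAMPA symbol (idempotent)."""
    if xsampa.endswith(XSAMPA_VOICELESS):
        return xsampa
    return xsampa + XSAMPA_VOICELESS


def is_devoiced(xsampa: str) -> bool:
    """True only for the "_0" suffix; EN "l0" is not a devoiced l."""
    return xsampa.endswith(XSAMPA_VOICELESS)


def devoiced_to_ipa(xsampa: str) -> str:
    """Convert e.g. "n_0" to IPA; assumes the base letter is the same in X-SAMPA and IPA."""
    if not is_devoiced(xsampa):
        raise ValueError(f"{xsampa!r} has no '_0' suffix")
    return xsampa[: -len(XSAMPA_VOICELESS)] + IPA_RING_BELOW


print(devoice("n"), devoiced_to_ipa("n_0"), is_devoiced("l0"))
```

For bases that differ between the two notations (X-SAMPA "N" is IPA ŋ, for instance), a full X-SAMPA-to-IPA table would be needed before adding the ring.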


A nasal consonant is a consonant in which the airflow is directed through the nose. The term generally refers to the nasal stops, the most common kind of nasal consonant and the only kind found in the languages available for Vocaloid.

Nasal stops are known for their strong tendency towards assimilation. They tend to take on the place of articulation of the following consonant, which is why most languages have several allophones of their nasal consonants. Similarly, they can cause the preceding vowels to assimilate and become nasalized.
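Place assimilation can be sketched as a rule that replaces a nasal with the nasal made at the place of the next consonant; in Japanese, for example, sanpo is pronounced with [m] and ringo with [ŋ]. This Python sketch is a simplified, hypothetical model covering only three places and a handful of consonants; it ignores vowel nasalization and other real-language details.

```python
from __future__ import annotations

# Simplified sketch of nasal place assimilation: a nasal takes the place of
# articulation of the consonant that follows it. Only a few places and
# consonants are covered; real languages have more cases.
PLACE = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar", "ŋ": "velar",
}
NASAL_AT = {"bilabial": "m", "alveolar": "n", "velar": "ŋ"}
NASALS = set(NASAL_AT.values())


def assimilate(segments: list[str]) -> list[str]:
    """Return a copy of `segments` with each nasal matched to the next consonant."""
    out = list(segments)
    for i in range(len(out) - 1):
        nxt = out[i + 1]
        if out[i] in NASALS and nxt in PLACE:
            out[i] = NASAL_AT[PLACE[nxt]]
    return out


print(assimilate(["s", "a", "n", "p", "o"]))  # ['s', 'a', 'm', 'p', 'o']
print(assimilate(["r", "i", "n", "g", "o"]))  # ['r', 'i', 'ŋ', 'g', 'o']
```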

IPA symbol | Name | Vocaloid symbol | Place of articulation
n | alveolar nasal | [n] (EN, JP, SP, KR¹), [np] (KR) | alveolar
m | bilabial nasal | [m] (EN, JP, SP, KR¹), [mp] (KR²) | bilabial
ɲ | palatal nasal | [J] (JP, SP) | palatal
ŋ | velar nasal | [N] (EN, JP, KR¹), [Np] (KR²) | velar
ɴ | uvular nasal | [N\] (JP) | uvular

1^ Initial consonant variant in the Korean Phonetic System
2^ Final consonant variant in the Korean Phonetic System


Approximants are speech sounds that involve the articulators approaching each other but not narrowly enough or with enough articulatory precision to create turbulent airflow. Therefore, approximants fall between fricatives, which do produce a turbulent airstream, and vowels, which produce no turbulence. This class of sounds is varied and includes lateral approximants (L-like sounds; see the liquid consonants section below), non-lateral approximants, and the semivowels or glides.

To distinguish the different sound quality of the Korean and Spanish velar approximants, this article uses the classification proposed by Eugenio Martínez-Celdrán.[3]

IPA symbol | Name | Vocaloid symbol | Place of articulation | Classification
j | palatal semivowel approximant | j (EN, JP, SP) | palatal | semivowel
w | labialized velar semivowel approximant | w (EN, SP) | velar, labialized | semivowel
ɰᵝ or w̜ | compressed labiovelar approximant | w (JP) | labio-velar | semivowel
β̞ | bilabial spirant approximant | B (SP) | bilabial | spirant
ð̞ | dental spirant approximant | D (SP) | dental | spirant
ɣ˕ | velar spirant approximant | G (SP) | velar | spirant
ɹ | alveolar approximant | r (EN) | alveolar | rhotic
l | lateral alveolar approximant | l (SP, KR), l0 (EN) | alveolar | lateral
ɫ or lˠ | velarized lateral alveolar approximant | l (EN) | alveolar, velarized | lateral
ʎ | lateral palatal approximant | L (SP) | palatal | lateral
Semivowel Approximants

Semivowels, or glides, are consonants that behave phonetically like vowels but act as a syllable margin rather than as the nucleus of a syllable. In simple terms, they sound like vowels but behave like consonants.

Some linguists prefer to call them semiconsonants to distinguish them from non-syllabic vowels (which are also semivowels and are important elements of diphthongs), while other linguists consider both to be the same. The distinction is not clear-cut and depends mainly on the grammatical rules of each language.

Looking more closely at the relation between vowels and semivowels, each semivowel has a vowel counterpart. Both sound practically the same, and the semivowel can be considered the non-syllabic counterpart of the vowel.

Semivowel (non-syllabic) | Vowel (syllabic)
j palatal approximant | i close front unrounded vowel
ɥ labio-palatal approximant | y close front rounded vowel
ɰ velar approximant | ɯ close back unrounded vowel
w labiovelar approximant | u close back rounded vowel
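The semivowel/vowel pairs above can be kept as a two-way mapping. For vowels without a dedicated semivowel letter, this Python sketch falls back on IPA's non-syllabic diacritic (combining inverted breve below, U+032F, as in i̯); whether a given language or author writes a glide letter or a non-syllabic vowel is exactly the unclear distinction described above, so the fallback is one possible choice, not a rule.

```python
# Semivowel (non-syllabic) -> vowel (syllabic) pairs from the table above.
SEMIVOWEL_TO_VOWEL = {"j": "i", "ɥ": "y", "ɰ": "ɯ", "w": "u"}
VOWEL_TO_SEMIVOWEL = {v: s for s, v in SEMIVOWEL_TO_VOWEL.items()}

# IPA non-syllabic diacritic (combining inverted breve below), as in i̯.
NON_SYLLABIC = "\u032F"


def syllabic_counterpart(semivowel: str) -> str:
    """Return the vowel that a semivowel corresponds to."""
    return SEMIVOWEL_TO_VOWEL[semivowel]


def non_syllabic(vowel: str) -> str:
    """Return the semivowel letter for a vowel, or mark the vowel non-syllabic."""
    return VOWEL_TO_SEMIVOWEL.get(vowel, vowel + NON_SYLLABIC)


print(syllabic_counterpart("w"), non_syllabic("i"), non_syllabic("e"))
```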
Spirant Approximants

Like any sonorant, spirant approximants don't obstruct the airflow or make it turbulent, which makes them similar to vowels. In terms of sound, however, they are closer to the fricatives.

As with the semivowels, each spirant approximant can be related to a fricative. In some languages, certain fricatives are unstable and shift to other articulations.

Spirant approximant | Fricative
β̞ bilabial spirant approximant | β voiced bilabial fricative
ð̞ dental spirant approximant | ð voiced dental fricative
ɣ˕ velar spirant approximant | ɣ voiced velar fricative
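In IPA, each spirant approximant in the table is its fricative plus a "lowered" diacritic: the combining down tack below (U+031E) in β̞ and ð̞, and the spacing down tack (U+02D5) that the table uses for ɣ˕. A Python sketch converting in both directions, assuming only these three symbols; escape sequences are used so the combining characters are unambiguous:

```python
# Spirant approximant <-> related voiced fricative, following the table above.
LOWERED_BELOW = "\u031E"    # combining down tack below, as in β̞ and ð̞
LOWERED_SPACING = "\u02D5"  # modifier letter down tack, as in ɣ˕


def fricative_of(spirant: str) -> str:
    """Strip the lowering diacritic to get the related fricative."""
    return spirant.replace(LOWERED_BELOW, "").replace(LOWERED_SPACING, "")


def spirant_of(fricative: str) -> str:
    """Add the lowering diacritic; ɣ gets the spacing form used in the table."""
    mark = LOWERED_SPACING if fricative == "\u0263" else LOWERED_BELOW
    return fricative + mark


for fricative in ("\u03B2", "\u00F0", "\u0263"):  # β, ð, ɣ
    spirant = spirant_of(fricative)
    assert fricative_of(spirant) == fricative  # round trip is lossless
    print(fricative, "->", spirant)
```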

Liquid Consonants

Liquids are a class of consonants that groups together the laterals and the rhotics. Both share a number of characteristics: they often have the greatest freedom to occur in consonant clusters, they can be lengthened (or shortened) in the same manner as a vowel, and, like the nasals, they can even act as a syllable nucleus. Their name comes from their often being described as having a "fluid" sound.

European languages usually have two liquid consonants, one lateral (usually related to L) and one rhotic (usually related to R), while some Asian languages, such as Japanese and Korean, have only one liquid, with little distinction between laterals and rhotics.


A lateral consonant is an L-like consonant in which the airstream flows along the sides of the tongue but is blocked by the tongue from passing through the middle of the mouth. Laterals, associated with the letter L, include taps (flaps), approximants, fricatives, affricates and clicks; the first two are the most common in Vocaloid's phonetic systems.

IPA symbol | Name | Vocaloid symbol | Consonant type
l | alveolar lateral approximant | [l0] (EN), [l] (KR, SP) | approximant
ɫ | velarized lateral approximant | [l] (EN) | approximant
ɺ | alveolar lateral flap | [4] (JP¹) | flap
ʎ | palatal lateral approximant | [L] (SP) | approximant

1^ The Japanese R is a liquid consonant without a fixed realization. It can be an alveolar tap /ɾ/ (rhotic), although it usually varies between a retroflex flap /ɽ/ and an alveolar lateral flap /ɺ/.


Rhotics, also called tremulants or R-like sounds, are a group of liquid consonants associated with the letter R and the Greek letter rho (hence the name). Phonetically, rhotics have little in common: the group is diverse, with little articulatory relation between its members. Instead, rhotics seem to have similar phonological functions and to share some features (such as a lowered third formant) across languages.

IPA symbol | Name | Vocaloid symbol | Consonant type
ɹ̠ | alveolar approximant | [r] (EN¹) | approximant
ɾ | alveolar tap | [r] (KR, SP), [4] (JP²) | tap
r | alveolar trill | [R] (EN), [rr] (SP) | trill

1^ The English R, as an alveolar approximant, is often retracted, pharyngealized and labialized /ɹ̠ˤʷ/. It can also be a labialized retroflex approximant /ɻʷ/.
2^ The Japanese R is a liquid consonant without a fixed realization. It can be an alveolar tap /ɾ/ (rhotic), although it usually varies between a retroflex flap /ɽ/ and an alveolar lateral flap /ɺ/.

Besides the rhotic consonants, there are also rhotic vowels. These vowels have a certain R-like tone (produced by a lowered third formant) and are represented as diphones in Vocaloid's English phonetic system. R-colored vowels may differ strongly between voicebanks, depending on whether the voicebank has a rhotic or a non-rhotic accent.



Please note we are waiting for more information on some languages

