VOCALOID (often referred to as just "V1" in VOCALOID communities) is a singing synthesizer application software developed by the YAMAHA Corporation. The project was an international effort, and is considered the brainchild of Kenmochi Hideki, also known as the "father" of VOCALOID.
In the 20th century, the most successful vocal synthesizing attempt had been "Queen of the Night" from Mozart's opera The Magic Flute; this had been made in 1984 by Yves Potard and Xavier Rodet using the CHANT synthesizer.
Jordi Bonada, a senior researcher at the Music Technology Group at Pompeu Fabra University in Barcelona, joined the university in 1997. Bonada worked on a research project requested by YAMAHA which contained some "interesting" ideas. He set about recording not just a song from a singer, but a variety of ranges and pitches, in an attempt to build a model from which any song could be built. The project was codenamed "Elvis" and lasted two years. It did not become a product at the end of its development: because it was based on spectral morphing techniques, the project was too large, and each song required a professional singer behind it.
While it did not become a product, the "Elvis Project" helped establish that a series of phonetics recorded in a wide range of pitches would help build a synthesizer based on any model. YAMAHA agreed to help them start a fresh project; it was at this point that Kenmochi Hideki joined. The initial ideas came from him in Japan in 2000, with most of the research done at Pompeu Fabra University, where the core signal processing libraries were developed in C++. YAMAHA itself was responsible for the product design and the development of the actual product. It was purely collaborative research, and they did not think about selling at that time. At the time, synthesizers would take days to produce good quality results, yet the vocal would still sound inhuman and obviously generated by a machine or computer. They were expensive as well. This meant that while all other parts of music production could by then be fully recreated in a DAW, producing a good quality vocal performance meant hiring a human vocalist. The aim of the project was therefore to provide a fast, low-cost way of getting uncannily human-like vocals, giving producers full control of the production of music. They used "Elvis" as the base model for ideas and set about tackling two problems, including:
- how to process and transform singer recordings so that the result would be a performance of a given song that sounded as natural as possible and provided the feeling of a continuous flow
The VOCALOID™ project was originally codenamed "Daisy Project" ("DAISYプロジェクト" or "でいじぃぷろじぇくと"), a name taken from the song "Daisy Bell", and was at a prototype stage in March 2002. Excitation plus Resonances (EpR) was developed as the first voice model, and it allowed the researchers to transform vocal timbres in a natural manner while preserving subtle detail. At first, "Daisy" could only say vowels like "ai" (love). Four months later, "Daisy"'s first real word was "asa" (morning).
Due to the limited vocals that YAMAHA itself could offer, they licensed the software out to third-party studios. English and Japanese versions were developed alongside each other. The first studio on board was Crypton Future Media, who were contacted in May 2002 and hired to find English studios to support an English version. Their efforts met with mostly negative responses, and the only studio to enter development was Zero-G, who joined in the autumn of 2002, alongside one other company that also joined in 2002 (later identified as PowerFX).
At the 6th anniversary of VOCALOID™, Hiroyuki Itoh noted that they had received demos from Zero-G, without warning, of what appeared to be a male vocal singing. At the time, VOCALOID was still being developed overseas. Since the demos came unexpectedly, they did not realize they were VOCALOID demos and thought they were a form of prank.
"Daisy" was demonstrated at the 6th anniversary of VOCALOID™, where a file called "Fly me to the Moon by Daisy" was played. The file was originally created for July 16, 2002, when Crypton were shown the first demonstration in their Sapporo office. "Daisy" still had trouble with consonants at the time.
"Daisy" was dropped as a name because it ran into copyright trouble. Despite attempts to change the name (such as translating it into Japanese), they could not register it and kept running into copyright conflicts, even in Japan.
Kenmochi reported that deciding the name of the software was very hard at the time, and that "Vocaloid" had fallen into 3rd place as a choice of name. The name "Vocaloid", a portmanteau of the words "Vocal" and "Android", was chosen 2 or 3 weeks before its announcement, after the 2nd choice failed due to a copyright conflict with a piece of software in Belgium. Kenmochi chose to announce the technology on February 26, 2003, a day before his birthday.
The original design of VOCALOID™ was to act as a replacement singer for a real singer. Many reviewers at the time of LE♂N and L♀LA's release thought that "VOCALOID" was a bold effort, as human speech was a complex thing to recreate. VOCALOID was regarded as the first of its kind to tackle singing vocals.
KAITO and MEIKO were originally recorded by YAMAHA themselves before being prepared for commercial release. KAITO ended up being delayed a year and a half.
The first VOCALOIDs, LE♂N and L♀LA, made their debut appearance and initial release at the NAMM Show on January 15, 2004. LE♂N and L♀LA were then released in Japan by the studio Zero-G on March 3, 2004, both sold as a "Virtual Soul Vocalist". They were also demonstrated at the Zero-G Limited booth during Wired Nextfest and won the 2005 Electronic Musician Editor's Choice Award. Zero-G later released MIRIAM, with her voice provided by Miriam Stockley, in July 2004. Later that year, Crypton Future Media, Inc. handled the release of the first Japanese VOCALOID, MEIKO. It was during the period between MIRIAM's and MEIKO's respective releases that the first rival software, Cantor, was released to compete with VOCALOID, which was known in the western hemisphere only through LE♂N, L♀LA, and MIRIAM.
The Game Audio Network Guild held the "2nd Annual G.A.N.G. Awards Show" on Thursday, March 25, 2004 at the Fairmont Hotel in San Jose, California, during the Game Developers Conference 2004. The software won the "Best New Audio Technology" award in the Industry & Trade category.
Though LE♂N, L♀LA, MIRIAM, and MEIKO experienced good sales, MEIKO gaining sales of 3,000 in her first year in particular, KAITO initially failed commercially and sold just 500 units. Despite this, the software was overall successful and was followed by the VOCALOID2 engine.
It is notable that back in 2004, VOCALOID was released towards the end of the "FLASH golden age" (FLASH黄金時代), a period known for the rise of Flash-based productions (1998-2002/2005, end date arguable) and the birth of file sharing sites such as YouTube.
At the close of the VOCALOID era, it was confirmed that 3 companies had joined production of the software: Crypton Future Media, Zero-G Ltd and PowerFX. PowerFX, having been introduced to the software via LE♂N and L♀LA's demonstrations at the 2004 NAMM Show, did not produce any vocals for this version of VOCALOID and made their entrance at the beginning of the VOCALOID2 era. It is known, however, that they had a vocal in development as early as 2003 that was intended for the engine.
KAITO was sold with version 1.1 of the software, but this caused problems with other versions of the software, and a patch had to be created to fix the issue. The last version of the software produced was 1.1.2. The patch to upgrade all VOCALOID voicebanks was released by YAMAHA themselves, although Crypton Future Media later updated both their products to the latest version. Due to the retirement of support for the VOCALOID engine, the update has no longer been available for download from YAMAHA as of 2011.
Improvements were made between version 1.0 and version 1.1.2. Version 1.0 rendered phonetics in a more broken manner and did not attempt to smooth them out as 1.1.2 does, resulting in more robotic singing. However, even the slightest adjustment in version 1.1.2 would produce very different results from version 1.0. Therefore, not all users found it suitable to update from version 1.0 to version 1.1.2, despite the improvements.
Due to the successes of the VOCALOID2 software, VOCALOID saw a second life in 2008 caused by KAITO's sudden growth in popularity. KAITO later went on to claim second best seller of the year in Nico Nico Market in 2008.
As interest in VOCALOIDs grew, Zero-G began reselling their VOCALOID products again on their website, and were considering updating their box art to match current VOCALOID trends better. However, this did not occur.
The engine has been unsupported by YAMAHA since 2011, and from early 2014 onwards this engine version was removed from sale.
In mid-December 2013, news came from both Crypton Future Media and Zero-G that their VOCALOID products were being taken down.
Zero-G gave 31 December 2013 as the final retirement date for their VOCALOID products; after this date they were removed from sale permanently.
Serials could still be purchased while they lasted, but general sale ended.
- Windows XP or Windows 2000 (note: the engine is not officially compatible with Windows 7 or Windows 8)
- Pentium III, 1 GHz or faster
- 512 MB of RAM or more
- Approx. 700 MB of hard disk space or more
- CD-ROM or DVD-ROM Drive
- SVGA Display (1024x768)
- Sound Card with Microsoft DirectSound Compatible driver
- LAN/network card must be installed, or a USB network card must be connected to the USB port
Vocal libraries released for the first VOCALOID engine.
Examples of usage
An example of solfège using VOCALOID technology.
VOCALOID has 5 voicebanks available (3 English, 2 Japanese), offering a limited range of voices. Users can achieve other genres with further voice editing. Both the English and Japanese versions of VOCALOID have an English interface. Other languages were planned for the future (though these would not be introduced until VOCALOID3).
According to the original YAMAHA VOCALOID website, the software's key feature was its ability to recreate singing exactly as it was typed out on a PC. Manipulation of the vocals allowed for a greater array of styles and vocals than what was offered out of the box, with the added bonus of maintaining a degree of realism. VOCALOID based its vocals more on analysis of the human voice and less on samples of the human voice. Extra expression could be added to a voice simply by applying vocal effects.
The file format for VOCALOID is "VOCALOID MIDI" (.MIDI). VOCALOID will not import .VSQ or .VSQX files, although it will import most MIDI file types.
VOCALOID's database is much simpler than that of the VOCALOID2 engine that followed, and its consonant sounds are more difficult to modulate. However, VOCALOID has some functions that VOCALOID2 does not have, such as the Resonance parameters. Resonance allowed the phonetic data to be manipulated through formant modulation, making it sound different depending on what was done to it. The biggest advantage this offered was flexibility. As seen with voicebanks like LE♂N or MEIKO, each user can utilize the voicebanks very differently, and delicate editing with several Resonances or other functions has produced a wide range of different results. All VOCALOID vocals are known to have a small, albeit undeclared, optimum vocal range compared to most vocals powered by later engine versions.
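To give a rough feel for what shaping a sound through a resonance means, here is a minimal sketch of a generic two-pole digital resonator of the kind used in textbook formant synthesis. It is not VOCALOID's own Resonance implementation, which is not documented here; the function names, the 16 kHz sample rate, and the 500 Hz and 1500 Hz centre frequencies are all assumptions chosen purely for illustration.

```python
import math


def resonator_coeffs(center_hz, bandwidth_hz, sample_rate):
    """Coefficients of a generic two-pole resonator (illustrative only)."""
    c = -math.exp(-2.0 * math.pi * bandwidth_hz / sample_rate)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz / sample_rate)
         * math.cos(2.0 * math.pi * center_hz / sample_rate))
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def apply_resonance(samples, center_hz, bandwidth_hz, sample_rate):
    """Filter samples with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(center_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


SAMPLE_RATE = 16000  # assumed value, not taken from VOCALOID
# A plain 500 Hz sine stands in for one harmonic of a sung note.
tone = [math.sin(2.0 * math.pi * 500.0 * n / SAMPLE_RATE) for n in range(4000)]

near = apply_resonance(tone, 500.0, 100.0, SAMPLE_RATE)   # resonance on the tone
far = apply_resonance(tone, 1500.0, 100.0, SAMPLE_RATE)   # resonance moved away

# Skip the first half so the filter's start-up transient is excluded.
print(rms(near[2000:]), rms(far[2000:]))
```

Moving the centre frequency changes which part of the spectrum is emphasized, which is the general idea behind altering a voice's character by modulating its formants.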
Unlike the version that followed, VOCALOID was an analytic-based system that worked out how to adapt the vocal using mathematics. In short, this meant it used recorded sample data to make the engine sound more like the vocalist behind the data; as a result, the overtone of all 5 vocals was identical. The vocals sounded very synthetic and low quality, yet this is also why the engine had such great flexibility compared to the sample-based versions that followed. The quality issue limited the feasibility of releasing vocals for it, and JODIE and RONIE were not released for this reason. Also, while realism was not beyond it, the analytic-based system did not produce results as realistic as those of the sample-based system.
When DSE.dll or DSE1_1.dll is examined with a hex editor, a number of phonetics are listed by the engine as possible sounds; however, no released VOCALOID used them.
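For readers who would rather script this kind of inspection than scroll through a hex editor, the sketch below pulls runs of printable ASCII out of a block of bytes, much as the Unix `strings` tool does. The function name and the sample bytes are made up for illustration; the sample is not the real contents of DSE.dll, and nothing here assumes how the engine actually stores its phonetic list.

```python
import re


def extract_ascii_strings(data, min_length=4):
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for a binary file; NOT taken from DSE.dll.
sample = b"\x00\x01phoneme_a\x00\xff\x10k a\x00ab\x00symbol_table\x00"

found = extract_ascii_strings(sample)
print(found)  # ['phoneme_a', 'symbol_table']
```

To try it on a real file, the bytes could be read with `pathlib.Path(...).read_bytes()` and passed to the same function; runs shorter than `min_length` are skipped to cut down on noise.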
The VOCALOID interface also had minor differences depending on which VOCALOID was used to open the engine. For example, MIRIAM's interface recoloured the area around the keyboard keys deep blue, with Zero-G's logos on the interface, while KAITO's was green with Crypton Future Media logos. The standard interface used in VOCALOID demos and presentations was brown with no logos whatsoever.
All VOCALOID voicebanks except KAITO used the VOCALOID 1.0 editor when they were released. Users of the VOCALOID 1.0 editor can update it by applying the VOCALOID 1.1 update file. KAITO was released with both kinds of VOCALOID editor. However, users who are not on version 1.1.2 need to apply the VOCALOID Ver1.1.2 update file distributed on Crypton's official page before they use the VOCALOID 1.0 editor. There are many differences between ver 1.0 and ver 1.1, and they sound different even when edited in the same way (as shown in a Niconico broadcast comparing KAITO's ver 1.0 and ver 1.1). The main differences between them are singing style and portamento timing.
Though users can switch between versions, it is best to proceed with caution when doing so.
Despite being Japanese, KAITO and MEIKO did not have a Japanese interface, as this version was never fully translated into Japanese, although the phonetics were still Japanese. VOCALOID also had a number of synchronization issues, which varied between voicebank libraries; this created problems when setting the result to music.
In comparison to their providers (based on samples known for L♀LA, MIRIAM, KAITO, and MEIKO's vocal providers), VOCALOID voicebanks sound deeper in tone than their vocalists' own voices, as well as softer and often huskier.
In addition, VOCALOID vocals of both languages are missing some sounds that are needed to perfect either language. In other cases, the pronunciations exist but do not sound out the expected combination correctly, due to a lack of distinction between similar sounds. However, the majority of the correct sounds exist, and with some tweaking the output can be brought closer to the intended results. The VOCALOID synthesizing engine will often attempt to improvise some sounds, but the results are often crude and at times rough. For example, when the engine encounters slurring (a long-standing issue of the VOCALOID software caused by a sample handling issue), clarity is almost completely lost and it is difficult to maintain clear results without much work. The rough handling by the VOCALOID engine in its attempt to perfect language while sounding human and controlling the flow of lyrics across the different keys is the origin of much of the heavier digital sound of the 5 VOCALOID vocals. VOCALOID is also more likely than later versions to skip sounds when encountering problems.
VOCALOID may have issues with the Windows 7 operating system, though there are successful cases of installation. VOCALOID is supposed to be compatible with Windows Vista, and users have reported no major problems, although initial rumors stated otherwise. However, it cannot be guaranteed that VOCALOID will work with operating systems newer than Windows XP. On Windows 7 and 64-bit operating systems, those who have managed a successful installation report that VOCALOID often encounters issues that cause it to crash constantly. While it is usable, it is not always stable.
Illegal versions of the software were also commonplace. The software was easy for pirating teams to crack, and every voicebank was cracked at some point after release. It was also discovered that most popular keygens worked with it. There were very few differences in service between the legal and illegal versions aside from a lack of technical support from studios, although the software's ReWire function may not work as well as in the legal version.
VOCALOIDs were promoted at events such as the NAMM Show. It was the promotion of Zero-G's L♀LA and LE♂N at the NAMM trade show that would later introduce PowerFX to the VOCALOID program. Most of the promotion was done through publications such as the magazine Sound on Sound and the newspaper The New York Times. While Japanese VOCALOIDs were also promoted, their promotion was much lighter than what would follow in the VOCALOID2 era, and MEIKO and KAITO were promoted in the same manner. The media was not used as a method of promotion, and they received less attention overall than the English version.
The two biggest failures in the studios' marketing were Zero-G's failure to sell in America, despite the high level of media attention given to this version, and KAITO's initial lack of sales. Otherwise, both Crypton and Zero-G managed to meet expectations for their VOCALOIDs during the VOCALOID engine era, with MEIKO faring the best of all 5 vocals, selling three times the amount she was expected to sell.
After the success of Hatsune Miku in the VOCALOID2 era and the sudden interest in KAITO in 2008, Crypton Future Media were able to go back and re-sell their early VOCALOID voicebanks, approaching them in the same way as their VOCALOID2 voicebanks. This proved successful enough for them to re-launch their VOCALOIDs for a later engine. Zero-G's attempt to do the same was not as successful, since the approaches to English VOCALOIDs and Japanese VOCALOIDs had diverged greatly over the preceding few years. However, Zero-G established that if demand ever became high enough, they would relaunch their 3 VOCALOID voicebanks in a later engine.
The VOCALOID software was not well supported and there was little information on it. Crypton Future Media did however go back and make tutorials for this version of the software in August 2008.
In comparison to its successor, VOCALOID had very little cultural impact at its time of release. Sales of the software were very sluggish.
It is difficult to know how many songs and albums use the VOCALOID software, since songwriters must ask permission before being allowed to state specifically that they are using a VOCALOID in their songs. The first album released using a VOCALOID was A Place in the Sun, which used LE♂N's voice for vocals sung in both Russian and English. MIRIAM has also been featured in two albums, Light + Shade and Continua. Japanese electropop artist Susumu Hirasawa used the VOCALOID L♀LA in the original soundtrack of Satoshi Kon's Paprika.
The CEO of Crypton Future Media, Inc. noted the lack of interest in the initial VOCALOID software. Many studios approached by Crypton Future Media for recommendations initially had no interest in the software, with one company representative calling it a "toy". Crypton attributed part of the lack of response to a fear of robots. LE♂N and L♀LA's lack of sales in America was also blamed in part on their British accents, despite overall praise from early reviewers of the software and the fact that the English version had sold well in both Japan and Europe.
Earlier VOCALOIDs were created without "avatars", and boxart was not important to the function of the program. While MEIKO and KAITO had images that could later be used as avatars, LE♂N, L♀LA and MIRIAM (although her boxart shows a clear image of a person) did not. When avatars became common with Japanese VOCALOIDs during the VOCALOID2 era, the English VOCALOIDs without official avatars were left to interpretation by fan artwork. Zero-G did show interest in revising the boxart of their VOCALOIDs since interest in VOCALOIDs had greatly increased, but the voicebanks were retired before this occurred.
VOCALOID voicebanks were criticized for their pronunciation problems, and both versions of the software suffered issues with certain sounds. However, despite the lack of interest, most reviews were good. Although criticism was plentiful, praise was equally easy to find, as many recognized that VOCALOID™ was an ambitious project, more complex and bolder than synthesizing an instrument like the flute or guitar. Since the human ear can pick up errors in speech, VOCALOID was a difficult product to sell, yet it was able to sound realistic enough on occasion. This was important to consider because, at the time of release, as stated by "Popular Science", "Synthetic vocals have never even come close to fooling the ear, and outside of certain Kraftwerk chestnuts, robo-crooning is offputting." YAMAHA received much praise: the VOCALOID project was hailed as a "quantum leap" in vocal synthesis, and VOCALOID itself received much attention and praise within the industry.
Crypton Future Media stated that the VOCALOID engine was more like a prototype for the VOCALOID2 software that followed. There was also some criticism of opening the engine up as a commercial product rather than limiting the license to private or business-level usage, although Crypton Future Media thought this was best for the software.