Lending voice to an international initiative

January 20, 2016 - Tara Sharpe

UVic linguistics associate professor Sonya Bird was listening to the CBC Radio “Spark” program in early 2014 when she first heard the voice of Rupal Patel, a Canadian researcher based in Boston who launched the VocaliD Human Voicebank in May 2014.

The interview ignited Bird’s immediate interest in contributing to Patel’s mission to connect those living with a severe speech disorder (or limited speech) to their own unique vocal identities. Now, as a result of an eight-month volunteer effort by the Voice Drive Victoria group in collaboration with local speech language pathologist Gail Poole, more than 100 people in our region have donated their voices to this global effort.

Synthesizing speech

Theoretical physicist Stephen Hawking, arguably the most easily recognizable user of synthesized speech, is British, but his familiar voice carries an American accent. Created in the early 1980s, it is dubbed “Perfect Paul” and is one of very few voices currently in use. The VocaliD project aims to match recipients with voices that can be more authentically their own.

All of this is made possible through the elegant application of speech science. The donor process begins with a short survey about a person’s linguistic background, the particular sound of their voice (for instance, whether it is loud, soft, “twangy” or child-like) and even their height, which in turn indicates the size of the “voicebox” or vocal tract.

Each donor reads and repeats nearly 3,500 sentences, some of which come from popular fairy tales or well-known stories but are jumbled up like the scattered pieces of a jigsaw puzzle, so that the donor doesn’t fall into a storytelling rhythm. “It was actually more stressful to me than donating blood,” adds Bird, who also donated to the drive. “You feel more vulnerable.”

The recorded sentences are then broken up into sound combinations. Through speech engineering, those combinations are blended and synthesized into distinct personalized voices that fit individual recipient profiles. VocaliD is beginning with English speakers, but its model could in future extend to other languages.

Given the natural variability of pronunciation (Bird explains that a person won’t pronounce a word, even “hello,” exactly the same way twice; “we’re not robots”), the number of possible voices could seem as large as the number of humans on earth.

Bird says she will personally find it “very moving to hear from people whose voices finally match them now.” She suggests there shouldn’t be any “voice doppelgangers” either. The higher the quality of the audio equipment and the acoustic environment, the more consistent the donor recordings will be, and the clearer and more natural the resulting synthesized voices. The need for stellar equipment is where the UVic labs came in.

UVic volunteers and labs play major role

Two labs funded by the Canada Foundation for Innovation played a significant role in the effort: the Phonetics Lab, with its sound-treated recording room and the latest hardware and software for studio-level recording, and the Speech Research Lab (SRL), with its sophisticated sound booth and post-production workstations.

So did 19 UVic students, including fourth-year student Kelly Regan, who helped Poole and Bird spearhead the Victoria effort. More than 100 voice donations are now complete, representing approximately seven hours from each donor and 800 volunteer hours.

“Most of our donors are retired, many of them had British accents [but there was also a mix of accents], and they seemed very excited about the technology,” recalls Regan. “We helped them a lot with the software.” She too donated her voice.

The Humanities Computing and Media Centre at UVic assisted with technical troubleshooting. “Because ours was the first big unified drive, you could say we were the ‘test case’ for VocaliD,” says Bird. “Our group identified a number of glitches, such as compatibility with browsers.” The VocaliD technical support team has since resolved these issues for others involved in the crowd-sourced global project.

The Department of Linguistics currently houses three labs—the Phonetics Lab, the SRL and the Sociolinguistics Lab. These labs contribute in various ways to teaching and researching the phonetic details of speech, from the physiological properties underlying speech to the acoustic signal that we hear and make sense of.