Posts Tagged ‘text to speech’

ASR-as-a-Service

Saturday, May 21st, 2011

Automated Speech Recognition (ASR) as-a-Service can be powered via Tropo and SIP using the simple open source code below. This service is perfect for adding speech recognition to your existing Asterisk, FreeSWITCH, YATE, or enterprise app from the stone ages.

Here’s how it works!

You can transfer a phone call via SIP from a platform that doesn’t support ASR to Tropo, along with the following SIP headers: prompt, choices, and returnaddress (a SIP address). Tropo automatically answers the call, plays the prompt to the caller using text-to-speech (TTS), and loads the ASR grammar with the choices passed. Once speech recognition succeeds, Tropo transfers the call back to the return SIP address along with an x-voxeo-result SIP header containing the keyword the user spoke.

Here’s the Tropo code using our hosted Scripting API:

This code is written in Ruby using our Tropo Scripting API. You can use $currentCall.getHeader to read the SIP headers passed to your Tropo application, and you can send headers to other SIP applications using the Tropo transfer method. The speech recognition magic happens in the ask method.

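# Prompt the caller with the TTS text and grammar passed in as SIP headers
recoresult = ask $currentCall.getHeader("x-sbc-prompt"), {
  :choices => $currentCall.getHeader("x-sbc-choices")}

# Transfer the caller back, passing the recognized keyword in a SIP header
transfer $currentCall.getHeader("x-sbc-returnaddress"), {
  :headers => {'x-voxeo-result' => recoresult.value}}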
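To see the header round trip without placing a live call, here's a plain-Ruby simulation under stated assumptions: route_result is a hypothetical helper (not a Tropo API), the header names are the ones the script above reads, and a case-insensitive exact match against the comma-separated choices stands in for Tropo's real recognizer.

```ruby
# Offline simulation of the ASR-as-a-Service header round trip.
# Hypothetical helper: no Tropo APIs are called, and a case-insensitive
# exact match against the comma-separated choices stands in for real ASR.

REQUIRED_HEADERS = %w[x-sbc-prompt x-sbc-choices x-sbc-returnaddress].freeze

# Returns [return_address, outgoing_headers] for what the caller said.
# Raises ArgumentError if a required incoming header is missing.
def route_result(incoming, utterance)
  missing = REQUIRED_HEADERS.reject { |name| incoming.key?(name) }
  raise ArgumentError, "missing headers: #{missing.join(', ')}" unless missing.empty?

  choices = incoming["x-sbc-choices"].split(",").map(&:strip)
  match = choices.find { |choice| choice.casecmp?(utterance.strip) }

  # Assumption: on no match, transfer back without an x-voxeo-result header.
  outgoing = match ? { "x-voxeo-result" => match } : {}
  [incoming["x-sbc-returnaddress"], outgoing]
end

headers = {
  "x-sbc-prompt" => "What is your favorite color?",
  "x-sbc-choices" => "blue,green,red,yellow",
  "x-sbc-returnaddress" => "sip:9996106032@sip.tropo.com"
}

address, outgoing = route_result(headers, "Green")
puts address                    # prints sip:9996106032@sip.tropo.com
puts outgoing["x-voxeo-result"] # prints green
```

Returning the keyword exactly as it appears in the choices list (here "green", not the caller's "Green") keeps the value the downstream switch receives predictable.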

Here’s how we tested it.

We used Phono, our browser-based webphone, to call Tropo and pass the prompt, choices, and returnaddress parameters.

phono.phone.dial("sip:9996106030@sip.tropo.com", {
  headers: [
    {
      name: "returnaddress",
      // Where Tropo sends the result; you could use Phono's own SIP address here
      value: "sip:9996106032@sip.tropo.com"
    },
    {
      name: "prompt",
      value: "What is your favorite color?"
    },
    {
      name: "choices",
      value: "blue,green,red,yellow"
    }
  ]
});

Just to prove that yet another SIP application could receive the speech recognition results, we created a second Tropo application in Ruby, using the Scripting API, that simply says the result.

say "You said " + $currentCall.getHeader("x-voxeo-result")

This second test application would typically be omitted in a real deployment, since the returnaddress would most likely be the originating SIP address of your switch, returning the caller to your original call flow. It’s cool that you can also transfer to yet another application for additional processing!

What’s Next?

You can clone or fork this open source project on GitHub and use it today for as little as $0.03 per minute for the Tropo call. Let us know if you would like us to build this service out for commercial use.


Simple tips for better text to speech

Thursday, January 27th, 2011

Are you looking for a way to make your text to speech sound more natural? Here are a couple of quick tips.

First, make sure you’re using the right voice. Trying to pronounce Spanish words with an English or German text to speech voice will give poor results. The text to speech engine will phonetically say words it doesn’t know, leading to some odd-sounding vocalizations if you use the wrong language’s voice. See the voice parameter on the say() function for details on how to set the right language. You can also set the voice on ask() and record().

For English speakers, try using a different accent. Tropo provides both US and UK accents. Many users in the US find that the British-accented voices sound more natural, while UK users tend to prefer the US voices. Hearing a different accent helps mask the robotic sound of the automated voices.

Play around with our different voices for your application and see what sounds best to you.

Improved Text to Speech

Monday, June 7th, 2010

We just released an update that improves the sound of the default text to speech voice. A few developers have been testing Allison in recent weeks; now we’ve not only released it for everyone but also made it the default voice.

If you have an app that uses the default TTS voice, give it a call and have a listen.

International Male (Voices)

Tuesday, April 20th, 2010

Since going international a few weeks ago, we’ve had some requests for more text to speech (TTS) voices. In our launch, we provided eight female voices in various languages and dialects. A number of customers have inquired about male voices for those same dialects.

They’re available starting now. Just like the female voices, each male voice is identified by a character’s name. The new voices we’ve added are:

  • jorge – Castilian Spanish – Male
  • bernard – French – Male
  • dave – US English – Male
  • simon – British English – Male
  • stefan – German – Male
  • luca – Italian – Male
  • willem – Dutch – Male
  • carlos – Mexican Spanish – Male
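As the earlier tip about matching voice to language suggests, it can help to pick the voice from the language rather than hard-coding a name at each call site. Here's a minimal sketch, assuming you only need the male voices listed above; MALE_VOICES and male_voice_for are hypothetical names, not part of Tropo.

```ruby
# Hypothetical lookup table built from the male voices listed above.
# It is not a Tropo API; the say() documentation has the full list.
MALE_VOICES = {
  "Castilian Spanish" => "jorge",
  "French"            => "bernard",
  "US English"        => "dave",
  "British English"   => "simon",
  "German"            => "stefan",
  "Italian"           => "luca",
  "Dutch"             => "willem",
  "Mexican Spanish"   => "carlos"
}.freeze

# Returns the voice name for a dialect; raises KeyError if none is listed.
def male_voice_for(dialect)
  MALE_VOICES.fetch(dialect)
end

# Options hash in the same shape the Ruby say examples pass (:voice => name).
options = { :voice => male_voice_for("British English") }
puts options[:voice] # prints simon
```

Using fetch rather than [] means a typo in the dialect fails loudly instead of silently falling back to the default voice.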

To use one of the new voices, just add the character’s name to your text to speech command in Tropo. For example, in PHP, you can do…

<?php
answer();
say('Comment allez-vous?', array('voice' => 'bernard'));
say('ça va bien merci.', array('voice' => 'florence'));
?>

Or in Ruby, try:

answer
say "1 2 3 4 5 my name is Dave and I'm from the U S.", :voice => 'dave'
say "Hello, Dave. I'm Simon from London.", :voice => 'simon'

Want to use one of these languages in speech recognition? Our Automated Speech Recognition (ASR) platform recognizes all of the languages we can speak, whether they’re spoken by a man or a woman.

In JavaScript:

answer();
var result = ask("Quel âge avez-vous?", {
  choices: "[1-3 digits]",
  recognizer: "fr-fr",
  voice: "bernard"
});
if (result.name == 'choice') {
  // Echo the age back: "Good. You are N years old."
  say("C'est bon. Vous avez " + result.value + " ans.", {voice: "bernard"});
}
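The "[1-3 digits]" grammar accepts one to three digits. Here's a plain-Ruby sketch of the reply logic for testing the confirmation text offline; the regular expression only mirrors that grammar for illustration (on Tropo the recognizer itself enforces it), and age_reply is a hypothetical helper.

```ruby
# Hypothetical offline check mirroring the "[1-3 digits]" grammar: one to
# three digits and nothing else. On Tropo the recognizer enforces the grammar;
# this only illustrates which values the reply would be built for.
ONE_TO_THREE_DIGITS = /\A\d{1,3}\z/

# Builds the French confirmation ("Good. You are N years old.") for an
# accepted age, or returns nil if the value doesn't fit the grammar.
def age_reply(value)
  return nil unless ONE_TO_THREE_DIGITS.match?(value.to_s)

  "C'est bon. Vous avez #{value} ans."
end

puts age_reply("42")  # prints C'est bon. Vous avez 42 ans.
p age_reply("1234")   # prints nil
```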

The full list of voices and documentation of how to use various TTS voices is available under the say() function.

Teaching Your Application to Really Talk

Friday, March 26th, 2010

Speech synthesis, otherwise known as text to speech (TTS), is a technology that synthesizes a human voice from text input. Speech synthesis is the default behavior for voice calls on the Tropo platform. The Tropo ‘say’ verb provides the TTS capability: it takes a string of text and speaks it. The verb can also take a URL to a ‘wav’ or ‘mp3’ file to play pre-recorded audio.

When it comes to teaching your application to speak, we follow the Perl ethos of making “the simple things easy and difficult things possible.” Your application can speak well using our simple APIs, or it can be as sophisticated and emotional as you like, using the powerful capabilities Tropo exposes for giving your voices character.

For our first example we will simply say:

say 'I like squirrels!'

Which then renders this audio.

Next, we can choose a voice for any of the languages supported by Tropo (US/UK English, Castilian/Mexican Spanish, French, German, Italian & Dutch). Let’s give French a try for our next example:

say "J'aime les écureuils!", :voice => 'florence'

Which then renders this audio.

Those were the simple examples that anyone can use to add a little speech to their applications. But remember, we also make the difficult possible for those who want their characters to really speak, because sometimes customizing the voice is not enough. There are cases when you’d also like control over pitch, volume and intonation. For those, Tropo natively supports a standard called the Speech Synthesis Markup Language (SSML).

The Speech Synthesis Markup Language (SSML) is a W3C standard for controlling the pace, tone, pitch and overall sound of computer-generated voices. Here’s a Ruby script that says the same sentence four times, first at the default speed and then at progressively slower rates:

answer
say "<speak>I like squirrels!
I <prosody rate='-10%'>like squirrels!</prosody>
I <prosody rate='-30%'>like squirrels!</prosody>
I <prosody rate='-50%'>like squirrels!</prosody>
</speak>"
hangup

Which renders this audio. The previous example used the rate attribute of the SSML prosody element to control the speaking rate. There are many other elements and attributes you can use, including emphasis and phoneme. To learn more about SSML and related technologies, check out the W3C site at http://www.w3.org/TR/speech-synthesis/.
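If you generate SSML from program data instead of writing it by hand, it helps to build the markup in one place and XML-escape the spoken text, because characters such as & or < would otherwise break the document. Here's a minimal plain-Ruby sketch of that idea; slowing_ssml and xml_escape are hypothetical helpers, and unlike the script above it wraps the whole sentence in each prosody element.

```ruby
# Minimal sketch (hypothetical helpers, not a Tropo API): generate the
# "gradually slower" SSML from a list of prosody rates.

# XML-escape text so characters like & and < don't break the SSML.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                '"' => "&quot;", "'" => "&apos;" }.freeze

def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Says the sentence once per rate; a nil rate means the default speed.
def slowing_ssml(sentence, rates)
  spoken = xml_escape(sentence)
  parts = rates.map do |rate|
    rate.nil? ? spoken : "<prosody rate='#{xml_escape(rate)}'>#{spoken}</prosody>"
  end
  "<speak>#{parts.join(' ')}</speak>"
end

puts slowing_ssml("I like squirrels!", [nil, "-10%", "-30%", "-50%"])
```

In a Tropo Ruby script you would then pass the generated string to say, the same way the hand-written SSML above is passed.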

If you would like to call in and listen to these examples live, you can dial +990009369991429940 on Skype (free) or call +1.408.940.5920 from any phone. What are you waiting for? Get started by signing up for an always-free developer account at Tropo.com.

Roger Ebert’s new voice

Friday, March 5th, 2010

Text to speech engines have long been used to allow those who cannot talk to communicate verbally. Film critic Roger Ebert, who lost his lower jaw and his voice to cancer, has taken it a step further by creating a TTS voice that sounds like him.

CereProc, a Scottish developer of text to speech technology, reconstructed Ebert’s voice using hundreds of hours of archived film clips from his reviews and other TV appearances.

Debuting his new voice on Tuesday on Oprah, Ebert said, “You’ll know it’s a computer, but one that sounds like me. It still needs improvement but at least it sounds like me. In first grade they said I talked too much, and now I still can.”