Tropo’s international support is one of the features people love best about us. Numbers in 41 countries, SMS delivery worldwide, and text to speech and speech recognition in 9 languages.
Wait, only nine languages? Surely we can do better.
Tropo can now speak and understand 15 new languages (for a total of 24), with speech recognition and text to speech in both male and female voices in Catalan, Danish, Finnish, Canadian French, Galician (female only), Greek, Mandarin Chinese (female text to speech only), Norwegian (no speech recognition), Russian, Argentine Spanish (male only), Chilean Spanish (female only), Portuguese, Brazilian Portuguese, Swedish, and Valencian.
We’ve added 14 new voices to existing languages. US English, UK English, French, Italian, Spanish, and Mexican Spanish all get new voices for you to play with.
These languages are available immediately in both our free developer accounts and production applications.
The Tropo documentation will always have a list of the current languages, voices, and recognition engines and is the best place to go for up-to-date information, but for convenience, here’s a list of all the languages and voices that are supported today.
| Language | Recognizer | Female voice | Male voice |
|---|---|---|---|
| English (US) | en-us (default) | Allison (default), Susan, Vanessa | Dave, Steven, Victor |
| English (UK) | en-gb | Elizabeth, Kate | Simon |
| Catalan | ca-es | Montserrat | Jordi |
| Danish | da-dk | Frida | Magnus |
| Dutch | nl-nl | Saskia | Willem |
| Finnish | fi-fi | Milla | Mikko |
| French | fr-fr | Florence, Juliette | Bernard |
| French (Canadian) | fr-ca | Charlotte | Olivier |
| Galician | gl-es | Carmela | |
| German | de-de | Katrin | Stefan |
| Greek | el-gr | Afroditi | Nikos |
| Italian | it-it | Giulia, Paola, Silvana, Valentina | Luca, Marcello, Matteo, Roberto |
| Mandarin Chinese | | Linlin, Lisheng | |
| Norwegian | | Vilde | Henrik |
| Polish | pl-pl | Zosia | Krzysztof |
| Russian | ru-ru | Olga | Dmitri |
| Spanish (Castilian) | es-es | Carmen, Leonor | Jorge, Juan |
| Spanish (Argentine) | es-ar | | Diego |
| Spanish (Chilean) | es-cl | Francisca | |
| Spanish (Mexican) | es-mx | Soledad, Ximena, Esperanza | Carlos |
| Portuguese | pt-pt | Amalia | Eusebio |
| Portuguese (Brazilian) | pt-br | Fernanda, Gabriela | Felipe |
| Swedish | sv-se | Annika | Sven |
| Valencian | x-va | Empar | |
One thing to note: PHP does not handle Unicode very well. This can cause problems when using some voices from PHP in our scripting API, and it is especially problematic with languages written in multibyte characters, such as Chinese. We’re working with Quercus, our PHP engine, to improve its Unicode support.
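To see why multibyte text is easy to mishandle, here is a small sketch comparing character counts and byte counts. It is written in Ruby rather than PHP, it is not Tropo-specific, and the sample strings and the helper name text_stats are invented for illustration. It assumes a UTF-8 source file.

```ruby
# encoding: utf-8
# Illustrative only: character count and byte count diverge for multibyte text.
# The sample prompts and the helper name are invented for this sketch.

def text_stats(text)
  { :chars => text.length, :bytes => text.bytesize }
end

english  = "hello"          # one byte per character in UTF-8
mandarin = "\u4f60\u597d"   # two Chinese characters, three bytes each in UTF-8

puts text_stats(english).inspect
puts text_stats(mandarin).inspect
```

Code that treats a string as a plain sequence of bytes can cut such characters in half, which is the kind of trouble to expect from an engine without solid Unicode support.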
Tropo is only two and a half years old and can speak twenty-four languages. That’s one precocious toddler.
Ham radio, or amateur radio, has been around since the early 1900s. Ham radio technology has kept pace with traditional communications and may even be the only technology that allows people to communicate in natural disasters. Ham radio operators can communicate over very long distances using HF (high frequency) as well as through satellites via AMSAT and even using VoIP over the Internet using EchoLink, IRLP, or D-STAR!
There are nearly 750,000 FCC licensed ham radio operators in the United States and over 3M licensed operators worldwide. Each operator has a federally issued callsign that is used to uniquely identify the station operating on the band.
Using Tropo and Callook (Josh Dick’s W1JDD Callsign API), Chris Matthieu (N7ICE) was able to quickly develop a speech recognition and text-to-speech based telephony app that is accessible by any of the following channels:
Upon calling the application, you are asked to spell a callsign using military phonetics:
A – Alfa, B – Bravo, C – Charlie, D – Delta, E – Echo, F – Foxtrot, G – Golf, H – Hotel, I – India, J – Juliet, K – Kilo, L – Lima, M – Mike, N – November, O – Oscar, P – Papa, Q – Quebec, R – Romeo, S – Sierra, T – Tango, U – Uniform, V – Victor, W – Whiskey, X – X-Ray, Y – Yankee, Z – Zulu
In addition to these commands, you can say restart to start over, or stop once your callsign has been entered correctly. When you say stop, the Tropo application makes a REST call to Callook and gets back a JSON response with the data for that callsign. In addition to the communication channels listed above, Chris Matthieu was able to use his handheld ham radio (like the one featured above) on VHF (very high frequency) to reach a repeater on a mountaintop nearly 50 miles away and connect to Tropo through an auto-patch phone line to perform a callsign lookup. Here is a screencast and source code for the application!
Here is the source code running on Tropo’s Scripting API:
require 'rest_client'
require 'json'

answer
sleep 2
say "welcome to the tropo ham radio call sign lookup application"

callsign = ""      # characters collected so far
callsigntext = ""  # the phonetic words as spoken, read back to the caller

loop do
  result = ask "spell the callsign phonetically. say stop when done or restart to start over", {
    :choices => "alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel, india, juliette, kilo, lima, mike, november, oscar, papa, quebec, romeo, sierra, tango, uniform, victor, whiskey, xray, yankee, zulu, one, two, three, four, five, six, seven, eight, nine, zero, stop, restart"}
  if result.value == "stop"
    break
  elsif result.value == "restart"
    callsign = ""
    callsigntext = ""
  else
    callsigntext = callsigntext + " " + result.value
    say "so far you entered #{callsigntext}"
    # Translate the recognized word into a single callsign character
    letter = case result.value
      when "alpha" then "a"
      when "bravo" then "b"
      when "charlie" then "c"
      when "delta" then "d"
      when "echo" then "e"
      when "foxtrot" then "f"
      when "golf" then "g"
      when "hotel" then "h"
      when "india" then "i"
      when "juliette" then "j"
      when "kilo" then "k"
      when "lima" then "l"
      when "mike" then "m"
      when "november" then "n"
      when "oscar" then "o"
      when "papa" then "p"
      when "quebec" then "q"
      when "romeo" then "r"
      when "sierra" then "s"
      when "tango" then "t"
      when "uniform" then "u"
      when "victor" then "v"
      when "whiskey" then "w"
      when "xray" then "x"
      when "yankee" then "y"
      when "zulu" then "z"
      when "one" then "1"
      when "two" then "2"
      when "three" then "3"
      when "four" then "4"
      when "five" then "5"
      when "six" then "6"
      when "seven" then "7"
      when "eight" then "8"
      when "nine" then "9"
      when "zero" then "0"
    end
    if letter
      callsign = callsign + letter
    end
  end
end

# Look up the callsign with Callook and read back the licensee details
response = RestClient.get 'http://callook.info/' + callsign + '/json'
data = JSON.parse(response)
say callsigntext + " belongs to "
say data["name"]
say "in " + data["address"]["line2"]
say "and holds a " + data["current"]["operClass"] + " license"
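The two core steps of the script, turning phonetic words into a callsign and reading fields out of the JSON response, can be tried without a phone call. The following is a minimal sketch in plain Ruby, assuming nothing from Tropo itself. The names PHONETIC, callsign_from, summary and SAMPLE are invented here, and the sample JSON is hypothetical: only its field names come from the script above, the values are made up.

```ruby
require 'json'

# Spoken words in the same spelling as the grammar in the script above.
LETTERS = %w[alpha bravo charlie delta echo foxtrot golf hotel india juliette
             kilo lima mike november oscar papa quebec romeo sierra tango
             uniform victor whiskey xray yankee zulu]
DIGITS  = %w[zero one two three four five six seven eight nine]

# Build one lookup table instead of a long case statement.
PHONETIC = {}
LETTERS.each_with_index { |word, i| PHONETIC[word] = ('a'.ord + i).chr }
DIGITS.each_with_index  { |word, i| PHONETIC[word] = i.to_s }

# Convert a list of recognized words to a callsign, skipping unknown words.
def callsign_from(words)
  words.map { |word| PHONETIC[word] }.compact.join
end

# Build the spoken summary from a Callook-style JSON document.
def summary(json_text)
  data = JSON.parse(json_text)
  "#{data['name']} in #{data['address']['line2']} holds a " \
    "#{data['current']['operClass']} license"
end

# Hypothetical response: field names match the script above, values are invented.
SAMPLE = '{"name":"JANE OPERATOR","address":{"line2":"ANYTOWN, AZ 85000"},' \
         '"current":{"operClass":"EXTRA"}}'

puts callsign_from(%w[november seven india charlie echo])
puts summary(SAMPLE)
```

Keeping the mapping in a table means the grammar string passed to ask could also be generated from the same data, so the two cannot drift apart.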
Mark Headd recorded an awesome screencast on getting Tropo running on Node.js using the Tropo Node.js library. While libraries make code easier to write, I wanted to see what was happening under the covers when writing a Tropo application using Node.js without any magic.
This experiment was easier than you may think! Since Tropo speaks JSON natively, all you need to do is spin up a Node.js server like the hello world demo on the Node.js home page. Next, substitute “application/json” for the Content-Type and send Tropo JSON text in place of hello world, as shown below:
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'application/json'});
  res.end('{"tropo":[{"say":[{"value":"This call is running from node J.S. Have a nice day. Goodbye."}]},{"hangup":null}]}');
}).listen(3000, "127.0.0.1");
console.log('Server running at http://127.0.0.1:3000/');
Next, deploy this script to your node.js hosting provider or run Tunnlr as demonstrated by Mark and create a new Tropo WebAPI application pointing at the deployed URL. Now call your application and listen to Node.js in action.
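Hand-writing the JSON string works for a one-line demo, but it gets error-prone as the document grows. As a rough sketch, here is the same document built from a plain data structure and serialized with a JSON library. It is written in Ruby to match the other scripts here instead of Node.js, and the helper name tropo_say_and_hangup is invented for this example. The document shape is copied from the string in the server above.

```ruby
require 'json'

# Build the same Tropo JSON document as a plain Ruby structure.
# The shape (a "tropo" array with "say" and "hangup" entries) is copied
# from the hand-written string in the Node.js example.
def tropo_say_and_hangup(text)
  { 'tropo' => [
    { 'say' => [{ 'value' => text }] },
    { 'hangup' => nil }
  ] }
end

body = JSON.generate(
  tropo_say_and_hangup('This call is running from node J.S. Have a nice day. Goodbye.')
)
puts body
```

The serializer takes care of quoting and escaping, so text containing quotation marks does not break the response.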
We just released an update that improves the sound of the default text to speech voice; a few developers have been testing out Allison in recent weeks and now we’ve not only released it for everyone, but also made it the default voice.
If you have an app that’s using the default TTS voice, try giving your app a call and give it a listen.
Since going international a few weeks ago, we’ve had some requests for more text to speech (TTS) voices. In our launch, we provided eight female voices in various languages and dialects. A number of customers have inquired about male voices for those same dialects.
They’re available starting now. Just like the female voices, each male voice is identified by a character’s name. The new voices we’ve added are:
jorge – Castilian Spanish – Male
bernard – French – Male
dave – US English – Male
simon – British English – Male
stefan – German – Male
luca – Italian – Male
willem – Dutch – Male
carlos – Mexican Spanish – Male
To use one of the new voices, just add the character’s name to your text to speech command in Tropo. For example, in PHP, you can do…
<?php
answer();
say('Comment allez-vous?', array('voice' => 'bernard'));
say('ça va bien merci.', array('voice' => 'florence'));
?>
Or in Ruby, try:
answer
say "1 2 3 4 5 my name is Dave and I'm from the U S.", :voice => 'dave'
say "Hello, Dave. I'm Simon from London.", :voice => 'simon'
Want to use one of these languages in speech recognition? Our Automated Speech Recognition (ASR) platform recognizes all of the languages we can speak, regardless of whether they’re spoken by a man or a woman.
In JavaScript:
answer();
result = ask("Quel âge avez-vous?", {
    choices: "[1-3 digits]",
    recognizer: "fr-fr",
    voice: "bernard"
});
if (result.name == 'choice') {
    say("C'est bon. Je suis " + result.value + " ans.", {voice: "bernard"});
}
The full list of voices and documentation of how to use various TTS voices is available under the say() function.
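When an application serves several languages, it can help to keep the voice and recognizer pairing in one place. The sketch below is plain Ruby and only builds an options hash. The voice names come from the list above, but the recognizer codes paired with them (other than fr-fr with bernard, which appears in the JavaScript example) are an assumption of this sketch, as are the names MALE_VOICES and prompt_options.

```ruby
# Male voices from the list above, keyed by recognizer code.
# Assumption: each voice pairs with the recognizer code for its language.
MALE_VOICES = {
  'es-es' => 'jorge',  'fr-fr' => 'bernard', 'en-us' => 'dave',   'en-gb' => 'simon',
  'de-de' => 'stefan', 'it-it' => 'luca',    'nl-nl' => 'willem', 'es-mx' => 'carlos'
}

# Return an options hash with a matching recognizer and voice.
# Raises ArgumentError for a recognizer this table does not know.
def prompt_options(recognizer, choices)
  voice = MALE_VOICES.fetch(recognizer) do
    raise ArgumentError, "no voice configured for #{recognizer}"
  end
  { :choices => choices, :recognizer => recognizer, :voice => voice }
end

p prompt_options('fr-fr', '[1-3 digits]')
```

Failing loudly on an unknown recognizer avoids silently falling back to the default voice for a language it cannot pronounce well.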
Speech Synthesis, otherwise known as Text to Speech (TTS), is a technology that quickly synthesizes a human voice using text as input. Speech synthesis is the default behavior for voice calls on the Tropo platform. The Tropo ‘say’ verb is the one that provides the TTS capability, by taking a string of text and speaking it back. It is of course also possible for this verb to take a URL to a ‘wav’ or ‘mp3’ file so that pre-recorded audio is played instead.
When it comes to teaching your application to speak we follow the Perl ethos of making “the simple things easy and difficult things possible”. So your application may speak very well with the simplicity of our APIs, or it may be as sophisticated and emotional as you like through Tropo exposing powerful capabilities for giving your voices character.
Now, those were the simple examples that anyone may use to add a little speech to their applications. But remember, we also make the difficult possible for those who really want to make their characters speak. Sometimes simply customizing the voice is not enough, and there are cases when you’d also like control over pitch, volume and intonation. For those cases, Tropo natively supports a standard called the Speech Synthesis Markup Language (SSML).
The Speech Synthesis Markup Language (SSML) is a W3C standard for controlling the pace, tone, pitch and all-around sound of computer generated voices. Here’s a Ruby script that repeats the same sentence four times, each time at a slower rate:
answer
say "<speak> I like squirrels!
I <prosody rate='-10%'>like squirrels!</prosody>
I <prosody rate='-30%'>like squirrels!</prosody>
I <prosody rate='-50%'>like squirrels!</prosody>
</speak>"
hangup
The previous example made use of the rate attribute of the SSML prosody element to control the playback speed. There are many other elements and attributes you may use, including emphasis and phoneme. To learn more about SSML and related technologies, check out the W3C site at http://www.w3.org/TR/speech-synthesis/.
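If you need many variations of a phrase, the SSML string can be generated instead of typed out. This is a minimal sketch in plain Ruby that only builds the text; the helper name slowing_ssml is invented here, and passing the result to say is left to your script.

```ruby
# Build an SSML document that speaks a phrase once at normal speed and
# then once for each prosody rate given (percentages, e.g. -10 for 10% slower).
def slowing_ssml(lead, phrase, rates)
  lines = ["#{lead} #{phrase}"]
  rates.each do |rate|
    lines << "#{lead} <prosody rate='#{rate}%'>#{phrase}</prosody>"
  end
  "<speak>\n#{lines.join("\n")}\n</speak>"
end

ssml = slowing_ssml('I', 'like squirrels!', [-10, -30, -50])
puts ssml
```

With the arguments shown, the result has the same structure as the hand-written example: one plain sentence followed by three prosody elements at rates of -10%, -30% and -50%.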
If you would like to call in and listen to these examples live, you may do so by dialing +990009369991429940 on Skype (free) or calling +1.408.940.5920 from any phone. What are you waiting for? Get started by signing up for an always free developer account @ Tropo.com.
Text to speech engines have long been used to allow those who cannot talk to communicate verbally. Film critic Roger Ebert, who lost his lower jaw and his voice to cancer, has taken it a step further by creating a TTS voice that sounds like him.
Using the hundreds of hours of archived film clips from his reviews and other TV appearances, Ebert’s voice was reconstructed by Scotland’s CereProc, a developer of text to speech technology.
Debuting his new voice on Tuesday on Oprah, Ebert said, “You’ll know it’s a computer, but one that sounds like me. It still needs improvement but at least it sounds like me. In first grade they said I talked too much, and now I still can.”
One of the great things about Tropo is that it has speech recognition and text to speech engines built right in. This lets users speak commands to your voice application, and lets your application respond with dynamically generated content. We make every effort to keep these features robust yet simple for developers to use.
In the first case, we will ask the user to provide their zipcode and then play it back to them:
answer
options = { 'choices' => '[5 DIGITS]',
'repeat' => 3,
'onBadChoice' => lambda { say 'That is not a zip code, please try again.'} }
choice = ask 'Please enter your zip code.', options
# Add spaces to speak back individual digits, rather than one number
zipcode = String.new
choice.value.split(//).each { |char| zipcode << char + ' ' }
say "Your zip code is #{zipcode}. Goodbye."
The key to this is the ‘choices’ option, which is where we pass a simple grammar describing what the user may say. In this case we are asking the speech recognition engine to listen for [5 DIGITS], that is, five digits, and the user may either use their phone’s touch tone keypad or speak their response. We then take that response, which comes back as a string, and add spaces between the digits so that it is spoken back as you would say a zip code, as opposed to a single number.
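The digit-spacing step is easy to try on its own. Here is a small sketch in plain Ruby, with no Tropo calls; the helper name spoken_digits and the sample zip code are made up for illustration.

```ruby
# Turn a recognized digit string into text that is read digit by digit.
# "80202" becomes "8 0 2 0 2", so it is not read as a single large number.
def spoken_digits(value)
  value.to_s.chars.to_a.join(' ')
end

puts "Your zip code is #{spoken_digits('80202')}. Goodbye."
```

Unlike the loop in the script above, joining the characters leaves no trailing space before the period.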
That covers digits, which a caller could of course always enter on the telephone keypad. Now let’s look at asking our customer a question with spoken answers:
answer
options = { 'choices' => 'cheese, pepperoni, vegetarian',
            'repeat' => 3,
            'onChoice' => lambda { |choice| say "We will send you a #{choice.value} pizza. Goodbye." },
            'onBadChoice' => lambda { say 'We do not have that kind of pizza, please try again.' } }
choice = ask 'Which pizza would you like to order?', options
In this case we are passing the ‘choices’ option a string listing the phrases the user may speak to give a valid response. When one is recognized, it is populated in ‘choice.value’ and we play it back to the user.
That was simple multiple choice. What if more than one phrase may qualify for a single response?
answer
options = { 'choices' => 'denver broncos(broncos, denver, denver broncos), dallas cowboys(cowboys, dallas, dallas cowboys)',
            'repeat' => 3,
            'onChoice' => lambda { |choice| say "Ah, so you like the #{choice.value} do you? Goodbye." },
            'onBadChoice' => lambda { say 'We do not have that team, please try again.' } }
choice = ask 'Who is your favorite football team?', options
First off, I am not making any statements about NFL teams here, just limiting the choices for the sake of brevity. In this case we are passing the ‘choices’ option a string that contains the responses we expect (e.g. denver broncos), each followed by a series of possible spoken phrases inside parentheses that could qualify. When one of those phrases is recognized, the qualifying value gets populated in ‘choice.value’.
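As the list of values and phrases grows, the choices string becomes tedious to maintain by hand. The sketch below, in plain Ruby, builds a string in the same value(phrase, phrase) form from a hash and models how a spoken phrase maps back to its value. The names TEAMS, build_choices and resolve are invented for this example, and resolve only illustrates the mapping; it is not how the recognition engine is implemented.

```ruby
# Expected values and the spoken phrases that should map to each one.
TEAMS = {
  'denver broncos' => ['broncos', 'denver', 'denver broncos'],
  'dallas cowboys' => ['cowboys', 'dallas', 'dallas cowboys']
}

# Build a choices string in the "value(phrase, phrase)" form used above.
def build_choices(synonyms)
  synonyms.map { |value, phrases| "#{value}(#{phrases.join(', ')})" }.join(', ')
end

# Illustrative model of the lookup: which value does a phrase resolve to?
# Returns nil when no phrase matches.
def resolve(synonyms, spoken)
  phrase = spoken.strip.downcase
  match = synonyms.find { |_value, phrases| phrases.include?(phrase) }
  match && match.first
end

puts build_choices(TEAMS)
puts resolve(TEAMS, 'Cowboys').inspect
puts resolve(TEAMS, 'packers').inspect
```

The generated string can then be passed as the ‘choices’ option, which keeps the grammar and any code that depends on the values in sync.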
So what are you waiting for? Start talking with your users. Many more examples in multiple languages may be found here.