Using Persistent Sockets in Tropo.com Applications

Posted on September 10, 2009 by Jason Goecke

While Tropo supports RESTful web services for moving data to and from the communication cloud, HTTP may not be fast enough for every application. Some apps require the lowest possible latency, for example when a mobile device becomes an input device. Because Tropo lets developers host scripts in our cloud, you can write applications that take direct advantage of persistent sockets. This means you open the socket once and then stream data to your remote application in real time, without having to establish a new HTTP connection each time.

I recently created an example of this in Ruby: an EventMachine server listens on a socket, and a script on Tropo opens a socket to it and sends touch-tones (DTMF) down the socket as soon as they come in. Here it is in action:

The code examples may be found here.
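To make the open-once, stream-many pattern concrete, here is a minimal sketch that runs on a single machine. It is not the code linked above: it uses Ruby's standard `socket` library instead of EventMachine, and the method names, the loopback address, and the one-digit-per-line framing are all assumptions made for illustration.

```ruby
require 'socket'

# Stand-in for the remote application: accept one connection and collect
# every line (one DTMF digit per line) until the client disconnects.
def start_digit_server(received)
  server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
  thread = Thread.new do
    client = server.accept
    while (line = client.gets)
      received << line.chomp
    end
    client.close
    server.close
  end
  [server.addr[1], thread]
end

# Stand-in for the hosted script: open the socket once, then write each
# digit as it arrives instead of opening a new connection per digit.
def stream_digits(port, digits)
  socket = TCPSocket.new('127.0.0.1', port)
  digits.each { |digit| socket.puts(digit) }
ensure
  socket&.close
end

received = []
port, server_thread = start_digit_server(received)
stream_digits(port, %w[4 1 5 #])
server_thread.join
puts received.inspect # prints ["4", "1", "5", "#"]
```

In a real deployment the client half would live in the hosted script and write each digit as the caller presses it, while the server half would run wherever your application does.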

Latest Tropo Upgrade Completed

Posted on September 8, 2009 by Jason Goecke

We are continuing to evolve Tropo by releasing a new upgrade to the Tropo cloud. This upgrade includes the following:

  • Support for fetching Java Speech Grammar Format (JSGF) and Speech Recognition Grammar Specification (SRGS) files from an external HTTP or FTP server, in addition to the built-in support for Simple Grammar. We are working on a couple of follow-up how-to posts on using these enhanced speech grammar capabilities.
  • When placing an outbound call, you must now include the ‘+’ and country code. A US number must therefore be dialed as ‘+14155551212’ on every outbound call.
  • The ‘#’ symbol on the telephone keypad may now be used to terminate a recording in a Tropo application.
  • Addition of MP3 as an audio file playback format.
  • You may now play touch-tones (DTMF) after a call has connected by placing the call with these additional parameters: “+14155551212;postd=1234;pause=22000ms”, where ‘postd’ is the digits to be dialed and ‘pause’ is the amount of time to wait after the call connects before sending the digits.
  • New accounts will now need to request outbound dialing access from support@voxeo.com. All existing accounts have outbound dialing enabled and will keep it.
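The outbound dial string described in the list above can be assembled with a small helper. This is only a sketch: `build_dial_string` is a hypothetical name, not part of the Tropo API, and it assumes that ‘postd’ and ‘pause’ may each be supplied independently, which the notes above do not confirm.

```ruby
# Hypothetical helper (not a Tropo API call): builds a dial string in the
# "+<country code><number>;postd=<digits>;pause=<n>ms" form quoted above.
def build_dial_string(number, postd: nil, pause_ms: nil)
  # Outbound numbers must now carry the '+' and the country code.
  unless number.match?(/\A\+\d+\z/)
    raise ArgumentError, "number must start with '+' followed by digits"
  end

  parts = [number]
  parts << "postd=#{postd}" if postd          # digits to send after connecting
  parts << "pause=#{pause_ms}ms" if pause_ms  # wait before sending them
  parts.join(';')
end

puts build_dial_string('+14155551212')
puts build_dial_string('+14155551212', postd: '1234', pause_ms: 22000)
# second line prints +14155551212;postd=1234;pause=22000ms
```

Rejecting numbers without the leading ‘+’ up front catches the most likely mistake after this upgrade before the call is ever attempted.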

We continue to work on many new features and will roll them out as they become available. Enjoy the new features!

Talking to the Cloud, and the Cloud Talking Back

Posted on August 28, 2009 by Jason Goecke

One of the great things about Tropo is that it has speech recognition and text-to-speech engines built right in. This allows a user to speak commands to your voice application, and lets the application respond with dynamically generated content. We make every effort to keep these features robust yet simple for developers to use.

In the first example, we will ask the user for their zip code and then play it back to them:

  answer
  options = { 'choices'     => '[5 DIGITS]',
              'repeat'      => 3,
              'onBadChoice' => lambda { say 'That is not a zip code, please try again.'} }
  choice = ask 'Please enter your zip code.', options

  # Add spaces to speak back individual digits, rather than one number
  zipcode = String.new
  choice.value.split(//).each { |char| zipcode << char + ' ' }
  say "Your zip code is #{zipcode}. Goodbye."

The key to this is the ‘choices’ option, which is where we pass our simple grammar. In this case we are telling the speech recognition engine to listen for five digits ([5 DIGITS]), and the user may either use their phone’s touch-tone keypad or speak their response. We then take that response, which comes back as a string, and add spaces between the digits so that it is spoken back the way you would read a zip code, digit by digit, rather than as a single number.

That covers digits, and of course a caller can always use the telephone keypad to enter those. Now let's look at asking our customer a question with spoken choices:
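The digit-spacing step is plain Ruby, so it can be tried outside Tropo. The sketch below uses a hypothetical helper name, `spell_out`, and a made-up zip code; unlike the loop in the script above, it does not leave a trailing space after the last digit.

```ruby
# Hypothetical helper: put a space between characters so a text-to-speech
# engine reads "94107" as individual digits rather than one large number.
def spell_out(digits)
  digits.chars.join(' ')
end

puts spell_out('94107') # prints 9 4 1 0 7
```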

  answer
  options = { 'choices'     => 'cheese, pepperoni, vegetarian',
              'repeat'      => 3,
              # Speak the recognized choice back to the caller
              'onChoice'    => lambda { |choice| say "We will send you a #{choice.value} pizza. Goodbye." },
              'onBadChoice' => lambda { say 'We do not have that kind of pizza, please try again.' } }
  choice = ask 'Which pizza would you like to order?', options

In this case we are passing the ‘choices’ option a string that lists the phrases the user may speak to give a valid response. When we recognize one of them, we play it back to the user, since the recognized value is populated in ‘choice.value’.

That was simple multiple choice. What if more than one phrase may qualify for a single response?

  answer
  # Each value is followed, in parentheses, by the spoken phrases that map to it
  options = { 'choices'     => 'denver broncos(broncos, denver, denver broncos), ' \
                               'dallas cowboys(cowboys, dallas, dallas cowboys)',
              'repeat'      => 3,
              'onChoice'    => lambda { |choice| say "Ah, so you like the #{choice.value}, do you? Goodbye." },
              'onBadChoice' => lambda { say 'We do not have that team, please try again.' } }
  choice = ask 'Who is your favorite football team?', options

First off, I am not making any statements about NFL teams here, just shortening the list of choices for brevity. In this case we are passing the ‘choices’ option a string that contains the responses we expect (e.g., denver broncos), each followed by a series of spoken phrases inside parentheses that qualify for it. When one of those phrases is recognized, the qualifying value is populated in ‘choice.value’.

So what are you waiting for? Start talking with your users. Many more examples, in multiple languages, may be found here.
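To see how a grammar string of this shape ties spoken phrases to values, here is a small stand-alone sketch. It only illustrates the mapping described above and is not how Tropo's recognizer is implemented; `phrase_map` is a hypothetical helper, and lower-casing the phrases is an assumption.

```ruby
# Hypothetical helper: turn a simple grammar string into a hash that maps
# each spoken phrase to the value it stands for.
def phrase_map(grammar)
  map = {}
  # Each entry is "value" or "value(phrase, phrase, ...)".
  grammar.scan(/([^,()]+)(?:\(([^)]*)\))?/) do |value, phrases|
    value = value.strip
    next if value.empty?

    # Without parentheses, the value itself is the only accepted phrase.
    spoken = phrases ? phrases.split(',').map(&:strip) : [value]
    spoken.each { |phrase| map[phrase.downcase] = value }
  end
  map
end

teams = phrase_map('denver broncos(broncos, denver, denver broncos), ' \
                   'dallas cowboys(cowboys, dallas, dallas cowboys)')
puts teams['cowboys'] # prints dallas cowboys
puts teams['denver']  # prints denver broncos

pizzas = phrase_map('cheese, pepperoni, vegetarian')
puts pizzas['pepperoni'] # prints pepperoni
```

A phrase that is not in the map returns nil, which corresponds to the case where the script's ‘onBadChoice’ handler would run.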