Posts Tagged ‘speechrecognition’

ASR-as-a-Service

Saturday, May 21st, 2011

Automated Speech Recognition (ASR) as a service can be powered by Tropo and SIP using the simple open source code below. This service is well suited to adding speech recognition to your existing Asterisk, FreeSWITCH, YATE, or enterprise app from the stone ages.

Here’s how it works!

You can transfer a phone call via SIP from a platform that doesn’t support ASR to Tropo along with the following SIP headers: prompt, choices, and returnaddress (a SIP address). Tropo automatically answers the call, plays the prompt to the user using text-to-speech (TTS), and loads an ASR grammar built from the choices passed. Once the speech recognition succeeds, Tropo transfers the call back to the return SIP address along with an x-voxeo-result SIP header containing the keyword spoken by the user.

Here’s the Tropo code using our hosted Scripting API:

This code is written in Ruby using our Tropo Scripting API. You can use $currentCall.getHeader to read the SIP headers passed to your Tropo application, and you can send headers to other SIP applications using the Tropo transfer method. The speech recognition magic happens in the ask method.

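# Prompt the caller and listen for one of the choices passed in the SIP headers
recoresult = ask $currentCall.getHeader("x-sbc-prompt"), {
  :choices => $currentCall.getHeader("x-sbc-choices") }

# Send the call back to the return address with the recognized value attached
transfer $currentCall.getHeader("x-sbc-returnaddress"), {
  :headers => { 'x-voxeo-result' => recoresult.value } }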
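If a caller's platform forgets one of the three headers, the script above would pass nil to ask or transfer. A minimal sketch of a pre-flight check follows, assuming the header names used in the script. FakeCall is a hypothetical stand-in for $currentCall so the example runs outside Tropo, and missing_headers is an illustrative helper, not part of the Tropo API.

```ruby
# Hypothetical stand-in for Tropo's $currentCall, so this sketch runs in plain Ruby.
class FakeCall
  def initialize(headers)
    @headers = headers
  end

  # Mirrors the getHeader call used in the script above.
  def getHeader(name)
    @headers[name]
  end
end

# Header names taken from the script above.
REQUIRED_HEADERS = %w[x-sbc-prompt x-sbc-choices x-sbc-returnaddress].freeze

# Illustrative helper: returns the names of required headers that are missing or blank.
def missing_headers(call)
  REQUIRED_HEADERS.select do |name|
    value = call.getHeader(name)
    value.nil? || value.strip.empty?
  end
end

call = FakeCall.new({
  'x-sbc-prompt'  => 'What is your favorite color?',
  'x-sbc-choices' => 'blue,green,red,yellow'
})
puts missing_headers(call).inspect
```

Inside a real Tropo script you would pass $currentCall instead of a FakeCall and decide how to handle a non-empty result, for example by hanging up or speaking an error message.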

Here’s how we tested it.

We used Phono, our browser-based webphone, to call Tropo and pass the prompt, choices, and returnaddress parameters.

phono.phone.dial("sip:9996106030@sip.tropo.com", {
  headers: [
    {
      name: "returnaddress",
      // you could use a returnaddress variable to send the results back to Phono's SIP address
      value: "sip:9996106032@sip.tropo.com"
    },
    {
      name: "prompt",
      value: "What is your favorite color?"
    },
    {
      name: "choices",
      value: "blue,green,red,yellow"
    }
  ]
});

Just to prove that yet another SIP application could receive the speech recognition results, we created another Tropo application in Ruby using the Scripting API to simply say the results.

say "You said " + $currentCall.getHeader("x-voxeo-result")

This second test application would typically be omitted for a real application since the returnaddress would most likely be the originating SIP address of your switch to return to your original callflow.  It’s cool that you could transfer to yet another application for additional processing!

What’s Next?

You can clone or fork this open source project on GitHub and use it today for as little as $0.03 per minute for the Tropo call. Let us know if you would prefer that we build this service out for commercial use.

 

Talking to the Cloud, and the Cloud Talking Back

Friday, August 28th, 2009

One of the great things about Tropo is that it has speech recognition and text-to-speech engines built right in. This allows a user to speak commands to your voice application, and lets your application respond with dynamically generated content. We make every effort to keep these features robust and yet simple for developers to use.

In the first case, we will ask the user to provide their zip code and then play it back to them:

  answer
  options = { 'choices'     => '[5 DIGITS]',
              'repeat'      => 3,
              'onBadChoice' => lambda { say 'That is not a zip code, please try again.'} }
  choice = ask 'Please enter your zip code.', options

  # Add spaces to speak back individual digits, rather than one number
  zipcode = String.new
  choice.value.split(//).each { |char| zipcode << char + ' ' }
  say "Your zip code is #{zipcode}. Goodbye."

The key to this is the ‘choices’ option, which is where we pass the simple grammar that defines what the user may answer. In this case we are asking the speech recognition engine to listen for [5 DIGITS], and the user may either use their phone’s touch-tone keypad or speak their response. We then take that response, which comes back as a string, and add spaces between the digits so that it is spoken back the way you would read out a zip code, as opposed to a single number.
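The digit-spacing step can be tried on its own in plain Ruby. This is a small sketch rather than part of the Tropo API: spell_out_digits is a hypothetical helper name, and unlike the loop above it leaves no trailing space after the last digit.

```ruby
# Hypothetical helper: separates the digits of a string with spaces so that a
# text-to-speech engine reads them one at a time instead of as a single number.
def spell_out_digits(digits)
  digits.to_s.scan(/\d/).join(' ')
end

puts spell_out_digits('80202')  # prints "8 0 2 0 2"
```

In the script above, the result of spell_out_digits(choice.value) could be interpolated into the say string in place of the zipcode variable.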

That covers digits, though of course a caller can always key in digits on their telephone, you say. Now let’s look at asking our customer a question:

  
  answer
  options = { 'choices'     => 'cheese, pepperoni, vegetarian',
              'repeat'      => 3,
              'onChoice'    => lambda { |choice| say "We will send you a #{choice.value} pizza. Goodbye." },
              'onBadChoice' => lambda { say 'We do not have that kind of pizza, please try again.' } }
  choice = ask 'Which pizza would you like to order?', options

In this case we are passing the ‘choices’ option a comma-separated string of phrases, any of which the user may speak to give a valid response. When we recognize one, its value is populated in ‘choice.value’ and we play it back to the user.
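When testing call-flow logic away from the phone, it can help to mimic how a comma-separated choices string maps an answer to a value. The sketch below is only a local stand-in for exact text matching, not Tropo's recognizer, and matching_choice is a hypothetical helper name.

```ruby
# Hypothetical local stand-in: finds the entry in a comma-separated choices
# string that matches the given text, ignoring case and surrounding spaces.
# Returns nil when nothing matches, which corresponds to the onBadChoice path.
def matching_choice(choices, utterance)
  options = choices.split(',').map(&:strip)
  options.find { |option| option.casecmp(utterance.strip).zero? }
end

menu = 'cheese, pepperoni, vegetarian'
puts matching_choice(menu, 'Pepperoni').inspect  # prints "pepperoni"
puts matching_choice(menu, 'hawaiian').inspect   # prints nil
```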

That was simple multiple choice. What if more than one phrase may qualify for a single response?

  
  answer
  options = { 'choices'     => 'denver broncos(broncos, denver, denver broncos), dallas cowboys(cowboys, dallas, dallas cowboys)',
              'repeat'      => 3,
              'onChoice'    => lambda { |choice| say "Ah, so you like the #{choice.value}, do you? Goodbye." },
              'onBadChoice' => lambda { say 'We do not have that team, please try again.' } }
  choice = ask 'Who is your favorite football team?', options

First off, I am not making any statements about NFL teams here, just shortening the list of choices for brevity. In this case we are passing the ‘choices’ option a string that contains the responses we expect (e.g., denver broncos), each followed by a series of possible spoken phrases inside parentheses that qualify for it. When one of those phrases is recognized, the qualifying value is populated in ‘choice.value’.
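With more than a couple of teams, writing the choices string by hand gets error-prone. Here is a minimal sketch that builds the same value(phrase, phrase) format from a Ruby hash. It assumes only the syntax shown in the example above, and build_choices is a hypothetical helper name, not a Tropo function.

```ruby
# Hypothetical helper: builds a 'choices' string in the "value(phrase, phrase)"
# form used above from a hash of value => list of accepted spoken phrases.
def build_choices(synonyms)
  synonyms.map { |value, phrases| "#{value}(#{phrases.join(', ')})" }.join(', ')
end

teams = {
  'denver broncos' => ['broncos', 'denver', 'denver broncos'],
  'dallas cowboys' => ['cowboys', 'dallas', 'dallas cowboys']
}

puts build_choices(teams)
# prints denver broncos(broncos, denver, denver broncos), dallas cowboys(cowboys, dallas, dallas cowboys)
```

The resulting string could then be passed as the 'choices' option in place of the literal.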

So what are you waiting for? Start talking with your users. Many more examples in multiple languages may be found here.