This commercial was released in 1971 but touchtone (DTMF) was originally invented in 1941. That was 70 years ago! If your voice application is still using touchtone for user input, don’t you think it’s time to enter the 21st century?
Tropo offers speech recognition in 24 languages by default on every prompt, allowing callers to use either touchtone or their voice to answer prompts during the call. We support four forms of speech recognition grammars.
Using Tropo’s simple grammars is incredibly, well, simple. Here is what an auto-attendant’s dial directory prompt would look like using Tropo’s simple speech recognition grammar.
ask "Welcome to the Tropo company directory. Who are you trying to reach?", {
  :choices => "department(support, engineering, sales), person(chris, jason, adam)",
  :onChoice => lambda { |event|
    say("You said " + event.choice.interpretation + ", which is a " + event.value)
  }
}
From here you could also transfer the call to the department or person’s phone number or SIP address by substituting the onChoice say prompt with the following logic:
say "Please wait while we transfer your call. Press star to cancel the transfer."
transfer ["+14075550100","sip:12345678912@221.122.54.86"], {
  :playvalue => "http://www.phono.com/audio/holdmusic.mp3",
  :terminator => "*",
  :onTimeout => lambda { |event|
    say "Sorry, but nobody answered"
  }
}
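One way to connect the recognized name to a transfer destination is a simple lookup table. The sketch below is only an illustration: the `DIRECTORY` hash, every number and SIP address in it, and the `destination_for` helper are hypothetical, and it assumes the name the caller spoke is available as a string (the example above reads it from `event.choice.interpretation`).

```ruby
# Hypothetical directory: every number and SIP address here is a placeholder.
DIRECTORY = {
  "support"     => "+14075550100",
  "engineering" => "+14075550101",
  "sales"       => "+14075550102",
  "chris"       => "sip:chris@example.com",
  "jason"       => "sip:jason@example.com",
  "adam"        => "sip:adam@example.com"
}.freeze

# Returns the destination for a recognized name, or nil if it is unknown.
def destination_for(name)
  DIRECTORY[name.to_s.strip.downcase]
end

# Inside the onChoice handler you might then write something like:
#   destination = destination_for(event.choice.interpretation)
#   transfer destination if destination
```

Returning nil for an unknown name lets the handler decide whether to re-prompt or fall back to an operator instead of transferring blindly.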
The possibilities of developing new voice apps using speech recognition are endless! For one, your customers will have a better user experience (UX) by not having to press buttons on the phone. For another, apps such as language translator applications and personal assistant applications are now possible.
We want to be your speech recognition and cloud communications partner.
Remember when the iPhone was only available on AT&T? That was true until October 11, 2009 when a young coder named geohot (and friends) released the first iPhone/iPod jailbreak. Suddenly iPhones weren’t tied to just AT&T…now you could give AT&T the boot and choose Verizon or T-Mobile as your service provider. Score!
In the spirit of geohot’s jailbreaking efforts, the rapscallions at Disruptive Technologies took on the task of “jailbreaking” OpenVBX.
OpenVBX is a web-based open source phone system. It’s essentially a virtual PHP/MySQL PBX and it’s available for download from GitHub. Users of OpenVBX can make phone calls, send text messages…all very cool.
The catch is…you’re locked into one service provider: Twilio. There’s no way to choose to use another cloud telephony provider…until now.
Today we’re pleased to announce a new fork of OpenVBX that adds support for Tropo. For the first time, users of OpenVBX will have a choice of multiple platforms on which to run it, making it REALLY OpenVBX.
The coders at Disruptive Technologies added full support for the Tropo API and Phono SIP-based VoIP web phone to the communications layer of the OpenVBX project. Of course, when selecting the Tropo API, users will now get access to all of the more advanced features of the Tropo network: speech recognition and text-to-speech in 24 languages, phone numbers in over 40 countries, international SMS, in/outbound SIP VoIP support, inbound Skype support, multiple phone numbers per callflow script, and improved conferencing.
Disruptive Technologies also extended OpenVBX with the VoiceVault API to support Voice Biometrics in password resets. After adding VoiceVault credentials on the API Accounts Tab, the password reset dialog will provide an option to request a phone call to reset your OpenVBX account password.
The OpenVBX fork with Tropo can be found on GitHub. We have sent the maintainer of the OpenVBX project a pull request to merge these updates into the project. The following features and bugfixes have been added to the OpenVBX package:
Fixed a redirect bug. OpenVBX no longer incorrectly redirects users to 404 pages.
Fixed bug in Twilio client. 60 seconds after the user has been “inactive”, the client is no longer able to be called for that user. This prevents calling the client if the user has closed their browser. (This also works for the Phono Client)
Added support for the Tropo API. You can now add a “Tropo API account” on the system settings page, and from the installer. Either a Tropo or Twilio account is required. Included in the new Tropo API additions are:
Support for Tropo domestic and international phone numbers, on the “numbers” page.
All applets in the “flows” page now support Tropo JSON as well as TwiML. Any number can be assigned any flow, so a Tropo number and a Twilio number can both be assigned the same flow.
Support for existing Tropo numbers & applications. If the user prefers to set up their numbers initially in Tropo.com, the application will see these numbers and they can be assigned a flow within the application.
Recordings and voicemail, as well as outbound dialing, with Tropo.
Several theme changes. The OpenVBX logo has been modified to include both the Tropo and Twilio logo. If only one of the accounts is active, only that logo will show in the VBX logo – so if a user only has a Twilio account, only the Twilio logo will show, and vice versa. Other minor theme changes:
Several pages in the System Settings tabs have been reworked, notably the API accounts page, which now shows each API account’s logo.
Step 3 of the installation has been reworked.
Several pieces of Twilio-specific content have been reworded to be provider-neutral.
Added support for “Phono” browser phone, in addition to the Twilio Client. Any non-Twilio based numbers will use the new Phono browser phone.
This project has since been renamed to TropoVBX. Please refer to the updated blog post and new source code repository on GitHub.
Automated Speech Recognition (ASR) as-a-Service can be powered via Tropo and SIP using the simple open source code provided below. This service is perfect for adding speech recognition to your existing Asterisk, FreeSwitch, YATE, or enterprise app from the stone ages.
Here’s how it works!
You can transfer a phone call via SIP from a platform that doesn’t support ASR to Tropo along with the following SIP headers: prompt, choices, and returnaddress (SIP address). Tropo automatically answers the call and plays the user the text-to-speech (TTS) prompt passed. It automatically loads the ASR grammar with the choices passed. Upon successfully processing the speech recognition, Tropo transfers the call and the recognized result back to the return SIP address along with an x-voxeo-result SIP header containing the keyword spoken by the user.
Here’s the Tropo code using our hosted Scripting API:
This code is written in Ruby using our Tropo Scripting API. You can use $currentCall.getHeader to get the SIP headers passed to your Tropo application and you can send headers to other SIP applications using the Tropo transfer method. The speech recognition magic happens in the Ask method.
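The script itself is not reproduced here, so the following is only a rough sketch of the flow described above, not the original source. The helper names `build_asr_request` and `result_headers` are hypothetical, a plain Hash stands in for repeated `$currentCall.getHeader` calls, and passing a `:headers` option to `transfer` is an assumption about how the result header is attached.

```ruby
# Hypothetical helper: turns the incoming SIP headers into the pieces the
# Tropo script needs. `headers` is a plain Hash standing in for repeated
# $currentCall.getHeader calls.
def build_asr_request(headers)
  prompt  = headers["prompt"].to_s
  choices = headers["choices"].to_s
  target  = headers["returnaddress"].to_s
  if [prompt, choices, target].any?(&:empty?)
    raise ArgumentError, "prompt, choices and returnaddress are all required"
  end
  { :prompt => prompt, :ask_options => { :choices => choices }, :return_address => target }
end

# Hypothetical helper: the header that carries the recognized keyword back.
def result_headers(value)
  { "x-voxeo-result" => value.to_s }
end

# On Tropo, the script body might then look roughly like this (untested,
# and the :headers option on transfer is an assumption):
#   request = build_asr_request(
#     "prompt"        => $currentCall.getHeader("prompt"),
#     "choices"       => $currentCall.getHeader("choices"),
#     "returnaddress" => $currentCall.getHeader("returnaddress"))
#   result = ask request[:prompt], request[:ask_options]
#   transfer request[:return_address], { :headers => result_headers(result.value) }
```

Keeping the header handling in plain functions means it can be exercised outside the Tropo runtime, while the `ask` and `transfer` calls stay in the hosted script.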
We used Phono, our browser-based webphone, to call Tropo and pass the prompt, choices, and returnaddress parameters.
phono.phone.dial("sip:9996106030@sip.tropo.com", {
  headers: [
    {
      name: "returnaddress",
      value: "sip:9996106032@sip.tropo.com" // you could use returnaddress var to send the results back to Phono's SIP address
    },
    {
      name: "prompt",
      value: "What is your favorite color?"
    },
    {
      name: "choices",
      value: "blue,green,red,yellow"
    }
  ]
});
Just to prove that yet another SIP application could receive the speech recognition results, we created another Tropo application in Ruby using the Scripting API to simply say the results.
say "You said " + $currentCall.getHeader("x-voxeo-result")
This second test application would typically be omitted for a real application since the returnaddress would most likely be the originating SIP address of your switch to return to your original callflow. It’s cool that you could transfer to yet another application for additional processing!
What’s Next?
You can clone or fork this open source project on GitHub and use it today for as little as $0.03 per minute for the Tropo call. Let us know if you would prefer that we build this service out for commercial use.
Ham radio, or amateur radio, has been around since the early 1900s. Ham radio technology has kept pace with traditional communications and may even be the only technology that allows people to communicate in natural disasters. Ham radio operators can communicate over very long distances using HF (high frequencies) as well as through satellites via AMSAT and even using VoIP over the Internet using EchoLink, IRLP, or D-STAR!
There are nearly 750,000 FCC licensed ham radio operators in the United States and over 3M licensed operators worldwide. Each operator has a federally issued callsign that is used to uniquely identify the station operating on the band.
Using Tropo and Callook (Josh Dick’s W1JDD Callsign API), Chris Matthieu (N7ICE) was able to quickly develop a speech recognition and text-to-speech based telephony app that is accessible by any of the following channels:
Upon calling the application, you are asked to spell a callsign using military phonetics:
A – Alfa, B – Bravo, C – Charlie, D – Delta, E – Echo, F – Foxtrot, G – Golf, H – Hotel, I – India, J – Juliet, K – Kilo, L – Lima, M – Mike, N – November, O – Oscar, P – Papa, Q – Quebec, R – Romeo, S – Sierra, T – Tango, U – Uniform, V – Victor, W – Whiskey, X – X-Ray, Y – Yankee, Z – Zulu
In addition to these commands, you can say restart to start over or stop once your callsign is entered correctly. Upon saying stop, the Tropo application makes a REST-based call to Callook to get a JSON response with the data for the callsign in question. In addition to the communication channels listed above, Chris Matthieu was able to use his handheld ham radio (like the one featured above) to communicate using VHF (very high frequencies) to connect to a repeater nearly 50 miles away on a mountaintop and connect to Tropo via an auto-patch phone line to perform a callsign lookup. Here is a screencast and source code for the application!
Here is the source code running on Tropo’s Scripting API:
require 'rest_client'
require 'json'
answer
sleep 2
say "welcome to the tropo ham radio call sign lookup application"
callsign = ""      # characters collected so far
callsigntext = ""  # phonetic words as spoken, read back to the caller
loop do
  result = ask "spell the callsign phonetically. say stop when done or restart to start over", {
    :choices => "alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel, india, juliette, kilo, lima, mike, november, oscar, papa, quebec, romeo, sierra, tango, uniform, victor, whiskey, xray, yankee, zulu, one, two, three, four, five, six, seven, eight, nine, zero, stop, restart"}
  if result.value == "stop"
    break
  elsif result.value == "restart"
    callsign = ""
    callsigntext = ""
  else
    callsigntext = callsigntext + " " + result.value
    say "so far you entered #{callsigntext}"
    # Map the phonetic word or spoken digit to a single callsign character
    letter = case result.value
      when "alpha" then "a"
      when "bravo" then "b"
      when "charlie" then "c"
      when "delta" then "d"
      when "echo" then "e"
      when "foxtrot" then "f"
      when "golf" then "g"
      when "hotel" then "h"
      when "india" then "i"
      when "juliette" then "j"
      when "kilo" then "k"
      when "lima" then "l"
      when "mike" then "m"
      when "november" then "n"
      when "oscar" then "o"
      when "papa" then "p"
      when "quebec" then "q"
      when "romeo" then "r"
      when "sierra" then "s"
      when "tango" then "t"
      when "uniform" then "u"
      when "victor" then "v"
      when "whiskey" then "w"
      when "xray" then "x"
      when "yankee" then "y"
      when "zulu" then "z"
      when "one" then "1"
      when "two" then "2"
      when "three" then "3"
      when "four" then "4"
      when "five" then "5"
      when "six" then "6"
      when "seven" then "7"
      when "eight" then "8"
      when "nine" then "9"
      when "zero" then "0"
    end
    if letter
      callsign = callsign + letter
    end
  end
end
# Look up the callsign with the Callook API and read back the result
response = RestClient.get 'http://callook.info/' + callsign + '/json'
data = JSON.parse(response)
say callsigntext + " belongs to "
say data["name"]
say "in " + data["address"]["line2"]
say "and holds a " + data["current"]["operClass"] + " license"
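The word-to-character mapping in the script can also be written as a lookup table. This is a minimal sketch of that alternative, not part of the original application: the constant names and the `callsign_from` helper are hypothetical, and the spellings follow the grammar in the script (alpha, juliette, xray) rather than the official phonetic alphabet.

```ruby
# Phonetic words and spoken digits, spelled as in the script's grammar.
WORDS = %w[alpha bravo charlie delta echo foxtrot golf hotel india juliette
           kilo lima mike november oscar papa quebec romeo sierra tango
           uniform victor whiskey xray yankee zulu]
DIGITS = %w[zero one two three four five six seven eight nine]

# Each phonetic word maps to its first letter; each digit word to its numeral.
LETTERS  = WORDS.map { |word| [word, word[0]] }.to_h
NUMBERS  = DIGITS.each_with_index.map { |word, i| [word, i.to_s] }.to_h
PHONETIC = LETTERS.merge(NUMBERS).freeze

# Hypothetical helper: builds a callsign from a list of recognized words,
# silently skipping anything that is not in the table.
def callsign_from(words)
  words.map { |word| PHONETIC[word.to_s.downcase] }.compact.join
end

puts callsign_from(%w[november seven india charlie echo])  # prints "n7ice"
```

A table like this keeps the grammar words and their characters in one place, which makes it harder for the two to drift apart.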
One of the great things about Tropo is that it has a speech recognition and text-to-speech engine built right in. This allows a user to speak commands to your voice application and lets you respond with dynamically generated content. We make every effort to make these features robust and yet simple for developers to use.
In the first case, we will ask the user to provide their zipcode and then play it back to them:
answer
options = { 'choices' => '[5 DIGITS]',
'repeat' => 3,
'onBadChoice' => lambda { say 'That is not a zip code, please try again.'} }
choice = ask 'Please enter your zip code.', options
# Add spaces to speak back individual digits, rather than one number
zipcode = String.new
choice.value.split(//).each { |char| zipcode << char + ' ' }
say "Your zip code is #{zipcode}. Goodbye."
The key to this is the ‘choices’ option, which is where we pass our simple grammar for the prompt. In this case we are asking the speech recognition engine to accept exactly five digits ([5 DIGITS]), and the user may either use their phone’s touch tone keypad or speak their response. We then take that response, which comes back as a string, and add spaces between the digits so that it is spoken back digit by digit, as you would read a zip code, rather than as a single number.
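The digit-spacing step can be pulled into a small helper. This is just a sketch: `spoken_zip` is a hypothetical name, and the five-digit check is an extra safeguard added here for illustration rather than something the example above performs.

```ruby
# Hypothetical helper: returns the zip code with a space between each digit,
# or nil when the input is not exactly five digits.
def spoken_zip(value)
  digits = value.to_s
  return nil unless digits =~ /\A\d{5}\z/
  digits.split("").join(" ")
end

# In the script this could be used as:
#   say "Your zip code is #{spoken_zip(choice.value)}. Goodbye."
```

Using `join` avoids the trailing space that appending `char + ' '` leaves at the end of the string.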
That covers digits, though you might say that callers can always enter digits on their telephone keypad anyway. Now let’s look at asking our customers questions:
answer
options = { 'choices' => 'cheese, pepperoni, vegetarian',
            'repeat' => 3,
            'onChoice' => lambda { |choice| say "We will send you a #{choice.value} pizza. Goodbye." },
            'onBadChoice' => lambda { say 'We do not have that kind of pizza, please try again.' } }
choice = ask 'Which pizza would you like to order?', options
In this case we are passing the ‘choices’ option a string of comma-separated phrases, any of which the user may speak as a valid response. When we recognize one, it is populated in ‘choice.value’ and we play it back to the user.
That was simple multiple choice. What if more than one phrase may qualify for a single response?
answer
options = { 'choices' => 'denver broncos(broncos, denver, denver broncos), dallas cowboys(cowboys, dallas, dallas cowboys)',
            'repeat' => 3,
            'onChoice' => lambda { |choice| say "Ah, so you like the #{choice.value}, do you? Goodbye." },
            'onBadChoice' => lambda { say 'We do not have that team, please try again.' } }
choice = ask 'Who is your favorite football team?', options
First off, I am not making any statements about NFL teams here, just shortening the choices for the purposes of brevity. In this case we are passing the ‘choices’ option a string that contains the responses we expect (e.g. denver broncos), each followed by a series of possible spoken phrases inside parentheses that could qualify. When one of those phrases is recognized, the qualifying value gets populated in ‘choice.value’.
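To make the phrase-to-value relationship concrete, here is a small sketch that expands a grammar string of this shape into a lookup table. It is not how Tropo implements recognition: `phrase_map` is a hypothetical helper that only understands the `value(phrase, phrase)` and bare comma-separated forms shown in these examples, so a special token such as [5 DIGITS] would just be treated as a literal phrase.

```ruby
# Hypothetical helper: maps every phrase a caller may say to the value
# that would be reported for it.
def phrase_map(grammar)
  map = {}
  # Each match is either "value(phrase, phrase, ...)" or a bare "phrase".
  grammar.scan(/([^,()]+)(?:\(([^)]*)\))?/) do |value, phrases|
    value = value.strip
    next if value.empty?
    spoken = phrases ? phrases.split(",").map(&:strip) : [value]
    spoken.each { |phrase| map[phrase] = value unless phrase.empty? }
  end
  map
end

teams = phrase_map('denver broncos(broncos, denver, denver broncos), ' \
                   'dallas cowboys(cowboys, dallas, dallas cowboys)')
puts teams["broncos"]  # prints "denver broncos"
puts teams["dallas"]   # prints "dallas cowboys"
```

A table like this can be handy for checking a grammar string for typos or duplicate phrases before deploying it.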
So what are you waiting for? Start talking with your users. Many more examples in multiple languages may be found here.