
Advanced grammar topics for Tropo

Posted on December 17, 2009 by Adam Kalsey

This is a guest post from Dominique Boucher, Product Manager at Nu Echo.


Recently, Tropo has added support for SRGS grammars and JSGF grammars. Mike Thompson and Jason Goecke wrote about that a few weeks ago. In this post, I will go one step further and show some more advanced tools in the voice recognition ecosystem.

Authoring grammars

Once you’ve decided to add SRGS grammars to your Tropo application, one question arises: which format will you use to author them? XML or ABNF?

XML has the advantage of being supported by most major recognition engines on the market, and it is often the engine’s native format. ABNF, on the other hand, is much more compact and readable than its XML equivalent. For example, here is the same yes/no grammar expressed first in ABNF:

[Image: yesno-abnf]

and then in XML:

[Image: yesno-xml]

I don’t know about you, but I much prefer the former.
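To make the comparison concrete in text form, here is a minimal sketch of a yes/no grammar written in both SRGS syntaxes and held in Ruby strings so their sizes can be compared. The rule name yesno, the en-US language and the constant names are assumptions made for this sketch.

```ruby
# A yes/no grammar in SRGS ABNF form. The rule name "yesno" and the
# en-US language are assumptions made for this sketch.
YESNO_ABNF = <<~'ABNF'
  #ABNF 1.0 UTF-8;
  language en-US;
  mode voice;
  root $yesno;

  public $yesno = yes | no;
ABNF

# The same grammar in SRGS XML form.
YESNO_XML = <<~'XML'
  <?xml version="1.0" encoding="UTF-8"?>
  <grammar xmlns="http://www.w3.org/2001/06/grammar"
           version="1.0" xml:lang="en-US" mode="voice" root="yesno">
    <rule id="yesno" scope="public">
      <one-of>
        <item>yes</item>
        <item>no</item>
      </one-of>
    </rule>
  </grammar>
XML

# Print the size of each form so the difference is easy to see.
puts "ABNF: #{YESNO_ABNF.lines.count} lines, #{YESNO_ABNF.length} characters"
puts "XML:  #{YESNO_XML.lines.count} lines, #{YESNO_XML.length} characters"
```

Both strings describe the same two-word language; the XML form simply spends more characters on markup.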

At SpeechTEK’09, in August, Voxeo announced a partnership with Nu Echo to bundle NuGram IDE Basic Edition with the VoiceObjects Developer Edition. NuGram IDE is a complete environment for developing, debugging and testing voice recognition grammars in the ABNF syntax. (The basic edition, which is free of charge, can also be installed separately, directly from Eclipse.) With NuGram IDE, you write and test ABNF grammars on your desktop, without requiring a voice recognition engine. Once you are satisfied with your grammars, you integrate them into your application. And if you prefer to use their XML counterparts, just let NuGram IDE do the grunt work of converting them for you.

Dynamic grammars

Most grammars used by voice applications are static grammars. By this, I mean that they do not change over time, nor do they depend on contextual, call-specific data.

Sometimes, however, the application needs to generate grammars on the fly. We call them dynamic grammars. Consider a voice-dialing application. The application identifies the caller, looks in the caller’s contact list, and asks for the name of one of the caller’s contacts. The grammar to use in the last step depends on the content of the caller’s profile.

But how are dynamic grammars served to the application? Well, usually a dedicated web application will be responsible for that. This can be a JSP or ASP page, a Ruby on Rails app, etc. In all these cases, a web application must be developed and made accessible to the Tropo runtime.
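As a rough illustration of what such a web application has to do on each request, here is a minimal Ruby sketch that turns a contact list into an SRGS XML grammar. It covers only the generation step, not the HTTP serving. The function names, the rule id "contact", the out.extension slot and the sample contacts are all assumptions made for this sketch; the hash keys follow the contact format described later in this post.

```ruby
# Escape the characters that are special in XML text.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Build an SRGS XML grammar with one alternative per contact.
# Each alternative returns the contact's extension through a SISR tag.
def contacts_grammar(contacts)
  # SRGS requires at least one alternative inside <one-of>.
  raise ArgumentError, 'at least one contact is required' if contacts.empty?

  items = contacts.map do |c|
    name = xml_escape("#{c['firstname']} #{c['lastname']}")
    # Keep only dialable characters so the value is safe inside the tag.
    ext = c['extension'].to_s.delete('^0-9+*#')
    %(      <item>#{name}<tag>out.extension = "#{ext}";</tag></item>)
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"',
    '         xml:lang="en-US" mode="voice" root="contact"',
    '         tag-format="semantics/1.0">',
    '  <rule id="contact" scope="public">',
    '    <one-of>',
    *items,
    '    </one-of>',
    '  </rule>',
    '</grammar>'
  ].join("\n")
end

# Hypothetical contacts for one caller.
contacts = [
  { 'firstname' => 'Alice', 'lastname' => 'Martin', 'extension' => '555-1234' },
  { 'firstname' => 'Bob',   'lastname' => 'Nguyen', 'extension' => '555-9876' }
]
puts contacts_grammar(contacts)
```

Everything around this function (routing, authentication, looking up the caller’s contacts, hosting) is the part that has to be developed and kept reachable by the Tropo runtime.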

Another solution is NuGram Hosted Server (NHS), a free hosted platform designed specifically to serve dynamic grammars to cloud-based communication applications, so it nicely complements Tropo. Once you have registered, all you have to do is create your dynamic grammar templates and push them to your NHS account from within NuGram IDE (publishing takes a single keystroke, Alt-Ctrl-Shift P).

The dynamic grammar templates are expressed using a few extensions to the ABNF syntax. For example, the template for the voice-dialing grammar would look like this:

[Image: voicedialer-abnf]

A tutorial describing the various templating directives is available on the NuGram website.

The client API

Generating a dynamic grammar from a template (instantiating a grammar) involves sending some data (the instantiation context) to NHS using an HTTP-based API. Fortunately, higher-level client APIs are available on GitHub in a variety of programming languages. (All you have to do is include the code of the API at the top of your Tropo application.)

To illustrate, here is a prototype voice-dialing application that instantiates the grammar template above and uses the URL of the generated grammar in XML form:

[Image: voicedialer-rb]

Lines 10-11 simply create a new session with NuGram Hosted Server. Line 19 retrieves the contacts for the current caller. Finally, line 26 instantiates the grammar with the contacts template and retrieves the URL of the generated grammar in XML form.

The get_contacts function simply returns a list of hashes of the form {'firstname' => "first name", 'lastname' => "last name", 'extension' => "phone number"}, one for each of the caller’s contacts. In our demo, the function makes a request to a web application that formats the contacts as a JSON string, then converts the data to a plain Ruby data structure:

[Image: getcontacts-rb]

Of course, in a real application, the data could be fetched from a Web service, a database, etc.
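As a sketch of the conversion step only (no HTTP request is made), the following Ruby snippet parses a JSON string into the list-of-hashes form described above. The function name parse_contacts, the shape of the JSON payload and the sample data are assumptions for illustration; the demo’s actual get_contacts may differ.

```ruby
require 'json'

# Keys every contact is expected to carry (taken from the hash format above).
CONTACT_KEYS = %w[firstname lastname extension].freeze

# Convert a JSON array of contact objects into plain Ruby hashes,
# dropping incomplete entries and any extra fields.
def parse_contacts(json_text)
  data = JSON.parse(json_text)
  raise ArgumentError, 'expected a JSON array of contacts' unless data.is_a?(Array)

  data.select { |c| c.is_a?(Hash) && CONTACT_KEYS.all? { |k| c[k].is_a?(String) && !c[k].empty? } }
      .map { |c| c.slice(*CONTACT_KEYS) }
end

# Hypothetical payload; the second entry is incomplete and gets dropped.
sample = <<~JSON
  [
    {"firstname": "Alice", "lastname": "Martin", "extension": "5551234", "email": "alice@example.com"},
    {"firstname": "Bob"}
  ]
JSON

parse_contacts(sample).each do |c|
  puts "#{c['firstname']} #{c['lastname']}: #{c['extension']}"
end
```

Filtering out incomplete contacts before instantiating the grammar avoids generating alternatives with an empty name or extension.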

That’s it! The code of the whole application is also available on GitHub. The Ruby application called by the get_contacts function is a web application based on Sinatra and is readily deployable on Heroku.

If you have any questions or feedback, please leave a comment.

Thomas Howe introduces the Mobile Mail List mashup for Tropo with screencasts, source code

Posted on November 11, 2009 by Adam Kalsey

Over on his blog, mashup king Thomas Howe introduces a new Mobile Mail List application for use with Tropo.com. As Thomas describes it:

It allows retail facing businesses to quickly and easily let their customers sign up for customer care programs using their cell phone, then uses text messaging to send coupons back to the customers. It’s free and open source, and it runs today on Tropo.

Thomas indicates he’ll be doing a series of screencasts, the first two of which are out already. He has also made the source code available over on GitHub and included an architecture diagram to show how the pieces all fit together.

The first screencast introduces the Mobile Mail List application.

The second screencast, embedded in his second blog post, dives into the voice app used on Tropo.com.

If you would like to try out Thomas’ app yourself, you can download the Mobile Mail List software and then use it with your Tropo.com account (which is free if you haven’t created one yet) as well as your own web server.

It’s cool to see and we’re looking forward to seeing the other screencasts Thomas creates!

Powerful Speech-Driven Tropo Applications

Posted on September 28, 2009 by tropo

In recent weeks, Jason Goecke posted about how to use Tropo’s Simple Grammar Engine to do some simple voice recognition in your applications. In today’s post, I will show you how to take that a step further and implement some industry-standard grammars and interpretation mechanisms. These grammar types allow Tropo to use the same advanced level of speech recognition you might use or expect in VoiceXML applications today.

Before we get started with the examples, here is a list of the types of grammars (and return styles) which will be available to you:

SRGS (Also referred to as grXML)

SISR (Semantic Interpretation for Speech Recognition)

GSL

ABNF

GSL is not a W3C-compliant grammar syntax, and Nuance has discontinued support for GSL grammars in its most recent product offerings. Tropo will continue to support GSL-specific markup for some time to come, but we strongly suggest that new applications use SRGS + SISR grammars rather than rely on the deprecated GSL format.

That said, the example in this post shows Tropo using an SRGS grammar with SISR returns. This is 100% W3C compliant, and is the industry standard for grammar development. Let’s start with our grammar:

[Image: grammar]

Those of you familiar with grammars will likely recognize this structure. If not, a great place to get started is here. The above grammar accepts the following utterances:

Red Sox, Boston Red Sox, Yankees, or New York Yankees

Based on the team you choose, you will get some information back about the team. Specifically, the grammar returns the value you would like for the team, the league it plays in, its division, and its standing. The grammar is quite simple, and I made it this way to illustrate the concept of using external grammars with your Tropo applications. Feel free to go as crazy as you want with these grammars.
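For reference, a grammar with this behavior could be written along the following lines. This is a sketch under stated assumptions, not the grammar used in this post: the rule id, the slot values and especially the standing values (left as the placeholder "TBD") are made up for illustration. It is held in a Ruby string so it can be printed or saved to a file and hosted at a URL your Tropo application can reach.

```ruby
# SRGS XML grammar with SISR (semantics/1.0) tags.
# Accepts "Red Sox", "Boston Red Sox", "Yankees" and "New York Yankees".
# The "TBD" standing values are placeholders, not real data.
TEAM_GRAMMAR = <<~'XML'
  <?xml version="1.0" encoding="UTF-8"?>
  <grammar xmlns="http://www.w3.org/2001/06/grammar"
           version="1.0" xml:lang="en-US" mode="voice"
           root="team" tag-format="semantics/1.0">
    <rule id="team" scope="public">
      <one-of>
        <item>
          <item repeat="0-1">Boston</item> Red Sox
          <tag>out.team = "Red Sox"; out.league = "American"; out.division = "East"; out.standing = "TBD";</tag>
        </item>
        <item>
          <item repeat="0-1">New York</item> Yankees
          <tag>out.team = "Yankees"; out.league = "American"; out.division = "East"; out.standing = "TBD";</tag>
        </item>
      </one-of>
    </rule>
  </grammar>
XML

# Print the grammar; redirect the output to a file to host it.
puts TEAM_GRAMMAR
```

The optional city name is expressed with repeat="0-1", so each team needs only one alternative, and every alternative fills the same four slots.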

How does one tie this grammar into a Tropo application? It’s easy! Let’s take a look at a basic Ruby app:

[Image: ruby-app]

Notice that when we declare our choices within “options”, we simply reference the remote location of the SRGS/GRXML grammar with SISR returns. As soon as the prompt starts, you should be able to say any of the above utterances. When the result comes back, you can get the slot values (team, division, standing, etc.) by accessing them directly:

result.choice.tag.get("team")

result.choice.tag.get("division")

result.choice.tag.get("standing")

result.choice.tag.get("league")

That’s it! At this point, you should have the information needed to start developing your own Tropo applications with powerful voice recognition capability. If you have any questions at all, feel free to contact our free 24×7 Support team! We are more than happy to help you with any issues you may encounter!