You’ve got this great MP3 of your young child doing all the names for your brand-spankin’ new auto-attendant that you’ve created using Tropo, and when you go to call the application, it now sounds like some space alien has taken over your program! Dude, what happened?!?
Well, as it turns out, cellphones and MP3 players can play audio files like stereo MP3s just fine, but only when the files are “local” on the device. See, telephony standards have just not “kept up with the times.” In telephony, the standard is 8bit, 8kHz u-law formatted WAV, and that standard is generations removed from that nice 44kHz, 32-bit, stereo MP3 you just recorded. So, in order to play your MP3, it must be converted, on the fly, to the telephony standard, and unfortunately that usually leads to less-than-stellar results.
The Tropo platform supports a number of different audio formats. For optimum performance in your application, it is always best to have your sound files in 8bit, 8kHz u-law format from the start.
The supported sound formats (and their proper file extensions) for the Tropo platform are as follows:
- 8kHz, 8bit u-law (wav or raw) (*.wav or *.ulaw)
- 8kHz, 8bit a-law (wav or raw) (*.wav or *.alaw)
- 8kHz, 8bit pcm (wav) (*.wav)
- 8kHz, 16bit pcm (wav or raw) (*.wav or *.pcm)
- MS-GSM (wav) (*.wav)
- GSM 6.10 (raw) (*.gsm)
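If you are not sure which of these formats a given *.wav file actually uses, you can read its header. The following is a minimal sketch, not a Tropo tool: it assumes a standard RIFF/WAVE layout, the function names `describe_wav` and `is_telephony_ready` are made up for illustration, and the mono check is our assumption (the list above does not mention channel count).

```python
import struct

# Format tags from the WAVE "fmt " chunk that match the formats listed above.
FORMAT_TAGS = {
    0x0001: "pcm",
    0x0006: "a-law",
    0x0007: "u-law",
    0x0031: "ms-gsm",
}


def describe_wav(data: bytes) -> dict:
    """Return encoding, channels, sample rate and bit depth from WAV bytes."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "encoding": FORMAT_TAGS.get(tag, "unknown (0x%04x)" % tag),
                "channels": channels,
                "rate": rate,
                "bits": bits,
            }
        # Skip this chunk; chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)
    raise ValueError("no fmt chunk found")


def is_telephony_ready(info: dict) -> bool:
    """True for 8kHz, 8bit u-law audio (mono is assumed here)."""
    return (
        info["encoding"] == "u-law"
        and info["rate"] == 8000
        and info["bits"] == 8
        and info["channels"] == 1
    )
```

To use it, read the start of a file in binary mode (for example `open(path, "rb").read(4096)`) and pass the bytes to `describe_wav`.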
Recording your own prompts
- If you don’t plan on mixing and editing the prompts you record, it’s best to just record at 8kHz/16bit and then save or convert to 8kHz/8bit u-law.
- If you plan on mixing and editing the prompts you have recorded, it’s best to record at 48kHz/16bit rather than 44.1kHz (the down-sample to 8kHz works much better when the source rate is a whole-number multiple of 8kHz), and then save or convert to 8kHz/8bit u-law when you are done mixing and editing your prompts.
- Most definitely use the gain control tools in your audio software to adjust the RMS amplitude and peak amplitude of your recordings, so that the audio is loud enough to be heard by customers, but not so loud that prompt echoes are picked up by the recognition engine or that clipping produces pops and clicks.
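Two of the tips above come down to simple arithmetic, so here is a rough sketch of both. The first function shows why 48kHz down-samples to 8kHz more cleanly than 44.1kHz; the second measures peak and RMS level of 16bit samples in dB relative to full scale. The function names are ours, and no target levels are given because the right numbers depend on your own prompts and callers.

```python
import math
from fractions import Fraction


def resample_ratio(source_hz: int, target_hz: int = 8000) -> Fraction:
    """Ratio between the recording rate and the telephony rate."""
    return Fraction(source_hz, target_hz)


def levels_dbfs(samples, full_scale: int = 32768):
    """Return (peak, rms) in dBFS for a sequence of signed 16bit samples."""
    if not samples:
        raise ValueError("no samples given")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))

    def to_db(value):
        # Digital silence has no finite level in dB.
        return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")

    return to_db(peak), to_db(rms)


# 48000 / 8000 is a whole number, so every 6th sample lines up exactly.
print(resample_ratio(48000))  # 6
# 44100 / 8000 is not, so the converter has to interpolate between samples.
print(resample_ratio(44100))  # 441/80
```

A peak at or very near 0 dBFS is a sign that the recording is clipping.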
Another cross-platform, open-source audio editor application is SoX. SoX is a command-line utility, but has some outstanding capabilities for processing those “hard to fix” audio files.
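As a sketch of what a SoX conversion to the telephony format might look like, the snippet below builds and runs a `sox` command from Python. It assumes SoX is installed and on your PATH, that your build can read the input format (MP3 support is often an optional extra), and that mono output is what you want. The file names are placeholders, and it is worth checking the options against `sox --help` for your version.

```python
import shutil
import subprocess


def build_sox_command(source: str, target: str) -> list:
    """Build a sox command that writes 8kHz, 8bit, mono u-law audio."""
    return [
        "sox", source,
        "-r", "8000",     # sample rate of the output file
        "-c", "1",        # one channel (mono), our assumption
        "-e", "mu-law",   # u-law encoding
        "-b", "8",        # 8 bits per sample
        target,
    ]


def convert(source: str, target: str) -> None:
    """Run the conversion, failing clearly if sox is not installed."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    subprocess.run(build_sox_command(source, target), check=True)


# Example with placeholder file names:
# convert("greeting-48k.wav", "greeting.wav")
```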
For those PC-only folks out there, Goldwave is another freeware audio editor that you may want to consider, if Audacity or SoX does not fit the bill for you.
We hope that these hints help you in preparing your audio files for successful use on Tropo.
Many thanks to all those who joined in our April 1 fun with COBOL on Tropo.com. Particular thanks to Moshe Yudkowsky, Jason Goecke, Dan Miller and Mark Headd for their supporting blog posts. Several people, in particular, thought analyst Dan Miller was entirely serious and contacted me to ask, “Is he aware it is a joke?” (He was.) Thanks to all those who “retweeted” the link and otherwise engaged in some April 1 fun on Twitter. Moshe, particularly, was keeping the conversation going much of the day on Twitter. Props to our long-time friend Thomas Howe for chiming in as well. All in all, it was a good bit of fun.
And for the record, in response to this email I received about my Emerging Tech Talk video on April 1:
Even if this makes me look like a complete geek … but I couldn’t help notice a little inaccuracy in your hilarious COBOL video blog: About upper case letters “… we’re not using that high-order bit …” – this is of course nonsense (or did you knowingly make this mistake)? Neither A-Z nor a-z use the 8th bit; and actually, A-Z have lower codes (65-90) than a-z (97-122)
Yes, the latter half of that video was all just nonsense I was spewing with the intent of providing a plausible-sounding rationale. And yes, in response to someone else’s question, it was very hard to say all that with a straight face.
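For anyone who wants to check the reader’s point for themselves, here is a quick sketch. The helper name `uses_high_bit` is ours; the rest is just the standard ASCII table.

```python
import string


def uses_high_bit(ch: str) -> bool:
    """True if the character's code has the 8th (high-order) bit set."""
    return (ord(ch) & 0x80) != 0


# No ASCII letter, upper or lower case, needs the 8th bit.
print(any(uses_high_bit(c) for c in string.ascii_letters))  # False

# Upper case comes first in the table: A-Z are 65-90, a-z are 97-122.
print(ord("A"), ord("Z"), ord("a"), ord("z"))  # 65 90 97 122

# The two cases differ only in the bit worth 32 (0x20).
print(ord("a") ^ ord("A"))  # 32
```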
The irony, of course, is that it would be fairly straightforward for us to actually implement COBOL on Tropo.com given our underlying architecture. We use the JSR 223 scripting framework, so adding support for a language with a JSR 223-compliant scripting engine is simply a matter of us writing a “shim” that lets that scripting engine run on top of Tropo. (Well, then there’s those, oh, wee little details of testing, documentation, etc. 😉) So in theory, if there were a COBOL JSR 223 scripting engine out there somewhere, it could be something we could add to Tropo. It’s the beauty of the platform and why we are looking to add more languages in the months ahead.
Anyway, thanks again to all who participated last week. We continue to enjoy seeing what you all are doing with building applications on Tropo.com. (And some of you are doing amazing things with the platform!)
This morning at VoiceCon Orlando, Voxeo CTO RJ Auburn was part of a panel on “voice mashups”. RJ spoke about the tools available for developers and specifically focused on what you can do with Tropo.com. RJ’s slides are now available at SlideShare and I will embed them here for your viewing: