Human vs. Answering Machine Detection
December 17th, 2010 by cmatthieu
Would you like to know if your outbound call was answered by a human or an answering machine? Voicemail systems are pervasive today. In fact, they seem to answer the phone more than people do.
In the telecom world, this simple request is called Call Progress Analysis (CPA). Doing CPA properly, as on the Voxeo Evolution enterprise hosting platform, requires advanced Digital Signal Processing (DSP) and voice activity detection to analyze the audio signal after a call connects. That analysis makes it possible to programmatically determine whether the answering party is a human speaker, an answering machine, a modem, or even a fax.
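For a rough idea of what voice activity detection involves, here is a toy energy-threshold detector in plain Ruby. It is not how Voxeo's platform works (the post doesn't describe that), and real CPA uses far more robust features. The 8 kHz sample rate, 20 ms frames, energy threshold, and the names `speech_frames` and `greeting_seconds` are all assumptions made for illustration. It measures how long the first utterance lasts, stopping after a run of silent frames, much like `:silenceTimeout` does in the script later in this post.

```ruby
# Toy voice activity detector (illustrative only, not Voxeo's DSP).
# Assumes 16-bit PCM audio already decoded into an Array of Integers.
FRAME_SIZE = 160          # 20 ms frames at an assumed 8 kHz sample rate
ENERGY_THRESHOLD = 500.0  # assumed value; would need tuning on real audio

# Root-mean-square energy of one frame of samples.
def frame_rms(frame)
  Math.sqrt(frame.sum { |s| s * s }.to_f / frame.size)
end

# One boolean per frame: true where the frame looks like speech.
def speech_frames(samples)
  samples.each_slice(FRAME_SIZE).map { |frame| frame_rms(frame) > ENERGY_THRESHOLD }
end

# Length in seconds of the first utterance: from the first speech frame to
# the last speech frame before `silence_frames` consecutive quiet frames.
# 50 frames of 20 ms = 1 second, mirroring a 1-second silence timeout.
def greeting_seconds(samples, sample_rate: 8000, silence_frames: 50)
  flags = speech_frames(samples)
  start = flags.index(true)
  return 0.0 if start.nil?   # nobody spoke at all

  last_speech = start
  quiet = 0
  flags[start..].each_with_index do |speech, i|
    if speech
      last_speech = start + i
      quiet = 0
    else
      quiet += 1
      break if quiet >= silence_frames
    end
  end
  (last_speech - start + 1) * FRAME_SIZE / sample_rate.to_f
end

# One second of loud samples followed by one second of silence:
# greeting_seconds(Array.new(8000, 1000) + Array.new(8000, 0))  # => 1.0
```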
Fortunately, Tropo will soon be able to leverage Voxeo’s years of work in this field and incorporate this technology into our cloud platform. Until then, we have a trick to share with you! Most telephony services simply measure how long the callee’s greeting lasts. If it is shorter than 3 seconds (like a quick “hello”), they report a human; if it is longer than 3 seconds, they report a machine.
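The rule of thumb above boils down to a single comparison. Here's a minimal sketch in plain Ruby; `classify_greeting` is a hypothetical name, and, like the script later in this post, it treats a greeting of exactly 3 seconds as a machine.

```ruby
# Greeting-length heuristic: a short greeting ("hello") suggests a human,
# a long one (a recorded voicemail greeting) suggests a machine.
HUMAN_MAX_SECONDS = 3.0

def classify_greeting(seconds)
  seconds < HUMAN_MAX_SECONDS ? :human : :machine
end

classify_greeting(0.8)  # => :human
classify_greeting(6.5)  # => :machine
```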
You could even go further and use greetings shorter than 3 seconds to guess whether the call was answered by a residence or a business. Residences typically say “hello”, which takes less than 1.5 seconds, while businesses say “How may we help you?”, which typically takes between 1.5 and 3 seconds.
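Adding the 1.5-second cut-off gives a three-way split. This is a sketch only: `classify_answerer` is a hypothetical name, the thresholds are the rules of thumb quoted above rather than measured values, and placing exactly 1.5 seconds in the business bucket is an arbitrary choice.

```ruby
# Three-way variant of the greeting-length heuristic.
RESIDENCE_MAX_SECONDS = 1.5
BUSINESS_MAX_SECONDS  = 3.0

def classify_answerer(seconds)
  if seconds < RESIDENCE_MAX_SECONDS
    :residence   # a short "hello"
  elsif seconds < BUSINESS_MAX_SECONDS
    :business    # "How may we help you?"
  else
    :machine     # a long recorded greeting
  end
end

classify_answerer(0.9)  # => :residence
classify_answerer(2.2)  # => :business
classify_answerer(5.0)  # => :machine
```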
We recently helped several Tropo customers detect whether a call was answered by a human or an answering machine, so we wanted to share with you the simple Ruby script we used to make this determination.
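# Place the outbound call
call 'tel:+14805551212'

# Start timing as soon as the call connects
starttime = Time.now

# Record the greeting without a beep; stop after 1 second of silence
# (:maxTime caps the recording at 10 seconds)
record ".", {
  :beep => false,
  :timeout => 10,
  :silenceTimeout => 1,
  :maxTime => 10
}

# Greeting length in seconds (includes the trailing 1 second of silence)
endtime = Time.now
difference = endtime - starttime

if difference < 3
  say "human"
else
  say "answering machine"
end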
This demo is written using Tropo’s Scripting API and is hosted on our cloud platform for you. It places a phone call when your application invokes our API URL with your token. As soon as the call connects, it starts a timer and records the callee’s greeting until 1 second of silence is detected. When the silence is detected, it stops the timer and subtracts the start time from the end time to get the length of the greeting. If that is less than 3 seconds, you can assume that a human answered the call; otherwise, assume that a real-live robot did!
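The timing itself can be made a little more robust. `Time.now` reads the wall clock, which can jump if the system clock is adjusted mid-call; `Process.clock_gettime(Process::CLOCK_MONOTONIC)` cannot. Here's a sketch of a hypothetical `measure_seconds` helper, assuming a Ruby runtime that provides `Process.clock_gettime` (Ruby 2.1 or later); whether Tropo's hosted Ruby does is an assumption.

```ruby
# Hypothetical helper: run a block and return how long it took in seconds,
# using the monotonic clock so wall-clock adjustments cannot skew the result.
def measure_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Inside a Tropo script this would wrap the record call, e.g.:
#   difference = measure_seconds do
#     record ".", { :beep => false, :timeout => 10, :silenceTimeout => 1, :maxTime => 10 }
#   end
# The result still includes the 1-second silence window that ends the
# recording, plus any pause before the callee starts speaking.
```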
Tags: answering machine, call progress analysis, cpa, dsp, human, Ruby


I have an additional question: if you want to leave a message, when do you start? That is, how do we know when the answering machine is ready to record a message? Is there a fixed amount of time (some rule of thumb) that we should wait before starting the voicemail message?
Manoj asked the right question… the issue is NOT detecting whether it’s an answering machine, it’s HANDLING an answering machine… leaving a message at the right time. Kind of a basic requirement for any system that makes calls. Can that be done on Tropo?
Hi Manoj,
You can experiment with the maxTime and difference settings, but once the detection has been made and the silenceTimeout has elapsed, you can leave a message with the Tropo say command. say lets you either speak text using text-to-speech in any of the languages Tropo supports, or play a recorded MP3 or WAV file by simply specifying the URL of the audio file.
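As a sketch of that flow, the branch at the end of the original script could choose what to hand to say: a text-to-speech prompt for a human, or the URL of a prerecorded message for a machine. `prompt_for`, the greeting text, and the example URL are hypothetical placeholders; the function itself is plain Ruby, so it can be checked outside Tropo.

```ruby
# Decide what to pass to Tropo's say once the greeting has been timed.
# By this point record has already returned after the silence timeout,
# so the machine's greeting should be over.
def prompt_for(difference, message_url, threshold: 3.0)
  if difference < threshold
    "Hello, this is a call from Example Corp."  # hypothetical TTS text for a human
  else
    message_url                                 # MP3/WAV URL for say to play
  end
end

# In a Tropo script, after computing difference:
#   say prompt_for(difference, "http://example.com/message.mp3")
```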
Regards, Chris
Is this possible using the WebAPI?
Same question as Russell – can this be done using the WebAPI?
The difference between the scripting API and the Web API may be relevant here – it appears that the :silenceTimeout option in the scripting API is equivalent to the :maxTimeout option in the WebAPI. Is that right?
Wes (and belatedly Russell … sorry about that, Russell!),
Yes, you could use an approach like this with the Tropo WebAPI, because it’s all based on timing. I talked to a couple of the engineers about this, and their only comment was that with the WebAPI you would have the additional latency of the HTTP calls to your server. A slow web server (or HTTP connection) could push the measured time past the threshold, so the call would be classified as a “machine”. You might therefore need to spend some time tweaking the timing, and depending on the connection to your web server and its capacity, there may always be some chance of this happening.
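One way to do that tweaking is to subtract an allowance for HTTP latency before applying the 3-second threshold. This is a hypothetical sketch, not a Tropo feature: `classify_with_latency` is an illustrative name, and the allowance is a number you would estimate or measure for your own server.

```ruby
# Hypothetical latency-tolerant classifier for WebAPI apps: discount an
# estimated HTTP round-trip allowance before comparing to the threshold.
def classify_with_latency(difference, latency_allowance, threshold: 3.0)
  (difference - latency_allowance) < threshold ? :human : :machine
end

classify_with_latency(3.4, 0.5)  # => :human   (3.4 s measured, 0.5 s of it latency)
classify_with_latency(3.4, 0.0)  # => :machine
```

The trade-off is that a larger allowance also lets slightly longer machine greetings slip through as “human”, so it’s worth keeping it as small as your server allows.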
Thanks for asking, Dan