
Advanced transcription and analytics with Voicebase and Tropo

Posted on June 24, 2015 by Adam Kalsey

Tropo includes US English speech-to-text transcription directly in the API. To transcribe any recording, simply set the transcriptionOutURI parameter on any recording and Tropo will transcribe the recording and send the transcription to that URL.

What if you want more capabilities, such as other languages, tunable accuracy, human-powered transcription, or voice analytics? For this, you can send your recording off to any transcription service that allows an audio upload. In this example, we’ll show how to integrate Tropo with Voicebase’s audio indexing and transcription API.

Voicebase provides an API that can transcribe in multiple languages, provides fast, accurate transcription for longer-form text, and can classify and analyze the resulting transcription. At the basic level, you can simply get a transcription back. The audio file and transcription are then stored in your Voicebase account for searching and detailed analytics. Mention Tropo when you sign up for Voicebase, and they’ll transcribe 200 hours of recordings for you for free.

A Tropo recording file gets sent to a URL of your choice. To send this to Voicebase, you’ll need a small application that receives the Tropo upload and then creates the Voicebase API call to send the recording for transcription. The application then waits for the transcription to be completed and does something with it, perhaps emailing it, storing in a database, or sending a text message.

The sample application described below uses Slim Framework, Tropo, and Voicebase and will run on any PHP web server. You can get the entire app in our voicebase-php repo on Github. It receives the Tropo recording and saves it to disk. It then asks for a basic machine transcription from Voicebase, and once Voicebase has finished transcribing the file, places the transcription in a text file with a filename that matches the audio file name. The application is less than 100 commented lines of code.

The building blocks

Tropo uploads the file to your web server using HTTP POST, sending the file in a field called “filename”. In PHP, you can access this file with the variable $_FILES['filename']. This file is going to be saved in a place the web server can serve it up later.
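Before saving the upload, it can help to confirm that it actually arrived. The sketch below is one way to do that, assuming PHP 7.1 or later; uploadProblem is a hypothetical helper name, not part of the sample application. It takes the array as an argument so it can be exercised without a real upload.

```php
<?php
// Hypothetical helper: returns null when the upload in the "filename"
// field arrived intact, or a short reason string when it did not.
function uploadProblem(array $files): ?string
{
    if (!isset($files['filename'])) {
        return 'no "filename" field in the request';
    }
    // PHP sets an "error" code on every upload entry; 0 (UPLOAD_ERR_OK) means success.
    $error = $files['filename']['error'] ?? UPLOAD_ERR_NO_FILE;
    if ($error !== UPLOAD_ERR_OK) {
        return "upload failed with PHP error code $error";
    }
    return null;
}
```

In a request handler this would be called as uploadProblem($_FILES) before move_uploaded_file.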

The Voicebase API uses the same URI for all API calls, and an “action” parameter in the API call tells Voicebase what API call is being made. All API calls are authenticated with an API key and the Voicebase account password, also both passed as API parameters. A version parameter indicates which VoiceBase API version is being accessed.

To upload a recording, a POST request is used. The API name is uploadMedia. Instead of sending the recording file to Voicebase, the API call includes a URL where Voicebase will download the recording. The Voicebase API also allows an externalID to be set that can later be used to retrieve the transcription.

Transcription is asynchronous. The upload API call returns immediately, and the transcription is fetched with another API call when it is complete. The upload API allows a callback URL to be set, and when the transcription is complete, Voicebase will send a webhook request to this URL.

The webhook is a GET request with a series of query string parameters. The two that we’re interested in are state, which indicates whether or not the transcription worked, and externalId, which contains the unique ID we set for this recording.
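As a rough illustration of reading those two parameters, here is a sketch that parses a raw query string with PHP's parse_str. The function name parseCallback and the sample ID are made up for this example; the MACHINEREADY value is the success state the sample application checks for.

```php
<?php
// Hypothetical helper: pulls state and externalId out of a raw query string.
function parseCallback(string $queryString): array
{
    parse_str($queryString, $params);
    return [
        // true only when Voicebase reports the machine transcript is ready
        'ready' => ($params['state'] ?? '') === 'MACHINEREADY',
        // the unique ID set at upload time, or null if it is missing
        'id'    => $params['externalId'] ?? null,
    ];
}
```

Inside a framework such as Slim you would normally read these values from the request object instead; the helper only shows the shape of the data.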

To fetch the transcription, a GET request with query string parameters is sent. The API name is getTranscript. The version and authentication keys are included in the query string. The externalID is specified in the query string to tell Voicebase which transcription to retrieve. The format parameter indicates how you would like the transcription formatted. Voicebase supports a variety of output formats including plain text and even PDF.
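One way to assemble such a request URL is PHP's built-in http_build_query, which handles the URL encoding. This is only a sketch: transcriptUrl is a hypothetical helper, the endpoint is whatever your configuration supplies, and the parameter names and values are the ones used in the sample application.

```php
<?php
// Hypothetical helper: builds the getTranscript URL for one recording.
function transcriptUrl(string $endpoint, string $apiKey, string $password, string $id): string
{
    $params = [
        'version'    => '1.1',
        'apikey'     => $apiKey,
        'password'   => $password,
        'action'     => 'getTranscript',
        'format'     => 'TXT',
        'externalId' => $id,
    ];
    // Explicit '&' separator so the result does not depend on php.ini settings.
    return $endpoint . '?' . http_build_query($params, '', '&');
}
```

The returned string can be passed to any HTTP client as a plain GET request.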

The Tropo application

In your Tropo application, use one of the recording methods to make a recording and set the recordURI to http://your-server.com/path/to/application/recording/{uniqueid}, replacing your-server.com with the hostname of your server, path/to/application with the directory where you installed the application, and {uniqueid} with an ID that will be unique per call. This unique ID will be used as the Voicebase externalId and will also be used as the filenames for saving the recordings and transcriptions.

Generating a unique ID can be as simple as using the timestamp and caller’s phone number, like so:

<?php
$id = $currentCall->callerID . '-' . date('Y-m-d-His');

record('Leave your message',
    array(
        'recordURI' => 'http://your-server.com/path/to/application/recording/' . $id
        )
    );
?>
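Caller IDs can contain characters such as a leading + that are awkward in URLs and filenames. A slightly more defensive variant might look like the sketch below. The helper name makeRecordingId is hypothetical, and using gmdate (UTC) instead of date is a choice made here so the result does not depend on the server timezone.

```php
<?php
// Hypothetical helper: builds an ID from the caller ID and a Unix timestamp,
// keeping only letters and digits from the caller ID.
function makeRecordingId(string $callerId, int $timestamp): string
{
    $clean = preg_replace('/[^0-9A-Za-z]/', '', $callerId);
    return $clean . '-' . gmdate('Y-m-d-His', $timestamp);
}
```

In the Tropo script this would be called as makeRecordingId($currentCall->callerID, time()), and the same value would then be used everywhere the ID appears.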

Application Walkthrough

In the Slim application on your web server, the function that accepts the recording upload from Tropo and then sends it to the Voicebase API looks like this:

$app->post('/recording/:id', function($id) use($app) {
    $dir = getcwd();
    move_uploaded_file($_FILES['filename']['tmp_name'], "$dir/audio/$id.wav");
    $app->log->debug("SAVE $id / $dir/audio/$id.wav");

    $params = array(
        "version" => "1.1",
        "apikey" => $app->config('apikey'),
        "password" => $app->config('password'),
        "action" => "uploadMedia",
        "transcriptType" => "machine-best",
        "mediaURL" => $app->request()->getUrl() . $app->request()->getScriptName() . "/audio/$id.wav",
        "machineReadyCallBack" => $app->request()->getUrl() . $app->request()->getScriptName() . "/transcription",
        "speakerChannelFlag" =>  'true',
        "speakerNames" => 'speaker-1,speaker-2',
        "externalID" => $id
    );
    $response = Requests::post("{$app->config('endpoint')}", array(), $params);
    $app->log->debug("UPLOAD $id / " . $response->body);
    if ('SUCCESS' != json_decode($response->body)->requestStatus) {
        $app->log->error("UPLOAD $id / " . json_decode($response->body)->statusMessage);
    }
});

The first lines of the handler accept the Tropo recording as a form post upload and save it to the audio directory, using the unique ID created by the Tropo script.

The $params array holds the data that will be sent to the Voicebase API. The apikey and password entries set your Voicebase account credentials from the configuration file. The action entry tells Voicebase that we’re using their uploadMedia method to send a recording, and mediaURL gives the URL on your web server where Voicebase can download the recording. The machineReadyCallBack entry sets a webhook that Voicebase will hit when the transcription is complete, and externalID sets an ID that we can use to locate the transcription using Voicebase’s API.

The transcriptType entry sets the transcription to use “machine-best” to get the highest quality transcription available. This is only available for US English recordings, so if another language is used, you should omit this entry. The next version of the Voicebase API will default all transcriptions to the highest available quality.

The speakerChannelFlag and speakerNames entries tell Voicebase that we’d like to separate different speakers into different transcriptions. Tropo records all multi-party calls in stereo, with one channel being dedicated to the first leg of the call and the other channel containing the other speakers in the call. For a transferred call, this results in both people on the call being recorded in different channels. For a conference call, this allows you to record each caller in the conference call and in each recording, one channel will contain only that caller. Voicebase will transcribe each channel separately and will tag each line of dialog with the speaker names you specify.

The Requests::post call makes the HTTP POST to the Voicebase API.
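The handler above assumes the response body is always valid JSON. If you want to guard against an unexpected body, such as an HTML error page, a check along these lines could be used. It is a sketch only: uploadError is a hypothetical helper, and the failing status value in the usage note is invented for illustration. The requestStatus and statusMessage fields are the ones the handler already reads.

```php
<?php
// Hypothetical helper: returns null when the upload call succeeded,
// or a message describing what went wrong.
function uploadError(string $body): ?string
{
    $data = json_decode($body);
    if (!is_object($data) || !isset($data->requestStatus)) {
        return 'response was not the expected JSON';
    }
    if ($data->requestStatus !== 'SUCCESS') {
        return $data->statusMessage ?? 'unknown error';
    }
    return null;
}
```

For example, uploadError('{"requestStatus":"FAILURE","statusMessage":"bad key"}') returns the string "bad key", which could then be passed to the error log.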

When Voicebase has completed the transcription of the file, they will make a request to the webhook callback we supplied with our recording upload API call. The code to handle that webhook request, fetch the transcription, and save it is below:

$app->post('/transcription', function() use($app)  {
    $req = $app->request();
    $state = $req->params('state');
    $id = $req->params('externalId');
    $app->log->debug("CALLBACK $id / " . json_encode($req->params()));

    if ($state != 'MACHINEREADY') {
        $app->log->error("TRANSCRIBE $id / Error in callback: $state. " . json_encode($req->params()));
    } else {
        $params = array(
            "version" => "1.1",
            "apikey" => $app->config('apikey'),
            "password" => $app->config('password'),
            "action" => "getTranscript",
            "format" => "TXT",
            "externalId" => $id
        );
        $qs = '';
        foreach ($params as $k => $v) {
            $k = urlencode($k);
            $v = urlencode($v);
            $qs .= "$k=$v&";
        }
        $app->log->debug("REQ TRANSCRIPT $id / {$app->config('endpoint')}?$qs");

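        // The parameters are already in the query string; the third argument
        // to Requests::get is for request options, so it is left out here.
        $response = Requests::get("{$app->config('endpoint')}?$qs");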
        $app->log->debug("TRANSCRIPT $id / " . $response->body);

        $transcript = json_decode($response->body)->transcript;
        $dir = getcwd();

        $file = fopen("$dir/audio/$id.txt","w");
        fwrite($file,$transcript);
        fclose($file);
        $app->log->info("transcribed $id / $transcript");
    }
});

The webhook callback that we get from Voicebase has a couple of query string parameters in it. A state parameter tells us what the status of the transcription is. Did it succeed? This will always show that it was a success, since the only callback we asked for when setting up the upload call (machineReadyCallBack) was for successful transcriptions. An externalId parameter contains the unique ID that we sent as externalID with the upload call. These query string parameters are extracted at the top of the handler.

The $params array sets up the Voicebase API call that will get the transcription. The apikey and password entries provide your Voicebase credentials. The action entry says we want to use the getTranscript method. The format entry specifies that we want the transcription as plain text. The externalId entry is our unique ID for the transcription so that Voicebase returns the right one.

The transcription format asked for here is plain text. Voicebase also supports other formats, including a JSON body that includes a timestamp for every transcribed word. For simplicity’s sake, we want a plain text result, so the Voicebase API JSON response contains the transcription as text in a single JSON property.

The foreach loop converts the $params array into a query string. The Requests::get call makes the API request to Voicebase and json_decode extracts the transcript from the API response. The fopen, fwrite, and fclose calls save the transcript as a text file using the same filename format as the recording was saved in.
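The extract-and-save step can also be written more compactly with file_put_contents, with a check that the transcript property is really present. The following is a sketch under that assumption; saveTranscript is a hypothetical helper and is not part of the sample application.

```php
<?php
// Hypothetical helper: writes the "transcript" property of a JSON response
// body to $path. Returns false if the body has no transcript or the write fails.
function saveTranscript(string $body, string $path): bool
{
    $data = json_decode($body);
    if (!is_object($data) || !isset($data->transcript)) {
        return false;
    }
    return file_put_contents($path, $data->transcript) !== false;
}
```

In the handler this could be called with the response body and a path such as "$dir/audio/$id.txt".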

Installing

First, clone or fork our voicebase-php repo on Github.

To test this application out, you’ll need a webserver running PHP. You’ll also need to get an API account from Voicebase. Mention Tropo when asking for your API key and Voicebase will give you 200 hours of transcriptions for free.

Rename sample.config.json to config.json and edit it to add your Voicebase API key and password. Copy config.json, .htaccess, and index.php to your web server. On your web server, use Composer to install the dependencies, Slim Framework and Requests. Create a directory called audio in the same location as index.php and make sure it is writable by the web server.
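A quick sanity check of the configuration can save some debugging. The sketch below assumes config.json is a flat JSON object whose keys match the names the application reads with $app->config(), namely apikey, password, and endpoint. That layout is an assumption, so compare it against sample.config.json in the repo. The helper name missingConfigKeys is hypothetical.

```php
<?php
// Hypothetical helper: returns the names of required settings that are
// missing or empty in the given JSON text.
function missingConfigKeys(string $json): array
{
    $required = ['apikey', 'password', 'endpoint'];
    $config = json_decode($json, true);
    if (!is_array($config)) {
        // Not valid JSON (or not an object): everything is missing.
        return $required;
    }
    return array_values(array_filter($required, function ($key) use ($config) {
        return empty($config[$key]);
    }));
}
```

Calling missingConfigKeys(file_get_contents('config.json')) should return an empty array once the file is filled in.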
