Introduction

In a past life, I worked for an ISP. It was my first development role, and I learnt a lot there. Looking back on that time (11 years since I started), much has changed, and many things have become a lot simpler.

Part of the signup process there involved having customers complete a voice contract. Customers filled out a form online, then took their customer identifier, called a phone number, said their name, and answered a series of questions. By completing the voice contract, you confirmed details of the service you had signed up for and verbally signed your agreement to the service terms.

Building this system involved two internal developers, a business analyst, and an external company that managed the hosting, development, and integration of the IVR (interactive voice response) system. It took weeks of planning, development, and large, complicated flow diagrams.

Would it surprise you to know that just the other week I threw together a simple proof of concept of a similar system in less than an hour?

Enter Nexmo

Nexmo is a company providing building blocks for next-generation communication applications. They've been around the block before, and had an amusing live demo at Laracon US 2016.

You may have used their SMS services in the past, but another of their offerings is a programmable Voice API that lets people interact with a pretty impressive text-to-speech engine using just a phone and their voice.

What I wanted to play with was replicating that original voice contract system.

Setting up Nexmo for the Voice API

You'll first need to check out the Nexmo docs on how to set up an account and a voice application, then save the generated private.key file in your application's base directory.

Next, pull in the PHP SDK via the Laravel wrapper. Because we need to override the default Nexmo service provider to use the Voice API, first add "nexmo/laravel" to the extra.laravel.dont-discover array in your composer.json so Laravel's package discovery doesn't register it. Then run composer require nexmo/laravel and create config/nexmo.php:

<?php

// config/nexmo.php

return [
    'api_key' => env('NEXMO_KEY'),
    'api_secret' => env('NEXMO_SECRET'),
    'application_id' => env('NEXMO_APPLICATION_ID'),
    'key_path' => base_path('private.key'),
];

In your AppServiceProvider.php file, we'll bind the Nexmo client to the container ourselves, as the default implementation doesn't account for the fact that we need both the private key file and the application ID:

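public function register()
{
    // Leading backslashes: this method lives in the App\Providers namespace.
    $this->app->singleton(\Nexmo\Client::class, function ($app) {
        $config = $app['config']->get('nexmo');

        // Combine the API key/secret with the application's keypair,
        // which the Voice API uses to authenticate requests.
        return new \Nexmo\Client(
            new \Nexmo\Client\Credentials\Container(
                new \Nexmo\Client\Credentials\Basic($config['api_key'], $config['api_secret']),
                new \Nexmo\Client\Credentials\Keypair(file_get_contents($config['key_path']), $config['application_id'])
            )
        );
    });
}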

We'll need to set up two endpoints to handle the events Nexmo sends to our application. Note that these endpoints must be publicly accessible, as Nexmo will send webhooks to them. The easiest way to get this working during development (if you're on a Mac) is Laravel Valet, whose valet share command exposes your development environment over an ngrok tunnel.

Route::get('/calls/answer', function () {

});

Route::post('/calls/event', function () {

});

We'll need to disable CSRF protection on these endpoints, since Nexmo's webhooks can't include a CSRF token. Add calls/* to the $except array in the VerifyCsrfToken middleware, located at app/Http/Middleware/VerifyCsrfToken.php.

Lastly, set up the route that will initiate the voice call.

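// In routes/web.php

Route::get('/call', function (Nexmo\Client $client) {
    // Place an outbound call. Nexmo fetches the NCCO from answer_url once
    // the call is answered and sends call events to event_url.
    return $client->calls()->create([
        'to' => [
            [
                'type' => 'phone',
                'number' => '11111111111',
            ],
        ],
        'from' => [
            'type' => 'phone',
            'number' => '11111111111', // Not displayed
        ],
        // These must match the routes defined above in routes/web.php.
        'answer_url' => [url('/calls/answer')],
        'event_url' => [url('/calls/event')],
    ]);
});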

Returning the created call lets us see some information from Nexmo about the call, which we can use to track it later if necessary.

{
  "uuid": "f6c0d19c-fbd0-418b-8792-420890b9d978",
  "status": "started",
  "direction": "outbound",
  "conversation_uuid": "CON-61cb01e3-fc9b-49ec-820c-f9a9d752726d"
}
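To actually track calls, one option is to keep the response's uuid and conversation_uuid so that later webhook events (which carry a conversation_uuid) can be matched back to the call that triggered them. The sketch below is plain PHP (7.4 or later) with a hypothetical in-memory CallRegistry class; a real application would persist this mapping against the signup record in the database instead.

```php
<?php

// Hypothetical in-memory registry mapping Nexmo conversation UUIDs to call UUIDs.
// A real application would store these in a database against the signup record.
final class CallRegistry
{
    /** @var array<string, string> conversation_uuid => call uuid */
    private array $calls = [];

    // Remember a call using the JSON body returned when the call was created.
    public function remember(string $createResponseJson): void
    {
        $call = json_decode($createResponseJson, true, 512, JSON_THROW_ON_ERROR);
        $this->calls[$call['conversation_uuid']] = $call['uuid'];
    }

    // Find the call UUID for an incoming webhook payload, or null if unknown.
    public function callFor(array $eventPayload): ?string
    {
        return $this->calls[$eventPayload['conversation_uuid'] ?? ''] ?? null;
    }
}

$registry = new CallRegistry();
$registry->remember('{"uuid": "f6c0d19c-fbd0-418b-8792-420890b9d978", "status": "started", "direction": "outbound", "conversation_uuid": "CON-61cb01e3-fc9b-49ec-820c-f9a9d752726d"}');

// prints f6c0d19c-fbd0-418b-8792-420890b9d978
echo $registry->callFor(['conversation_uuid' => 'CON-61cb01e3-fc9b-49ec-820c-f9a9d752726d']), PHP_EOL;
```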

At this point, you should receive your first API-initiated voice call! It'll be a bit boring, however, as we haven't set up our Nexmo Call Control Object (NCCO). NCCOs let us define what actions should take place once the call is answered. This includes what the text-to-speech API says to the person who answers, whether we record the conversation (or part thereof), and any conversation steps.

What do I say when somebody answers?

Inside our /calls/answer route definition, we'll return a JSON response that will be used when the call is answered:

Route::get('/calls/answer', function () {
    return response()->json([
        [
            'action' => 'talk',
            'voiceName' => 'Russell',
            'text' => 'Hi, this is Russell. I am the default Australian, male voice',
        ],
    ]);
});

This control object tells the Voice API that when the remote party answers the call, it should speak with Russell's voice and say "Hi, this is Russell. I am the default Australian, male voice".
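An NCCO is just a JSON array of action objects executed in order, so it can be built from plain PHP arrays and json_encode, which is roughly what response()->json() does for us in the route above. A minimal sketch, where the talk() helper is hypothetical and not part of the SDK:

```php
<?php

// Hypothetical helper: build a single NCCO "talk" action.
function talk(string $text, string $voiceName = 'Russell'): array
{
    return [
        'action' => 'talk',
        'voiceName' => $voiceName,
        'text' => $text,
    ];
}

// An NCCO is a list of actions executed in order once the call is answered.
$ncco = [
    talk('Hi, this is Russell. I am the default Australian, male voice'),
];

echo json_encode($ncco), PHP_EOL;
```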

So far, so good, but it's not terribly useful.

Recording conversations

The first thing we want to do is record the name of the person we have called.

Route::get('/calls/answer', function () {
    return response()->json([
        [
            'action' => 'talk',
            'voiceName' => 'Russell',
            'text' => 'Hi, this is Russell. Please say your name, followed by the hash key',
        ],
        [
            'action' => 'record',
            'eventUrl' => [url('/calls/event?action=record&type=name')],
            'endOnKey' => '#',
        ],
    ]);
});
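The eventUrl above carries action and type query parameters, which is how the event handler later tells one webhook apart from another. With many steps, a small helper keeps that tagging consistent. The sketch below uses http_build_query; the eventUrl() and record() helper names and the example base URL are hypothetical.

```php
<?php

// Hypothetical helper: tag an event URL so the webhook handler can tell
// which step of the call produced the event.
function eventUrl(string $base, string $action, string $type): string
{
    return $base . '?' . http_build_query(['action' => $action, 'type' => $type]);
}

// Hypothetical helper: an NCCO "record" action that stops when # is pressed.
function record(string $base, string $type): array
{
    return [
        'action' => 'record',
        'eventUrl' => [eventUrl($base, 'record', $type)],
        'endOnKey' => '#',
    ];
}

$action = record('https://example.com/calls/event', 'name');

// prints https://example.com/calls/event?action=record&type=name
echo $action['eventUrl'][0], PHP_EOL;
```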

We'll also add some logging so that we can see the recording come in once it's completed.

Route::post('/calls/event', function () {
    app('log')->debug('Incoming call event', request()->all());
});

Placing the call again, we receive this event once the recording is complete:

{
  "start_time": "2017-09-24T13:55:48Z",
  "recording_url": "https://api.nexmo.com/v1/files/3c12f079-8a3f-4920-959a-8abbbd577fcd",
  "size": 18126,
  "recording_uuid": "4009df21-b667-405f-a07c-fc48be840ba3",
  "end_time": "2017-09-24T13:55:53Z",
  "conversation_uuid": "CON-896f0949-c6ec-4f44-b8ed-2906deb4c2bc",
  "action": "record",
  "type": "name"
}
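This payload has what we need to store the recording against the customer: the recording_url to fetch it from, the conversation_uuid, and the start and end timestamps. As a small worked example, this plain PHP sketch (the recordingSummary() function is hypothetical) pulls those fields out and works out how long the caller spoke, 13:55:48 to 13:55:53, or 5 seconds:

```php
<?php

// Hypothetical helper: summarise a Nexmo "record" webhook payload.
function recordingSummary(array $event): array
{
    // The timestamps are ISO 8601 in UTC (note the trailing Z).
    $start = new DateTimeImmutable($event['start_time']);
    $end = new DateTimeImmutable($event['end_time']);

    return [
        'url' => $event['recording_url'],
        'conversation' => $event['conversation_uuid'],
        // Length of the recording in whole seconds.
        'seconds' => $end->getTimestamp() - $start->getTimestamp(),
    ];
}

$summary = recordingSummary([
    'start_time' => '2017-09-24T13:55:48Z',
    'end_time' => '2017-09-24T13:55:53Z',
    'recording_url' => 'https://api.nexmo.com/v1/files/3c12f079-8a3f-4920-959a-8abbbd577fcd',
    'conversation_uuid' => 'CON-896f0949-c6ec-4f44-b8ed-2906deb4c2bc',
]);

echo $summary['seconds'], PHP_EOL; // prints 5
```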

At the time of writing, my understanding is that the Nexmo PHP SDK doesn't expose a way to retrieve recording files, but the documentation covers this process.

Update 2017-09-29: Since v1.1.0, the SDK provides an example for fetching recording files using the Nexmo client. Thanks to Michael Heap for adding this.

Prompting for user input

Once we have the new customer's name (to verify against their signup), we can now ask them to agree to our terms and conditions.

Route::get('/calls/answer', function () {
    return response()->json([
        // ...
        [
            'action' => 'talk',
            'voiceName' => 'Russell',
            'text' => 'To confirm you have read and agree to our terms and conditions, please press one',
        ],
        [
            'action' => 'input',
            'eventUrl' => [url('/calls/event?action=input&type=terms')],
            'maxDigits' => 1,
        ]
    ]);
});

This is where things can start to get tricky, depending on how much information we need from our users, which paths we need to follow on certain key presses, and so on.

Route::post('/calls/event', function () {
    app('log')->debug('Incoming call event', request()->all());

    if (request('action') == 'input') {
        if (request('type') == 'terms') {
            if (request('dtmf') == 1) {
                return response()->json([
                    [
                        'action' => 'talk',
                        'voiceName' => 'Russell',
                        'text' => 'You have agreed to our terms and conditions. Thank you!',
                    ],
                ]);
            }

            return response()->json([
                [
                    'action' => 'talk',
                    'voiceName' => 'Russell',
                    'text' => 'You did not agree to our terms and conditions. Good bye!',
                ],
            ]);
        }
    }
});

We check that the action and type match our conditions, then check the caller's input using the value of the request's dtmf parameter.
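The nested if statements get unwieldy as more steps are added. One alternative, sketched below in plain PHP, is to key handlers by "action:type" and let each one decide on the next NCCO. The say() and handleCallEvent() functions and the handler map are hypothetical; the sketch casts dtmf to a string before comparing, so it behaves the same whether the digit arrives as "1" or 1.

```php
<?php

// Hypothetical helper: a single NCCO "talk" action in Russell's voice.
function say(string $text): array
{
    return ['action' => 'talk', 'voiceName' => 'Russell', 'text' => $text];
}

// Hypothetical dispatcher: pick the next NCCO from a webhook payload.
// Returns null when no handler is registered for the event.
function handleCallEvent(array $payload): ?array
{
    $handlers = [
        'input:terms' => function (array $payload): array {
            // Cast so "1" and 1 are treated the same.
            if ((string) ($payload['dtmf'] ?? '') === '1') {
                return [say('You have agreed to our terms and conditions. Thank you!')];
            }

            return [say('You did not agree to our terms and conditions. Good bye!')];
        },
    ];

    $key = ($payload['action'] ?? '') . ':' . ($payload['type'] ?? '');

    return isset($handlers[$key]) ? $handlers[$key]($payload) : null;
}
```

Inside the /calls/event route, passing request()->all() (which merges the query string and the JSON body) to a dispatcher like this would replace the nested conditionals, with new steps added as new entries in the map.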

Conclusion

This is quite a naive example, but it served as a quick and easy proof of concept. It shows just how simple it is to get an interactive call recording solution up and running using Nexmo.

I have only explored getting up and running: having the Voice API speak to the people it calls, recording their voice, and accepting input from them.

In a fully fleshed-out IVR system, you would likely have many different paths the callee could follow, such as prompting them again if they didn't agree to the terms and conditions, and you would certainly record in the database, against the signup record, when they had accepted them.
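Re-prompting is one example of such a path. One way to sketch it is to return another talk-plus-input NCCO until a retry limit is reached, carrying the attempt number on the input action's eventUrl. The termsResponse() function, the attempt query parameter, the limit of three attempts, the re-prompt wording and the base URL are all hypothetical choices for illustration.

```php
<?php

// Hypothetical limit on how many times we ask before giving up.
const MAX_TERMS_ATTEMPTS = 3;

// Hypothetical handler: respond to the terms prompt, asking again on failure.
function termsResponse(array $payload, string $eventBase): array
{
    $attempt = (int) ($payload['attempt'] ?? 1);

    if ((string) ($payload['dtmf'] ?? '') === '1') {
        return [[
            'action' => 'talk',
            'voiceName' => 'Russell',
            'text' => 'You have agreed to our terms and conditions. Thank you!',
        ]];
    }

    if ($attempt >= MAX_TERMS_ATTEMPTS) {
        return [[
            'action' => 'talk',
            'voiceName' => 'Russell',
            'text' => 'You did not agree to our terms and conditions. Good bye!',
        ]];
    }

    // Ask again, tagging the next event with the incremented attempt number.
    $query = http_build_query(['action' => 'input', 'type' => 'terms', 'attempt' => $attempt + 1]);

    return [
        [
            'action' => 'talk',
            'voiceName' => 'Russell',
            'text' => 'Sorry, we did not get that. To agree to our terms and conditions, please press one',
        ],
        [
            'action' => 'input',
            'eventUrl' => [$eventBase . '?' . $query],
            'maxDigits' => 1,
        ],
    ];
}
```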

Looking back on the amount of time and effort that went into getting a similar solution up and running just ten years ago, I'm thankful for companies like Nexmo that empower us with technology like this, allowing developers to solve client problems much faster.

If you'd like to learn more about the Nexmo Voice API, check out their documentation, and if you end up building something or need a hand along the way, I'd love to hear from you. Hit me up on Twitter!