HTTP/STT/ASR Protocol

Description

The Voximal VoiceXML browser can connect to an STT (Speech-To-Text) or ASR (Automatic Speech Recognition) engine over HTTP. The HTTP protocol is used to recognize or transcribe an audio file into text (a full transcription or recognized word(s)).

The protocol is simple:

The VoiceXML browser, configured to use HTTP, sends a request (POST only) containing mainly the audio file content and additional parameters (language, grammar, confidence level…).
The web server hosting the STT/ASR engine processes the request.
The VoiceXML browser receives an XML or JSON result, converts it to NLSML syntax, and interprets it.
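The exchange described above can be sketched from the client side in Python. This is a minimal illustration, not Voximal's implementation: it assumes that `language` and `uid` travel in the query string, that the audio is WAV, and that the service answers with a hypothetical JSON object `{"text": ..., "confidence": ...}`. The function names `build_stt_request` and `json_result_to_nlsml` are hypothetical, and the NLSML produced is a minimal NLSML-style result, not a complete document.

```python
import json
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape


def build_stt_request(uri: str, audio: bytes, language: str, uid: str) -> urllib.request.Request:
    """Build the POST request sent to the STT/ASR web service.

    Assumption: language and uid travel in the query string; the audio file is the raw POST body.
    """
    query = urllib.parse.urlencode({"language": language, "uid": uid})
    return urllib.request.Request(
        f"{uri}?{query}",
        data=audio,                             # audio file content as the request body
        headers={"Content-Type": "audio/wav"},  # assumed audio format
        method="POST",                          # the protocol is POST only
    )


def json_result_to_nlsml(payload: str) -> str:
    """Turn a hypothetical JSON reply {"text": ..., "confidence": ...} into a minimal NLSML-style result."""
    result = json.loads(payload)
    text = escape(result.get("text", ""))  # XML-escape the recognized text
    confidence = float(result.get("confidence", 0.0))
    return (
        "<result>"
        f'<interpretation confidence="{confidence:.2f}">'
        f'<input mode="speech">{text}</input>'
        f"<instance>{text}</instance>"
        "</interpretation>"
        "</result>"
    )


if __name__ == "__main__":
    # "ip" and "provider" are placeholders, as in the service path given in this page.
    request = build_stt_request("http://ip/stt/provider/stt.php", b"RIFF....", "en-GB", "1234")
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send it to a running service.
    print(json_result_to_nlsml('{"text": "hello world", "confidence": 0.87}'))
```

Building the request separately from sending it keeps the sketch testable without a live STT/ASR server.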

Voximal configuration

The configuration is set in /etc/asterisk/voximal.conf, in the “[recognizer]” section:

uri: the URI of the STT/ASR service (our scripts install the services at http://ip/stt/provider/stt.php).
api: the provider-specific HTTP API to use (microsoft, google, ibm/watson…).
key: the API authentication key, if the API requires one.

Configuration example:

[recognizer]
api=microsoft
key=c49db9de7db94d50b85c0cc8c46c2651
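To show how the three parameters sit in the “[recognizer]” section, the sketch below reads them with Python's standard `configparser`. It is only an illustration: `load_recognizer_settings` is a hypothetical helper, `configparser` handles plain `key=value` INI files but may not cover every Asterisk-specific configuration syntax, and the sample values reuse the placeholders from this page.

```python
import configparser

# Illustrative excerpt of /etc/asterisk/voximal.conf ("ip" and "provider" are placeholders).
SAMPLE_CONF = """
[recognizer]
uri=http://ip/stt/provider/stt.php
api=microsoft
key=c49db9de7db94d50b85c0cc8c46c2651
"""


def load_recognizer_settings(conf_text: str) -> dict:
    """Read uri, api and key from the [recognizer] section; missing keys come back as None."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["recognizer"]
    return {name: section.get(name) for name in ("uri", "api", "key")}


if __name__ == "__main__":
    print(load_recognizer_settings(SAMPLE_CONF))
```

Returning `None` for a missing `key` mirrors the rule that the key is only needed when the API requires one.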

Most of these parameters can be changed from VoiceXML using properties: the property name is the parameter name prefixed with 'prompt'.

VoiceXML example:

<property name="promptvoice" value="Poala"/>
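The naming rule above ('prompt' followed by the parameter name) can be expressed as a small merge of VoiceXML properties over the configured values. This is a hypothetical sketch of the rule only, not Voximal's code: `effective_parameters` is an invented helper, and whether a given parameter (for example `api`) can actually be overridden this way is an assumption.

```python
def effective_parameters(config: dict, properties: dict) -> dict:
    """Overlay VoiceXML properties named "prompt" + <parameter> onto the voximal.conf values.

    Properties that do not match a configured parameter are ignored.
    """
    merged = dict(config)  # copy, so the configured defaults are never mutated
    for name in config:
        override = properties.get("prompt" + name)
        if override is not None:
            merged[name] = override
    return merged


if __name__ == "__main__":
    conf = {"api": "microsoft", "key": "c49db9de7db94d50b85c0cc8c46c2651"}
    # "promptapi" is a hypothetical property; "promptvoice" matches no parameter here and is ignored.
    props = {"promptapi": "google", "promptvoice": "Poala"}
    print(effective_parameters(conf, props))
```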

HTTP Parameters

[body]: the audio file, sent as the body of a standard POST request.
language: the language used (en-GB, fr-FR…), taken from the xml:lang attribute.
uid: the Voximal UID.
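On the server side, the service at the configured uri receives these parameters. The installed scripts are PHP (stt.php); the Python WSGI application below is only a stand-in sketch of what such an endpoint does. It assumes `language` and `uid` arrive in the query string, defaults the language to en-GB (an assumption), and replaces the real engine with a hypothetical `transcribe` stub that returns an empty transcription.

```python
import json
from urllib.parse import parse_qs


def transcribe(audio: bytes, language: str) -> dict:
    """Hypothetical placeholder for the real STT/ASR engine call."""
    return {"text": "", "confidence": 0.0, "language": language, "audio_bytes": len(audio)}


def stt_app(environ, start_response):
    """Minimal WSGI endpoint: POST only, audio in the body, parameters in the query string."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"POST only\n"]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    language = params.get("language", ["en-GB"])[0]  # assumed default language
    uid = params.get("uid", [""])[0]                  # Voximal UID, echoed back
    length = int(environ.get("CONTENT_LENGTH") or 0)
    audio = environ["wsgi.input"].read(length)        # [body]: raw audio file content
    result = transcribe(audio, language)
    result["uid"] = uid
    body = json.dumps(result).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

For local experiments it can be served with `wsgiref.simple_server.make_server("", 8000, stt_app).serve_forever()`; a production service would call an actual STT/ASR provider inside `transcribe`.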
