HTTP/STT/ASR Protocol

The Voximal VoiceXML browser can connect to an STT (Speech-to-Text) or ASR engine over HTTP. The HTTP protocol is used to recognize or transcribe an audio file into text (one or more words).

This protocol is simple :

  • The VoiceXML browser, once configured to use HTTP, sends a request (POST only) containing the audio file content plus additional parameters (such as language, grammar, confidence level…).
  • The web server hosting the STT/ASR engine processes the request.
  • The VoiceXML browser receives an XML or JSON result, converts it to NLSML syntax and interprets it.
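
The last step can be illustrated with a minimal Python sketch that turns a JSON result into an NLSML document. The JSON field names ("text", "confidence"), the 0.0 to 1.0 confidence scale and the function name are assumptions made for illustration; the actual payload depends on the provider, and the conversion done inside the Voximal browser is not shown here.

```python
import json
import xml.etree.ElementTree as ET


def json_to_nlsml(payload: str) -> str:
    """Convert a hypothetical JSON STT result into a minimal NLSML document."""
    data = json.loads(payload)
    text = data["text"]  # assumed field name
    confidence = float(data.get("confidence", 0.0))  # assumed field, 0.0 to 1.0

    result = ET.Element("result")
    interpretation = ET.SubElement(
        result, "interpretation", confidence=f"{confidence:.2f}"
    )
    # <input> carries the raw recognized words.
    spoken = ET.SubElement(interpretation, "input", mode="speech")
    spoken.text = text
    # <instance> carries the interpreted value; here it is simply the same text.
    instance = ET.SubElement(interpretation, "instance")
    instance.text = text
    return ET.tostring(result, encoding="unicode")


print(json_to_nlsml('{"text": "hello world", "confidence": 0.87}'))
```

A real engine may return several alternatives; each one would then become its own interpretation element.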

The configuration is set in /etc/asterisk/voximal.conf, in the section “[recognizer]” :

  • uri : Set the 'uri' of the STT (or ASR) service (our scripts install the services at http://ip/stt/provider/stt.php).
  • api : Select the provider-specific HTTP API (microsoft, google, ibm/watson…).
  • key : Set the API authentication key, if the API requires one.

Configuration example :

[recognizer]
api=microsoft
key=c49db9de7db94d50b85c0cc8c46c2651
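
To check such a section outside of Asterisk, a small Python sketch can read the three parameters. It assumes the section is named "[recognizer]" as stated above and that the file is plain INI-like text; Asterisk's own parser supports more syntax (templates, includes) than configparser handles, and the function name is hypothetical.

```python
import configparser

# Sample mirroring the documented parameters; the uri is the install path quoted above.
SAMPLE = """\
[recognizer]
uri=http://ip/stt/provider/stt.php
api=microsoft
key=c49db9de7db94d50b85c0cc8c46c2651
"""


def read_recognizer(text: str) -> dict:
    """Return the uri, api and key settings (None when a setting is absent)."""
    parser = configparser.ConfigParser(
        delimiters=("=",),        # keep the ':' inside the URI intact
        comment_prefixes=(";",),  # Asterisk-style comments
        interpolation=None,       # treat '%' in keys or URIs literally
    )
    parser.read_string(text)
    section = parser["recognizer"]
    return {name: section.get(name) for name in ("uri", "api", "key")}


settings = read_recognizer(SAMPLE)
print(settings["api"], settings["uri"])
```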

Most of these parameters can be changed from VoiceXML using properties. The property name is 'prompt' followed by the parameter name.

VoiceXML example :

<property name="promptvoice" value="Poala"/>

Request parameters :

  • [body] : the audio file, sent with the standard POST method.
  • language : the language used (en-GB, fr-FR…), taken from the xml:lang attribute.
  • uid : Voximal UID.
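
As an illustration of how these parameters could fit together, the Python sketch below builds (without sending) a POST request. It assumes that language and uid travel in the query string, that the audio is sent as the raw body, and that the audio is WAV; none of these details is confirmed by this page, and the function name and sample values are hypothetical.

```python
import urllib.parse
import urllib.request


def build_stt_request(
    uri: str, audio: bytes, language: str, uid: str
) -> urllib.request.Request:
    """Build a POST request carrying the audio as body (not sent here)."""
    # Assumption: language and uid are passed as query-string parameters.
    query = urllib.parse.urlencode({"language": language, "uid": uid})
    return urllib.request.Request(
        url=f"{uri}?{query}",
        data=audio,  # [body] : the audio file content
        method="POST",
        headers={"Content-Type": "audio/wav"},  # assumed audio format
    )


request = build_stt_request(
    "http://ip/stt/provider/stt.php",  # install path quoted in this page
    b"RIFF",                           # placeholder bytes, not real audio
    "en-GB",
    "demo-uid",                        # hypothetical Voximal UID
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; the response would then be the XML or JSON result described earlier.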