HTTP/STT/ASR Protocol
Description
The Voximal VoiceXML browser can connect to an STT (speech-to-text) or ASR engine over HTTP. The HTTP request is used to recognize or transcribe an audio file into text (one or more words).
The protocol is simple:
- The VoiceXML browser, once configured to use HTTP, sends a request (POST only) containing mainly the audio file content plus additional parameters (language, grammar, confidence level…).
- The web server hosting the STT/ASR engine processes the request.
- The VoiceXML browser receives an XML or JSON result, converts it to NLSML syntax and interprets it.
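The last step, turning an engine result into NLSML, can be illustrated with a minimal Python sketch. It is only an assumption-laden illustration: the JSON field names ("text", "confidence") and the function name json_to_nlsml are hypothetical, and the element layout follows the general NLSML shape rather than the exact document Voximal builds internally.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def json_to_nlsml(payload: str) -> str:
    """Convert a hypothetical JSON recognition result into an NLSML-style document."""
    result = json.loads(payload)
    text = result["text"]  # hypothetical field name for the transcription
    confidence = float(result.get("confidence", 1.0))  # hypothetical field name
    conf_attr = quoteattr(f"{confidence:.2f}")  # quoted, XML-safe attribute value
    body = escape(text)  # escape &, < and > in the recognized text
    return (
        '<?xml version="1.0"?>\n'
        "<result>\n"
        f"  <interpretation confidence={conf_attr}>\n"
        f"    <instance>{body}</instance>\n"
        f'    <input mode="speech">{body}</input>\n'
        "  </interpretation>\n"
        "</result>\n"
    )


print(json_to_nlsml('{"text": "yes & no", "confidence": 0.87}'))
```

Escaping the recognized text matters because a transcription may contain characters such as "&" that would otherwise make the XML malformed.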
Voximal configuration
The configuration is set in /etc/asterisk/voximal.conf, in the “[recognizer]” section:
- uri : the URI of the STT service (our scripts install the services at http://ip/stt/provider/stt.php).
- api : the specific HTTP API to use (microsoft, google, ibm/watson…).
- key : the API authentication key, if the API requires one.
Configuration example:
[recognizer]
api=microsoft
key=c49db9de7db94d50b85c0cc8c46c2651
Most of these parameters can be changed from the VoiceXML syntax using properties. The property name is 'prompt' followed by the parameter name.
VoiceXML example:
<property name="promptvoice" value="Poala"/>
HTTP Parameters
- [body] : the audio file, sent with the standard POST method.
- language : the language used (en-GB, fr-FR…), taken from the xml:lang attribute.
- uid : Voximal UID.
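As an illustration of how these parameters might fit together, here is a minimal Python sketch that builds (without sending) such a POST request. It rests on several assumptions not stated above: that language and uid travel as query-string parameters, that the audio is WAV, and that the service lives at the install path mentioned in the configuration section. The function name build_stt_request and the uid value are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_stt_request(base_uri: str, audio: bytes, language: str, uid: str) -> Request:
    """Build (but do not send) a POST request carrying the audio in its body."""
    # Assumption: language and uid are passed in the query string.
    query = urlencode({"language": language, "uid": uid})
    return Request(
        url=f"{base_uri}?{query}",
        data=audio,  # raw audio bytes form the POST body
        method="POST",
        headers={"Content-Type": "audio/wav"},  # assumed audio format
    )


req = build_stt_request(
    "http://ip/stt/provider/stt.php",  # placeholder host, as in the configuration section
    b"RIFF....",                       # stand-in for real audio content
    "en-GB",
    "call-0001",                       # hypothetical Voximal UID
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would actually send it; the sketch stops short of that so it can be run without a live STT server.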