Voice Channel

Human supports real-time voice interaction through the built-in voice channel, enabling hands-free conversations with your AI assistant.

The voice channel provides:

  • Speech-to-text (STT) — converts spoken audio to text for processing
  • Text-to-speech (TTS) — reads responses aloud
  • WebSocket streaming — low-latency bidirectional audio
  • Voice activity detection (VAD) — detects speech boundaries automatically
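
The round trip these pieces form can be sketched as below. This is an illustrative sketch only: `transcribe`, `synthesize`, and `voice_round_trip` are placeholder names, not Human's actual API.

```python
def transcribe(audio: bytes) -> str:
    """Placeholder STT: a real implementation would stream audio to the
    configured provider (e.g. Whisper) over the WebSocket channel."""
    return "what's the weather?"

def synthesize(text: str) -> bytes:
    """Placeholder TTS: a real implementation would return provider audio."""
    return text.encode("utf-8")

def voice_round_trip(audio: bytes, assistant) -> bytes:
    """The flow the list above describes: STT -> assistant -> TTS."""
    prompt = transcribe(audio)   # speech-to-text
    reply = assistant(prompt)    # normal text processing
    return synthesize(reply)     # text-to-speech
```

In the real channel, VAD decides where `audio` starts and ends, so the caller never has to segment speech manually.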

Enable voice in your config.json:

{
  "channels": {
    "voice": {
      "enabled": true,
      "stt_provider": "whisper",
      "tts_provider": "openai",
      "vad_threshold": 0.5,
      "sample_rate": 16000
    }
  }
}
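
A quick way to sanity-check this section before starting the assistant is to load and validate it with a few lines of Python. The key names match the example above; the validation rules (a `vad_threshold` between 0 and 1) are an assumption, not documented constraints.

```python
import json

# Keys from the example config above.
REQUIRED = {"enabled", "stt_provider", "tts_provider",
            "vad_threshold", "sample_rate"}

def load_voice_config(text: str) -> dict:
    """Parse config.json text and return the voice section, or raise."""
    voice = json.loads(text)["channels"]["voice"]
    missing = REQUIRED - voice.keys()
    if missing:
        raise ValueError(f"missing voice config keys: {sorted(missing)}")
    # Assumption: the threshold is a probability-like value in [0, 1].
    if not 0.0 <= voice["vad_threshold"] <= 1.0:
        raise ValueError("vad_threshold must be between 0 and 1")
    return voice
```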

Voice support is included by default. To enable it explicitly at build time:

cmake .. -DHU_ENABLE_VOICE=ON
| Provider       | STT | TTS | Notes           |
| -------------- | --- | --- | --------------- |
| OpenAI Whisper | ✓   |     | Best accuracy   |
| OpenAI TTS     |     | ✓   | Multiple voices |
| Deepgram       | ✓   |     | Low latency     |
| ElevenLabs     |     | ✓   | Premium voices  |
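
The capability table can be encoded as data to catch a mismatched configuration early, for instance a TTS-only provider set as `stt_provider`. The dictionary below mirrors the table as listed; the provider keys and the `check_providers` helper are illustrative, not part of Human's API.

```python
# Capabilities as listed in the table above.
PROVIDERS = {
    "whisper":    {"stt": True,  "tts": False},  # OpenAI Whisper
    "openai":     {"stt": False, "tts": True},   # OpenAI TTS
    "deepgram":   {"stt": True,  "tts": False},
    "elevenlabs": {"stt": False, "tts": True},
}

def check_providers(stt_provider: str, tts_provider: str) -> None:
    """Raise if a configured provider lacks the required capability."""
    if not PROVIDERS.get(stt_provider, {}).get("stt"):
        raise ValueError(f"{stt_provider!r} cannot do speech-to-text")
    if not PROVIDERS.get(tts_provider, {}).get("tts"):
        raise ValueError(f"{tts_provider!r} cannot do text-to-speech")
```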

Audio data is streamed directly to the configured provider and is not stored locally. All connections use TLS.