Voice Channel
Human supports real-time voice interaction through the built-in voice channel, enabling hands-free conversations with your AI assistant.
Overview
The voice channel provides:
- Speech-to-text (STT) — converts spoken audio to text for processing
- Text-to-speech (TTS) — reads responses aloud
- WebSocket streaming — low-latency bidirectional audio
- VAD (Voice Activity Detection) — automatic speech boundary detection
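To illustrate the last point, here is a minimal sketch of how an energy-threshold VAD finds speech boundaries in framed audio. This is an illustration only: the function and variable names are hypothetical, and production VADs (including the one in the voice channel) are typically model-based rather than a plain RMS threshold.

```python
import math

def frame_rms(samples):
    """Root-mean-square energy of one audio frame (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def detect_speech(frames, threshold=0.5):
    """Return (start, end) frame indices of the first speech segment,
    or None if no frame's energy ever crosses the threshold."""
    start = None
    for i, frame in enumerate(frames):
        active = frame_rms(frame) >= threshold
        if active and start is None:
            start = i                # speech onset
        elif not active and start is not None:
            return (start, i)        # speech offset
    return (start, len(frames)) if start is not None else None

# silence, speech, speech, silence (160 samples = 10 ms at 16 kHz)
frames = [[0.01] * 160, [0.9] * 160, [0.8] * 160, [0.02] * 160]
detect_speech(frames, threshold=0.5)  # → (1, 3)
```

Raising `vad_threshold` makes the detector less sensitive (fewer false triggers from background noise); lowering it catches quieter speech at the cost of more spurious activations.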
Configuration
Enable voice in your config.json:
```json
{
  "channels": {
    "voice": {
      "enabled": true,
      "stt_provider": "whisper",
      "tts_provider": "openai",
      "vad_threshold": 0.5,
      "sample_rate": 16000
    }
  }
}
```
Build Flag
Voice support is included by default. To enable it explicitly:
```sh
cmake .. -DHU_ENABLE_VOICE=ON
```
Supported Providers
| Provider | STT | TTS | Notes |
|---|---|---|---|
| OpenAI Whisper | ✓ | — | Best accuracy |
| OpenAI TTS | — | ✓ | Multiple voices |
| Deepgram | ✓ | — | Low latency |
| ElevenLabs | — | ✓ | Premium voices |
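Since each provider handles only one direction (STT or TTS), a config check can catch a mismatched provider name before the channel starts. The sketch below is illustrative: the helper name and error behavior are assumptions, not part of the documented API, and the provider identifiers mirror the config example above.

```python
# Providers from the table above, keyed by the direction they support.
SUPPORTED = {
    "stt": {"whisper", "deepgram"},
    "tts": {"openai", "elevenlabs"},
}

def validate_voice_config(cfg):
    """Raise ValueError if a configured provider is not a supported one."""
    voice = cfg["channels"]["voice"]
    if voice.get("stt_provider") not in SUPPORTED["stt"]:
        raise ValueError(f"unknown STT provider: {voice.get('stt_provider')}")
    if voice.get("tts_provider") not in SUPPORTED["tts"]:
        raise ValueError(f"unknown TTS provider: {voice.get('tts_provider')}")
    return voice

cfg = {"channels": {"voice": {"enabled": True,
                              "stt_provider": "whisper",
                              "tts_provider": "openai"}}}
validate_voice_config(cfg)  # passes; a typo like "eleven_labs" would raise
```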
Security
Audio data is streamed directly to the configured provider and is not stored locally. All connections use TLS.