WebSocket Transport
Vapi’s WebSocket transport enables real-time, bidirectional audio communication directly between your application and Vapi’s AI assistants. Unlike traditional phone or web calls, this transport method lets you stream raw audio data with minimal latency.
Key Benefits
- Low Latency: Direct streaming ensures minimal delays.
- Bidirectional Streaming: Real-time audio flow in both directions.
- Easy Integration: Compatible with any environment supporting WebSockets.
- Flexible Audio Formats: Customize audio parameters such as sample rate.
- Automatic Sample Rate Conversion: Seamlessly handles various audio rates.
Creating a WebSocket Call
To initiate a call using WebSocket transport:
PCM Format (16-bit, default)
Mu-Law Format
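As a minimal sketch of what the two variants above might look like, the following builds (but does not send) a call-creation request for each format. The endpoint URL, the `assistantId` field, the `vapi.websocket` provider value, the `format`/`container`/`sampleRate` field names, and the helper `build_call_request` are all assumptions made for illustration; only `audioFormat`, `pcm_s16le`, `mulaw`, and the `raw` container come from this page. Check the API reference before relying on any of them.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the Vapi API reference.
VAPI_CALL_URL = "https://api.vapi.ai/call"


def build_call_request(api_key: str, assistant_id: str,
                       audio_format: str = "pcm_s16le",
                       sample_rate: int = 16000) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a WebSocket call."""
    if audio_format not in ("pcm_s16le", "mulaw"):
        raise ValueError(f"unsupported audio format: {audio_format}")
    # Field names below are assumptions, not confirmed by this page.
    body = {
        "assistantId": assistant_id,
        "transport": {
            "provider": "vapi.websocket",
            "audioFormat": {
                "format": audio_format,
                "container": "raw",
                "sampleRate": sample_rate,
            },
        },
    }
    return urllib.request.Request(
        VAPI_CALL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# PCM variant (16-bit, default) and Mu-Law variant with placeholder credentials.
pcm_request = build_call_request("YOUR_API_KEY", "YOUR_ASSISTANT_ID")
mulaw_request = build_call_request("YOUR_API_KEY", "YOUR_ASSISTANT_ID",
                                   audio_format="mulaw", sample_rate=8000)
```

Passing either request to `urllib.request.urlopen` would send it; the sample rates shown (16000 and 8000) are illustrative choices, not documented defaults.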
Sample API Response
Audio Format Configuration
When creating a WebSocket call, the audio format can be customized:
Supported Audio Formats
Vapi supports the following audio formats:
- pcm_s16le: 16-bit PCM, signed little-endian (default)
- mulaw: Mu-Law encoded audio (ITU-T G.711 standard)
Both formats use the raw container format for direct audio streaming.
Format Selection Guidelines
- PCM (pcm_s16le): Higher quality audio, larger bandwidth usage. Ideal for high-quality applications.
- Mu-Law (mulaw): Lower bandwidth, telephony-standard encoding. Ideal for telephony integrations and bandwidth-constrained environments.
Vapi automatically converts sample rates as needed. You can stream audio at 8kHz, 44.1kHz, etc., and Vapi will handle conversions seamlessly. The system also handles format conversions internally when needed.
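To make the bandwidth trade-off between the two formats concrete, here is a small arithmetic sketch. It assumes mono audio and uses 2 bytes per sample for 16-bit PCM and 1 byte per sample for 8-bit Mu-Law; the helper name `bytes_per_second` and the example sample rates are illustrative.

```python
# Bytes occupied by one sample in each raw format.
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "mulaw": 1}


def bytes_per_second(audio_format: str, sample_rate: int, channels: int = 1) -> int:
    """Raw stream size per second, before any WebSocket framing overhead."""
    return sample_rate * BYTES_PER_SAMPLE[audio_format] * channels


pcm_16k = bytes_per_second("pcm_s16le", 16000)   # 32000 bytes/s (256 kbit/s)
mulaw_8k = bytes_per_second("mulaw", 8000)       # 8000 bytes/s (64 kbit/s)
```

Under these assumptions, Mu-Law at 8 kHz needs a quarter of the bandwidth of 16-bit PCM at 16 kHz.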
Connecting to the WebSocket
Use the WebSocket URL from the response to establish a connection:
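The sketch below shows one way to pull the WebSocket URL out of a call-creation response before connecting. The `transport.websocketCallUrl` field name and the sample response are assumptions for illustration (the URL is a placeholder, not a real call); the actual connection would then be opened with whichever WebSocket client library your environment provides.

```python
from urllib.parse import urlparse


def websocket_url_from_response(response: dict) -> str:
    """Return the WebSocket URL from a call-creation response, or raise."""
    # Assumed field name; confirm against the actual API response.
    url = response.get("transport", {}).get("websocketCallUrl")
    if not url:
        raise ValueError("response does not contain a WebSocket URL")
    if urlparse(url).scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {url}")
    return url


# Hypothetical response with placeholder values.
example_response = {
    "id": "EXAMPLE_CALL_ID",
    "transport": {
        "provider": "vapi.websocket",
        "websocketCallUrl": "wss://api.vapi.ai/EXAMPLE_CALL_ID/transport",
    },
}
ws_url = websocket_url_from_response(example_response)
```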
Sending and Receiving Data
The WebSocket supports two types of messages:
- Binary audio data (format depends on your configuration: PCM or Mu-Law)
- Text-based JSON control messages
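Because both message types arrive on the same socket, a receiver has to tell them apart. A minimal sketch, assuming your WebSocket client delivers binary frames as `bytes` and text frames as `str` (common in Python clients, but worth confirming for yours); `classify_message` is a hypothetical helper.

```python
import json


def classify_message(message):
    """Split incoming frames into audio bytes and parsed JSON control messages."""
    if isinstance(message, (bytes, bytearray)):
        # Binary frame: raw audio in the configured format.
        return ("audio", bytes(message))
    if isinstance(message, str):
        # Text frame: expected to be a JSON control message.
        try:
            return ("control", json.loads(message))
        except json.JSONDecodeError:
            return ("invalid", message)
    raise TypeError(f"unexpected message type: {type(message).__name__}")
```

Returning a tagged tuple keeps the decision in one place, so the rest of the application can route audio to playback and control messages to its own handlers.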
Audio Data Format
The binary audio data format depends on your audioFormat configuration:
- PCM (pcm_s16le): 16-bit signed little-endian samples
- Mu-Law (mulaw): 8-bit Mu-Law encoded samples (ITU-T G.711)
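The byte layouts of the two formats can be illustrated with a short sketch. The PCM helper packs integers as 16-bit signed little-endian values with the standard `struct` module; the Mu-Law helper is a plain-Python rendering of the widely used G.711 mu-law compression routine (bias 0x84, clip 32635), included for illustration rather than as Vapi's own implementation. All function names are hypothetical.

```python
import struct

MULAW_BIAS = 0x84
MULAW_CLIP = 32635


def pcm_s16le_bytes(samples):
    """Pack integer samples (-32768..32767) as 16-bit signed little-endian."""
    return struct.pack(f"<{len(samples)}h", *samples)


def mulaw_byte(sample: int) -> int:
    """Compress one 16-bit linear sample into one 8-bit mu-law value."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), MULAW_CLIP) + MULAW_BIAS
    # Find the segment (exponent) from the highest set bit of the magnitude.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def mulaw_bytes(samples):
    """Encode integer samples as 8-bit mu-law, one byte per sample."""
    return bytes(mulaw_byte(s) for s in samples)


samples = [0, 1000, -1000, 32767]
pcm = pcm_s16le_bytes(samples)   # 2 bytes per sample
ulaw = mulaw_bytes(samples)      # 1 byte per sample
```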
Sending Audio Data
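Audio is typically sent as a sequence of small binary frames rather than one large buffer. The sketch below splits a raw buffer into fixed-duration frames; the 20 ms frame length is an assumption chosen for illustration, not a documented requirement, and each yielded chunk would be passed to your WebSocket client's binary send call.

```python
def frame_size_bytes(sample_rate: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Number of bytes in one frame of mono audio lasting frame_ms milliseconds."""
    return sample_rate * frame_ms // 1000 * bytes_per_sample


def iter_frames(audio: bytes, frame_bytes: int):
    """Yield consecutive slices of the buffer; the last one may be shorter."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


# 50 ms of silent 16-bit PCM at 16 kHz: 800 samples, 1600 bytes.
buffer = bytes(1600)
size = frame_size_bytes(16000, 2)          # 640 bytes per 20 ms frame
frames = list(iter_frames(buffer, size))   # two full frames plus a 10 ms remainder
```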
Receiving Data
Sending Control Messages
Ending the Call
The recommended way to end a call is through Live Call Control, which provides more control and proper cleanup.
Alternatively, you can end the WebSocket call directly:
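Control messages are JSON objects sent as text frames, so ending a call directly amounts to serializing a small message and sending it before closing the socket. In the sketch below, the `type` field and the `"end-call"` value are assumptions made for illustration and should be checked against the API reference; `control_message` is a hypothetical helper.

```python
import json


def control_message(message_type: str, **fields) -> str:
    """Serialize a control message as a JSON string suitable for a text frame."""
    return json.dumps({"type": message_type, **fields})


# Assumed message type for ending the call; verify before use.
end_call = control_message("end-call")
```

The resulting string would be sent with your WebSocket client's text send call, followed by closing the connection.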
Comparison: WebSocket Transport vs. Call Listen Feature
Vapi provides two WebSocket options:
Refer to Live Call Control for more on the Call Listen feature.
When using WebSocket transport, phone-based parameters (phoneNumber or phoneNumberId) are not permitted. These methods are mutually exclusive.
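A client-side check can catch this conflict before a request is sent. A minimal sketch, assuming the payload is a dictionary with the phone fields at the top level and that WebSocket transport is selected through a `transport.provider` value of `vapi.websocket` (both assumptions for illustration); `validate_call_payload` is a hypothetical helper.

```python
PHONE_FIELDS = ("phoneNumber", "phoneNumberId")


def validate_call_payload(payload: dict) -> None:
    """Raise ValueError if WebSocket transport is combined with phone parameters."""
    # Assumed provider value identifying WebSocket transport.
    uses_websocket = payload.get("transport", {}).get("provider") == "vapi.websocket"
    phone_fields = [name for name in PHONE_FIELDS if name in payload]
    if uses_websocket and phone_fields:
        raise ValueError(
            "WebSocket transport cannot be combined with: " + ", ".join(phone_fields)
        )
```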