Vapi’s WebSocket transport enables real-time, bidirectional audio communication directly between your application and Vapi’s AI assistants. Unlike traditional phone or web calls, this transport method lets you stream raw audio data with minimal latency.
To initiate a call using WebSocket transport:
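A minimal sketch of such a request in Python, using only the standard library. The endpoint URL, the assistantId and transport.provider fields, the vapi.websocket provider value, and the sampleRate field are assumptions for illustration; only audioFormat, pcm_s16le, and the raw container are named in this text, so confirm the rest against the API reference.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the Vapi API reference.
CALL_ENDPOINT = "https://api.vapi.ai/call"


def build_websocket_call_request(api_key: str, assistant_id: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks for a WebSocket call."""
    payload = {
        "assistantId": assistant_id,  # assumed field name
        "transport": {
            "provider": "vapi.websocket",  # assumed provider value
            "audioFormat": {
                "format": "pcm_s16le",  # default format per the text
                "container": "raw",
                "sampleRate": 16000,  # assumed field name and rate
            },
        },
    }
    return urllib.request.Request(
        CALL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the function stops short of that so the payload can be inspected first.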
When creating a WebSocket call, the audio format can be customized:
Vapi supports the following audio formats:
- pcm_s16le: 16-bit PCM, signed little-endian (default)
- mulaw: Mu-Law encoded audio (ITU-T G.711 standard)

Both formats use the raw container format for direct audio streaming.
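As an illustration of customizing the format, the helper below builds an audio format configuration and rejects anything other than the two supported formats. The key names format, container, and sampleRate, and the 16000 Hz default, are assumptions; the format identifiers and the raw container come from the list above.

```python
SUPPORTED_FORMATS = ("pcm_s16le", "mulaw")


def make_audio_format(fmt: str = "pcm_s16le", sample_rate: int = 16000) -> dict:
    """Return an audioFormat-style dict (key names are assumed, not confirmed)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    # Both supported formats use the raw container.
    return {"format": fmt, "container": "raw", "sampleRate": sample_rate}
```

For a telephony integration this might be called as make_audio_format("mulaw", 8000).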
- PCM (pcm_s16le): Higher-quality audio, larger bandwidth usage. Ideal for high-quality applications.
- Mu-Law (mulaw): Lower bandwidth, telephony-standard encoding. Ideal for telephony integrations and bandwidth-constrained environments.

Vapi automatically converts sample rates as needed. You can stream audio at 8kHz, 44.1kHz, etc., and Vapi will handle conversions seamlessly. The system also handles format conversions internally when needed.
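To make the bandwidth trade-off between the two formats concrete, the sketch below computes the raw data rate from the sample size and sample rate. It assumes mono audio and ignores WebSocket framing overhead; the function names are illustrative.

```python
# Bytes per sample: 16-bit PCM uses 2 bytes, 8-bit Mu-Law uses 1.
BYTES_PER_SAMPLE = {"pcm_s16le": 2, "mulaw": 1}


def bytes_per_second(fmt: str, sample_rate: int, channels: int = 1) -> int:
    """Raw audio payload rate in bytes per second."""
    return BYTES_PER_SAMPLE[fmt] * sample_rate * channels


def kilobits_per_second(fmt: str, sample_rate: int, channels: int = 1) -> float:
    """Same rate expressed in kbit/s (1 kbit = 1000 bits)."""
    return bytes_per_second(fmt, sample_rate, channels) * 8 / 1000
```

Under these assumptions, PCM at 16 kHz works out to 256 kbit/s, while Mu-Law at 8 kHz works out to 64 kbit/s.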
Use the WebSocket URL from the response to establish a connection:
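One way to connect and stream audio from Python is sketched below. It assumes the third-party websockets package, 16 kHz pcm_s16le mono audio, and 20 ms frames; the frame size and pacing are illustrative choices rather than requirements stated here.

```python
import asyncio
from typing import Iterator


def chunk_audio(audio: bytes, frame_bytes: int = 640) -> Iterator[bytes]:
    """Split raw audio into frames (640 bytes = 20 ms of 16 kHz 16-bit mono)."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


async def stream_audio(websocket_url: str, audio: bytes) -> None:
    """Open the call's WebSocket and send the audio as binary frames."""
    # Imported lazily so chunk_audio works without the dependency installed.
    import websockets  # third-party: pip install websockets

    async with websockets.connect(websocket_url) as ws:
        for frame in chunk_audio(audio):
            await ws.send(frame)  # bytes are sent as a binary WebSocket message
            await asyncio.sleep(0.02)  # pace the stream at roughly real time
```

A call such as asyncio.run(stream_audio(url, pcm_bytes)) would run it, where url is the WebSocket URL from the call-creation response.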
The WebSocket supports two types of messages:
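A minimal sketch of handling both kinds of message on the receiving side, assuming the two types are binary messages carrying raw audio and text messages carrying JSON control data. It also assumes the call uses pcm_s16le; for mulaw each byte would be one sample instead.

```python
import json
import struct
from typing import Union


def handle_message(message: Union[bytes, bytearray, str]) -> tuple:
    """Classify one incoming WebSocket message as audio or control."""
    if isinstance(message, (bytes, bytearray)):
        # Binary message: decode as 16-bit signed little-endian samples.
        count = len(message) // 2  # drop a trailing odd byte, if any
        samples = struct.unpack(f"<{count}h", bytes(message[: count * 2]))
        return ("audio", samples)
    # Text message: assumed to be a JSON-encoded control message.
    return ("control", json.loads(message))
```

With the websockets package, messages received from the connection arrive as bytes or str, so each one can be passed straight to this function.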
The binary audio data format depends on your audioFormat configuration:
- PCM (pcm_s16le): 16-bit signed little-endian samples
- Mu-Law (mulaw): 8-bit Mu-Law encoded samples (ITU-T G.711)

The recommended way to end a call is using Live Call Control, which provides more control and proper cleanup.
Alternatively, you can end the WebSocket call directly:
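As a hedged sketch, ending the call from the client could look like the following. The {"type": "hangup"} control message is an assumption about the protocol, so verify the exact message shape before using it; ws stands for any open connection object with async send and close methods, such as one from the websockets package.

```python
import json


def build_hangup_message() -> str:
    """Return the assumed JSON control message that asks Vapi to end the call."""
    return json.dumps({"type": "hangup"})


async def end_call(ws) -> None:
    """Send the hangup message as a text message, then close the socket."""
    await ws.send(build_hangup_message())
    await ws.close()
```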
Vapi provides two WebSocket options:
Refer to Live Call Control for more on the Call Listen feature.
When using WebSocket transport, phone-based parameters (phoneNumber or phoneNumberId) are not permitted. These methods are mutually exclusive.
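A client-side check for this rule might look like the sketch below, which rejects a request body that combines WebSocket transport with either phone parameter. The transport.provider field and its vapi.websocket value are assumptions; phoneNumber and phoneNumberId are the parameter names given above.

```python
PHONE_FIELDS = ("phoneNumber", "phoneNumberId")


def validate_call_payload(payload: dict) -> None:
    """Raise ValueError if WebSocket transport is mixed with phone parameters."""
    transport = payload.get("transport") or {}
    uses_websocket = transport.get("provider") == "vapi.websocket"  # assumed value
    conflicts = [field for field in PHONE_FIELDS if field in payload]
    if uses_websocket and conflicts:
        raise ValueError(
            "WebSocket transport cannot be combined with: " + ", ".join(conflicts)
        )
```

Running this before sending the request surfaces the conflict locally instead of as an API error.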