Server events
All messages sent to your Server URL are POST requests with this body shape:
Common metadata included on most events:
- `phoneNumber`
- `timestamp`
- `artifact` (recording, transcript, messages, etc.)
- `assistant`
- `customer`
- `call`
- `chat`
Most events are informational and do not require a response. Responses are only expected for these types sent to your Server URL:
- `assistant-request`
- `tool-calls`
- `transfer-destination-request`
- `knowledge-base-request`
Note: Some specialized messages like `voice-request` and `call.endpointing.request` are sent to their dedicated servers if configured (e.g. `assistant.voice.server.url`, `assistant.startSpeakingPlan.smartEndpointingPlan.server.url`).
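As a concrete sketch, assuming the `{ message: { type, ... } }` envelope described above (field names beyond `type` and the common metadata are illustrative):

```typescript
// Sketch of a webhook dispatcher for server events. The envelope shape
// ({ message: { type, ... } }) follows the prose above; the optional field
// types are assumptions for illustration.
interface ServerEvent {
  message: {
    type: string;
    phoneNumber?: unknown;
    timestamp?: number;
    artifact?: unknown;
    assistant?: unknown;
    customer?: unknown;
    call?: unknown;
    chat?: unknown;
  };
}

// Only these event types expect a response body; everything else can be
// acknowledged with an empty 200.
const RESPONSE_EXPECTED = new Set([
  "assistant-request",
  "tool-calls",
  "transfer-destination-request",
  "knowledge-base-request",
]);

function expectsResponse(event: ServerEvent): boolean {
  return RESPONSE_EXPECTED.has(event.message.type);
}
```

Routing on `message.type` first keeps informational events cheap to acknowledge while reserving work for the four response-bearing types.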
Function Calling (Tools)
Vapi supports OpenAI-style tool/function calling. Assistants can call your server to perform actions.
Example assistant configuration (excerpt):
When tools are triggered, your Server URL receives a tool-calls message:
Respond with results for each tool call:
Optionally include a message to speak to the user while or after running the tool.
If a tool does not need a response immediately, you can design it to be asynchronous.
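A minimal sketch of assembling that response, assuming a `{ results: [{ toolCallId, result }] }` shape and a stubbed tool runner (the `ToolCall` interface is an illustrative reconstruction, not the full schema):

```typescript
// Build a tool-calls response: one result entry per incoming tool call,
// keyed back to the call by toolCallId. The actual tool execution is stubbed.
interface ToolCall {
  id: string;
  function: { name: string; arguments: Record<string, unknown> };
}

function buildToolResults(toolCalls: ToolCall[]): {
  results: { toolCallId: string; result: string }[];
} {
  return {
    results: toolCalls.map((tc) => ({
      toolCallId: tc.id,
      // Run the named function here; stubbed for the sketch.
      result: JSON.stringify({ ok: true, tool: tc.function.name }),
    })),
  };
}
```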
Retrieving Assistants
For inbound phone calls, you can specify the assistant dynamically. If a PhoneNumber doesn’t have an assistantId, Vapi may request one from your server:
You must respond to the assistant-request webhook within 7.5 seconds end-to-end. This limit is fixed and not configurable: the telephony provider enforces a 15-second cap, and Vapi reserves ~7.5 seconds for call setup. The timeout value shown elsewhere in the dashboard does not apply to this webhook.
To avoid timeouts:
- Return quickly with an existing `assistantId` or a minimal assistant, then enrich context asynchronously after the call starts using Live Call Control.
- Host your webhook close to `us-west-2` to reduce latency, and target < ~6s to allow for network jitter.
Respond with either an existing assistant ID, a transient assistant, or transfer destination:
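For example, three possible response bodies (a sketch; the optional fields and model configuration shown are illustrative, not exhaustive):

```typescript
// Option 1: point Vapi at a saved assistant by ID.
const existing = { assistantId: "your-saved-assistant-id" };

// Option 2: return a transient assistant defined inline.
const transient = {
  assistant: {
    firstMessage: "Hi, how can I help you today?",
    model: { provider: "openai", model: "gpt-4o" },
  },
};

// Option 3: skip AI entirely and forward the call.
const transfer = {
  destination: { type: "number", number: "+15551234567" },
};
```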
Transfer only (skip AI)
If you want to immediately transfer the call without using an assistant, return a destination in your assistant-request response. This bypasses AI handling.
When destination is present in the assistant-request response, the call forwards immediately and assistantId, assistant, squadId, and squad are ignored.
You must still respond within 7.5 seconds.
To transfer silently, set destination.message to an empty string.
For caller ID behavior, see Call features.
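Putting the notes above together, a silent transfer-only response might look like this (field names follow the prose; the phone-number destination format is illustrative):

```typescript
// Transfer-only assistant-request response: `destination` is present, so the
// call forwards immediately and any assistant fields would be ignored.
const transferResponse = {
  destination: {
    type: "number",
    number: "+15551234567",
    message: "", // empty string = silent transfer
  },
};
```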
Or return an error message to be spoken to the caller:
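For example (assuming a top-level `error` field carries the spoken message):

```typescript
// Error response sketch: the caller hears this message instead of reaching
// an assistant.
const errorResponse = {
  error: "Sorry, we're closed right now. Please call back during business hours.",
};
```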
Status Updates
- `scheduled`: Call scheduled.
- `queued`: Call queued.
- `ringing`: The call is ringing.
- `in-progress`: The call has started.
- `forwarding`: The call is about to be forwarded.
- `ended`: The call has ended.
End of Call Report
Hang Notifications
Sent when the assistant appears to hang (fails to respond in time). Use this to surface delays or notify your team.
Conversation Updates
Sent when an update is committed to the conversation history.
Transcript
Partial and final transcripts from the transcriber.
For final-only events, the message type is `transcript[transcriptType="final"]`.
Speech Update
Assistant Speech Started
Sent as the assistant begins speaking each segment of a turn, synchronized to audio playback. Designed for live captions, karaoke-style word highlighting, and any UI that needs to track what’s being spoken in real time.
This event is opt-in. Add "assistant.speechStarted" to your assistant’s serverMessages and/or clientMessages to receive it.
timing.type: "word-alignment" — ElevenLabs
Per-word timestamps from ElevenLabs’ alignment API. Events arrive at audio playback cadence (~50–200ms apart). The words[] array includes space entries with real timing — join them and track a running character cursor to highlight text up to that position. No client-side interpolation needed.
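A character-cursor sketch per the description above (the `words[]` entries are assumed to carry at least a `word` string):

```typescript
// Advance a running character cursor over word-alignment events. Because
// words[] includes space entries with real timing, joining the strings
// preserves exact character offsets into the spoken text.
interface AlignedWord {
  word: string;
  start?: number;
  end?: number;
}

function advanceCursor(cursor: number, words: AlignedWord[]): number {
  return cursor + words.map((w) => w.word).join("").length;
}
```

Highlight the transcript up to the returned offset on each event; no interpolation is needed between events.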
timing.type: "word-progress" — Minimax (with voice.subtitleType: "word")
Cursor-based per-segment progress.
Minimax only attaches subtitle data to the final audio chunk of each synthesis segment, so each assistant.speechStarted event for a Minimax turn fires near the end of that segment’s audio playback — not at the start, and not per-word. The wordsSpoken value jumps in segment-sized increments, and the words[] array carries timestamps for the segment that just finished. Use it to retroactively animate that segment, or to extrapolate forward — but it cannot drive smooth real-time highlighting during the current segment. For true playback-cadence per-word events, use ElevenLabs.
totalWords: 0 is a valid sentinel on the very first event of a turn before Minimax confirms its word count — guard against divide-by-zero when computing a progress fraction. See the Minimax voice provider page for full configuration details.
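A guarded progress fraction, per the sentinel note above:

```typescript
// Compute turn progress from Minimax cursor events, treating totalWords: 0
// as "count not yet confirmed" rather than dividing by zero.
function progressFraction(wordsSpoken: number, totalWords: number): number {
  if (totalWords <= 0) return 0; // sentinel on the first event of a turn
  return Math.min(wordsSpoken / totalWords, 1);
}
```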
No timing field — text-only fallback
All other providers (Cartesia, Deepgram, Azure, OpenAI, Inworld, etc.) emit text-only events with no timing object. One event per TTS chunk, gated to actual audio playback. Display text as a caption block, or interpolate a word cursor at a flat rate (~3.5 words/sec) between events for an approximate cursor.
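An interpolation sketch at the suggested flat rate (millisecond timestamps and the clamp parameter are assumptions):

```typescript
// Approximate a word cursor between text-only events by advancing at a flat
// rate (~3.5 words/sec), clamped so it never runs past the chunk's word count.
function interpolatedWordIndex(
  eventStartMs: number,
  nowMs: number,
  wordsPerSec: number = 3.5,
  maxWords: number = Infinity,
): number {
  const elapsedSec = Math.max(0, nowMs - eventStartMs) / 1000;
  return Math.min(Math.floor(elapsedSec * wordsPerSec), maxWords);
}
```

Reset the clock on each incoming event so the cursor re-syncs at every chunk boundary.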
Behaviors to be aware of
- `force-say` events always emit as text-only, even on ElevenLabs and Minimax — there's no provider-level alignment for forced utterances (`firstMessage`, queued `say` actions).
- On user barge-in, no further events fire for the interrupted turn. Pair with the `user-interrupted` message and use the most recent `wordsSpoken` (or joined char cursor) to know what was actually spoken.
- There is no companion `assistant.speechStopped` event. Use `speech-update` (`status: "stopped"`) or watch `turn` increment to detect end-of-turn.
- Custom voice timing depends on what your voice server returns. If you return timestamped JSON frames from your custom voice server, those flow through as `timing.words[]`; raw PCM responses produce text-only events.
Model Output
Tokens or tool-call outputs as the model generates. The optional turnId groups all tokens from the same LLM response, so you can correlate output with a specific turn.
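A sketch of correlating tokens by `turnId` (the `output` field name is an assumption):

```typescript
// Accumulate model-output tokens per turn so a later user-interrupted event
// (carrying the same turnId) can discard exactly that turn's text.
function groupByTurn(
  events: { turnId?: string; output: string }[],
): Map<string, string> {
  const turns = new Map<string, string>();
  for (const e of events) {
    const id = e.turnId ?? "unknown";
    turns.set(id, (turns.get(id) ?? "") + e.output);
  }
  return turns;
}
```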
Transfer Destination Request
Requested when the model wants to transfer but the destination is not yet known and must be provided by your server.
This event is emitted only if the assistant did not supply a destination when calling a transferCall tool (for example, it did not include a custom parameter like phoneNumber). If the assistant includes the destination directly, Vapi will transfer immediately and will not send this webhook.
Respond with a destination and optionally a message:
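For example (the destination shape mirrors the transfer section above; exact optional fields may differ):

```typescript
// Transfer-destination-request response sketch: where to send the call, plus
// an optional message spoken before forwarding.
const destinationResponse = {
  destination: { type: "number", number: "+15557654321" },
  message: "Transferring you to our support line now.",
};
```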
Transfer Update
Fires whenever a transfer occurs.
User Interrupted
Sent when the user interrupts the assistant. The optional turnId identifies the LLM turn that was interrupted, matching the turnId on model-output messages so you can discard that turn’s tokens.
Language Change Detected
Sent when the transcriber switches based on detected language.
Phone Call Control (Advanced)
When requested in assistant.serverMessages, hangup and forwarding are delegated to your server.
Knowledge Base Request (Custom)
If using assistant.knowledgeBase.provider = "custom-knowledge-base".
Respond with documents (and optionally a custom message to speak):
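A sketch of that response (the fields on each `documents` entry are assumptions):

```typescript
// Custom knowledge-base response sketch: ranked documents for the model to
// ground its answer on.
const kbResponse = {
  documents: [
    { content: "Store hours are 9am-5pm, Monday through Friday.", similarity: 0.92 },
    { content: "We are closed on public holidays.", similarity: 0.81 },
  ],
};
```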
Voice Input (Custom Voice Providers)
Voice Request (Custom Voice Server)
Sent to assistant.voice.server.url. Respond with raw 1-channel 16-bit PCM audio at the requested sample rate (not JSON).
Call Endpointing Request (Custom Endpointing Server)
Sent to assistant.startSpeakingPlan.smartEndpointingPlan.server.url.
Respond with the timeout before considering the user’s speech finished:
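For example (assuming the field is named `timeoutSeconds`):

```typescript
// Endpointing response sketch: wait this long after the user stops speaking
// before treating the utterance as finished.
const endpointingResponse = { timeoutSeconds: 0.5 };
```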
Chat Events
- `chat.created`: Sent when a new chat is created.
- `chat.deleted`: Sent when a chat is deleted.
Session Events
- `session.created`: Sent when a session is created.
- `session.updated`: Sent when a session is updated.
- `session.deleted`: Sent when a session is deleted.