Server events

Learn about different events that can be sent to a Server URL.

All messages sent to your Server URL are POST requests with this body shape:

{
  "message": {
    "type": "<server-message-type>",
    "call": { /* Call Object */ },
    /* other fields depending on type */
  }
}

Common metadata included on most events:

  • phoneNumber, timestamp
  • artifact (recording, transcript, messages, etc.)
  • assistant, customer, call, chat

Most events are informational and do not require a response. Responses are only expected for these types sent to your Server URL:

  • “assistant-request”
  • “tool-calls”
  • “transfer-destination-request”
  • “knowledge-base-request”

Note: Some specialized messages like “voice-request” and “call.endpointing.request” are sent to their dedicated servers if configured (e.g. assistant.voice.server.url, assistant.startSpeakingPlan.smartEndpointingPlan.server.url).
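
A minimal sketch of a Server URL handler in TypeScript (using Express): informational events get an immediate 2xx acknowledgment, while the four response-expecting types above are routed to a dispatcher. handleWithResponse is a hypothetical stub standing in for your real handlers.

import express from "express";

const app = express();
app.use(express.json());

// The four message types that expect a response body.
const RESPONSE_TYPES = new Set([
  "assistant-request",
  "tool-calls",
  "transfer-destination-request",
  "knowledge-base-request",
]);

// Hypothetical dispatcher, stubbed here; wire up your real handlers
// for each of the four types.
async function handleWithResponse(message: { type: string }): Promise<unknown> {
  return {}; // e.g. { assistantId: "..." } for assistant-request
}

app.post("/vapi/webhook", async (req, res) => {
  const { message } = req.body;
  if (!RESPONSE_TYPES.has(message.type)) {
    // Informational event: record it and acknowledge right away.
    console.log("event:", message.type);
    return res.status(200).end();
  }
  res.json(await handleWithResponse(message));
});

app.listen(3000);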

Function Calling (Tools)

Vapi supports OpenAI-style tool/function calling. Assistants can call your server mid-conversation to perform actions.

Example assistant configuration (excerpt):

{
  "model": {
    "provider": "openai",
    "model": "gpt-4o",
    "functions": [
      {
        "name": "sendEmail",
        "description": "Used to send an email to a client.",
        "parameters": {
          "type": "object",
          "properties": {
            "emailAddress": { "type": "string" },
            "message": { "type": "string" }
          },
          "required": ["emailAddress", "message"]
        }
      }
    ]
  }
}

When tools are triggered, your Server URL receives a tool-calls message:

{
  "message": {
    "type": "tool-calls",
    "call": { /* Call Object */ },
    "toolWithToolCallList": [
      {
        "name": "sendEmail",
        "toolCall": { "id": "abc123", "parameters": { "emailAddress": "john@example.com", "message": "Hi!" } }
      }
    ],
    "toolCallList": [
      { "id": "abc123", "name": "sendEmail", "parameters": { "emailAddress": "john@example.com", "message": "Hi!" } }
    ]
  }
}

Respond with results for each tool call:

{
  "results": [
    {
      "name": "sendEmail",
      "toolCallId": "abc123",
      "result": "{ \"status\": \"sent\" }"
    }
  ]
}

Optionally include a message to speak to the user while or after running the tool.

If a tool does not need a response immediately, you can design it to be asynchronous.
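
A sketch of a handler for the sendEmail tool above, building the results body from toolCallList. sendEmail() is a hypothetical helper standing in for your email service.

// Hypothetical email helper; replace with your provider's API call.
async function sendEmail(emailAddress: string, message: string): Promise<void> {
  console.log(`sending to ${emailAddress}: ${message}`);
}

interface ToolCall {
  id: string;
  name: string;
  parameters: Record<string, string>;
}

// Build the `results` response body for a tool-calls message.
async function handleToolCalls(msg: { toolCallList: ToolCall[] }) {
  const results = [];
  for (const call of msg.toolCallList) {
    if (call.name !== "sendEmail") continue;
    await sendEmail(call.parameters.emailAddress, call.parameters.message);
    results.push({
      name: "sendEmail",
      toolCallId: call.id,
      result: JSON.stringify({ status: "sent" }),
    });
  }
  return { results };
}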

Retrieving Assistants

For inbound phone calls, you can specify the assistant dynamically. If a PhoneNumber doesn’t have an assistantId, Vapi may request one from your server:

{
  "message": {
    "type": "assistant-request",
    "call": { /* Call Object */ }
  }
}

You must respond to the assistant-request webhook within 7.5 seconds end-to-end. This limit is fixed and not configurable: the telephony provider enforces a 15-second cap, and Vapi reserves ~7.5 seconds for call setup. The timeout value shown elsewhere in the dashboard does not apply to this webhook.

To avoid timeouts:

  • Return quickly with an existing assistantId or a minimal assistant, then enrich context asynchronously after the call starts using Live Call Control.
  • Host your webhook close to us-west-2 to reduce latency, and target < ~6s to allow for network jitter.

Respond with either an existing assistant ID, a transient assistant, or a transfer destination:

{ "assistantId": "your-saved-assistant-id" }

{
  "assistant": {
    "firstMessage": "Hey Ryan, how are you?",
    "model": {
      "provider": "openai",
      "model": "gpt-4o",
      "messages": [
        { "role": "system", "content": "You're Ryan's assistant..." }
      ]
    }
  }
}

{ "destination": { "type": "number", "number": "+11234567890" } }
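
A sketch of an assistant-request handler built to stay inside the 7.5-second budget, assuming the message carries the customer field listed under common metadata. lookupAssistantId() is a hypothetical fast keyed lookup (e.g. Redis keyed by caller number).

// Hypothetical fast lookup; must return in well under a second.
async function lookupAssistantId(callerNumber?: string): Promise<string | null> {
  return null; // replace with your store lookup
}

async function handleAssistantRequest(msg: { customer?: { number?: string } }) {
  const assistantId = await lookupAssistantId(msg.customer?.number);
  if (assistantId) return { assistantId };

  // Fallback: a minimal transient assistant. Enrich it after the call
  // starts (e.g. via Live Call Control) rather than blocking this response.
  return {
    assistant: {
      firstMessage: "Hi! How can I help you today?",
      model: {
        provider: "openai",
        model: "gpt-4o",
        messages: [{ role: "system", content: "You are a helpful receptionist." }],
      },
    },
  };
}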

Transfer only (skip AI)

If you want to immediately transfer the call without using an assistant, return a destination in your assistant-request response. This bypasses AI handling.

{
  "destination": {
    "type": "number",
    "number": "+14155552671",
    "callerId": "{{phoneNumber.number}}",
    "extension": "101",
    "message": "Connecting you to support."
  }
}

{
  "destination": {
    "type": "sip",
    "sipUri": "sip:support@example.com",
    "sipHeaders": { "X-Account": "gold" },
    "message": "Transferring you now."
  }
}

When destination is present in the assistant-request response, the call forwards immediately and assistantId, assistant, squadId, and squad are ignored. You must still respond within 7.5 seconds. To transfer silently, set destination.message to an empty string. For caller ID behavior, see Call features.

Or return an error message to be spoken to the caller:

{ "error": "Sorry, not enough credits on your account, please refill." }

Status Updates

{
  "message": {
    "type": "status-update",
    "call": { /* Call Object */ },
    "status": "ended"
  }
}

Status Events
  • scheduled: The call has been scheduled.
  • queued: The call is queued.
  • ringing: The call is ringing.
  • in-progress: The call has started.
  • forwarding: The call is about to be forwarded.
  • ended: The call has ended.

End of Call Report

{
  "message": {
    "type": "end-of-call-report",
    "endedReason": "hangup",
    "call": { /* Call Object */ },
    "artifact": {
      "recording": { /* Recording object with URLs */ },
      "transcript": "AI: How can I help? User: What's the weather? ...",
      "messages": [
        { "role": "assistant", "message": "How can I help?" },
        { "role": "user", "message": "What's the weather?" }
      ]
    }
  }
}
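
A sketch that pulls the durable fields out of an end-of-call report; saveCallRecord() is a hypothetical persistence helper.

interface EndOfCallReport {
  endedReason: string;
  call: { id: string };
  artifact?: {
    transcript?: string;
    messages?: { role: string; message: string }[];
  };
}

// Hypothetical database write.
async function saveCallRecord(record: object): Promise<void> {}

async function handleEndOfCallReport(msg: EndOfCallReport): Promise<void> {
  await saveCallRecord({
    callId: msg.call.id,
    endedReason: msg.endedReason,
    transcript: msg.artifact?.transcript ?? "",
    turnCount: msg.artifact?.messages?.length ?? 0,
  });
}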

Hang Notifications

{
  "message": {
    "type": "hang",
    "call": { /* Call Object */ }
  }
}

Sent when the assistant has not replied for a while (for example, while a slow tool call completes). Use this to surface delays or notify your team.

Conversation Updates

Sent when an update is committed to the conversation history.

{
  "message": {
    "type": "conversation-update",
    "messages": [ /* current conversation messages */ ],
    "messagesOpenAIFormatted": [ /* openai-formatted messages */ ]
  }
}

Transcript

Partial and final transcripts from the transcriber.

{
  "message": {
    "type": "transcript",
    "role": "user",
    "transcriptType": "partial",
    "transcript": "I'd like to book...",
    "isFiltered": false,
    "detectedThreats": [],
    "originalTranscript": "I'd like to book..."
  }
}

To receive only final transcripts, subscribe to the transcript[transcriptType="final"] message type.

Speech Update

{
  "message": {
    "type": "speech-update",
    "status": "started",
    "role": "assistant",
    "turn": 2
  }
}

Assistant Speech Started

Sent as the assistant begins speaking each segment of a turn, synchronized to audio playback. Designed for live captions, karaoke-style word highlighting, and any UI that needs to track what’s being spoken in real time.

This event is opt-in. Add "assistant.speechStarted" to your assistant’s serverMessages and/or clientMessages to receive it.

{
  "message": {
    "type": "assistant.speechStarted",
    "text": "Hello world, how can I help you today?",
    "turn": 2,
    "source": "model",
    "timing": {
      /* optional — shape depends on voice provider, see below */
    }
  }
}
Fields:

  • text: Full assistant text for the current turn. Not a delta — it accumulates across events in the same turn.
  • turn: 0-indexed turn number. Multiple events within the same turn share the same turn value.
  • source: "model" (LLM-generated), "force-say" (firstMessage / queued say actions), or "custom-voice".
  • timing: Optional. Present when the voice provider supports word-level timing. Shape depends on timing.type.

timing.type: "word-alignment" — ElevenLabs

{
  "type": "word-alignment",
  "words": ["Hello", " ", "world"],
  "wordsStartTimesMs": [0, 320, 360],
  "wordsEndTimesMs": [310, 350, 720]
}

Per-word timestamps from ElevenLabs’ alignment API. Events arrive at audio playback cadence (~50–200ms apart). The words[] array includes space entries with real timing — join them and track a running character cursor to highlight text up to that position. No client-side interpolation needed.
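
A sketch of that running character cursor, assuming each event's words[] covers only the newly played chunk and resetting when the turn number changes:

interface SpeechStartedAligned {
  text: string;
  turn: number;
  timing?: { type: string; words: string[] };
}

let currentTurn = -1;
let cursorChars = 0; // character offset into the turn's accumulated text

// Returns the spoken-so-far slice of the turn's text for highlighting.
function onSpeechStarted(msg: SpeechStartedAligned): string {
  if (msg.turn !== currentTurn) {
    currentTurn = msg.turn;
    cursorChars = 0; // new turn, restart the cursor
  }
  if (msg.timing?.type === "word-alignment") {
    // Space entries carry real timing, so a plain join preserves offsets.
    cursorChars += msg.timing.words.join("").length;
  }
  return msg.text.slice(0, cursorChars);
}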

timing.type: "word-progress" — Minimax (with voice.subtitleType: "word")

{
  "type": "word-progress",
  "wordsSpoken": 22,
  "totalWords": 45,
  "segment": "the latest spoken segment text",
  "segmentDurationMs": 3200,
  "words": [
    { "word": "the", "startMs": 0, "endMs": 110 },
    { "word": "latest", "startMs": 110, "endMs": 480 }
  ]
}

Cursor-based per-segment progress.

Minimax only attaches subtitle data to the final audio chunk of each synthesis segment, so each assistant.speechStarted event for a Minimax turn fires near the end of that segment’s audio playback — not at the start, and not per-word. The wordsSpoken value jumps in segment-sized increments, and the words[] array carries timestamps for the segment that just finished. Use it to retroactively animate that segment, or to extrapolate forward — but it cannot drive smooth real-time highlighting during the current segment. For true playback-cadence per-word events, use ElevenLabs.

totalWords: 0 is a valid sentinel on the very first event of a turn before Minimax confirms its word count — guard against divide-by-zero when computing a progress fraction. See the Minimax voice provider page for full configuration details.
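
A sketch of a guarded progress fraction over word-progress payloads, with the totalWords: 0 sentinel mapped to zero progress:

interface WordProgress {
  wordsSpoken: number;
  totalWords: number;
  words: { word: string; startMs: number; endMs: number }[];
}

// Fraction of the turn spoken so far, safe against the 0 sentinel.
function progressFraction(t: WordProgress): number {
  if (t.totalWords <= 0) return 0; // word count not confirmed yet
  return Math.min(t.wordsSpoken / t.totalWords, 1);
}

// The words[] timestamps describe the segment that just finished, so use
// them to retroactively animate that segment rather than a live cursor.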

No timing field — text-only fallback

All other providers (Cartesia, Deepgram, Azure, OpenAI, Inworld, etc.) emit text-only events with no timing object. One event per TTS chunk, gated to actual audio playback. Display text as a caption block, or interpolate a word cursor at a flat rate (~3.5 words/sec) between events for an approximate cursor.
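
A sketch of that flat-rate interpolation: between events, advance an approximate word cursor at ~3.5 words/sec from wherever the last event left it. The rate is an assumption to tune per voice.

const WORDS_PER_SEC = 3.5; // flat approximation; tune per voice

// baseWords: words already displayed when the latest event arrived;
// msSinceEvent: elapsed time since that event.
function interpolatedWordCount(
  text: string,
  baseWords: number,
  msSinceEvent: number,
): number {
  const totalWords = text.split(/\s+/).filter(Boolean).length;
  const advanced = baseWords + Math.floor((msSinceEvent / 1000) * WORDS_PER_SEC);
  return Math.min(advanced, totalWords); // never run past the known text
}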

Behaviors to be aware of

  • force-say events always emit as text-only, even on ElevenLabs and Minimax — there’s no provider-level alignment for forced utterances (firstMessage, queued say actions).
  • On user barge-in, no further events fire for the interrupted turn. Pair with the user-interrupted message and use the most recent wordsSpoken (or joined char cursor) to know what was actually spoken.
  • There is no companion assistant.speechStopped event. Use speech-update (status: "stopped") or watch turn increment to detect end-of-turn.
  • Custom voice timing depends on what your voice server returns. If you return timestamped JSON frames from your custom voice server, those flow through as timing.words[]; raw PCM responses produce text-only events.

Model Output

Tokens or tool-call outputs as the model generates. The optional turnId groups all tokens from the same LLM response, so you can correlate output with a specific turn.

{
  "message": {
    "type": "model-output",
    "output": { /* token or tool call */ },
    "turnId": "abc-123"
  }
}

Transfer Destination Request

Sent when the model wants to transfer the call but the destination is not known ahead of time and must be supplied by your server.

{
  "message": {
    "type": "transfer-destination-request",
    "call": { /* Call Object */ }
  }
}

This event is emitted only if the assistant did not supply a destination when calling a transferCall tool (for example, it did not include a custom parameter like phoneNumber). If the assistant includes the destination directly, Vapi will transfer immediately and will not send this webhook.

Respond with a destination and optionally a message:

{
  "destination": { "type": "number", "number": "+11234567890" },
  "message": { "type": "request-start", "message": "Transferring you now" }
}

Transfer Update

Sent whenever a transfer occurs.

{
  "message": {
    "type": "transfer-update",
    "destination": { /* assistant | number | sip */ }
  }
}

User Interrupted

Sent when the user interrupts the assistant. The optional turnId identifies the LLM turn that was interrupted, matching the turnId on model-output messages so you can discard that turn’s tokens.

{
  "message": {
    "type": "user-interrupted",
    "turnId": "abc-123"
  }
}
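
A sketch of the pairing described above: buffer model-output tokens per turnId and drop the buffer when user-interrupted arrives for that turn.

const turnBuffers = new Map<string, unknown[]>();

function onModelOutput(msg: { turnId?: string; output: unknown }): void {
  if (!msg.turnId) return;
  const buf = turnBuffers.get(msg.turnId);
  if (buf) buf.push(msg.output);
  else turnBuffers.set(msg.turnId, [msg.output]);
}

function onUserInterrupted(msg: { turnId?: string }): void {
  // The interrupted turn's buffered tokens were never fully spoken.
  if (msg.turnId) turnBuffers.delete(msg.turnId);
}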

Language Change Detected

Sent when the transcriber switches based on detected language.

{
  "message": {
    "type": "language-change-detected",
    "language": "es"
  }
}

Phone Call Control (Advanced)

When phone-call-control is requested in assistant.serverMessages, hang-up and forwarding decisions are delegated to your server.

{
  "message": {
    "type": "phone-call-control",
    "request": "forward",
    "destination": { "type": "sip", "sipUri": "sip:agent@example.com" }
  }
}

{
  "message": {
    "type": "phone-call-control",
    "request": "hang-up"
  }
}

Knowledge Base Request (Custom)

Sent when assistant.knowledgeBase.provider is set to "custom-knowledge-base".

{
  "message": {
    "type": "knowledge-base-request",
    "messages": [ /* conversation so far */ ],
    "messagesOpenAIFormatted": [ /* openai-formatted messages */ ]
  }
}

Respond with documents (and optionally a custom message to speak):

{
  "documents": [
    { "content": "Return policy is 30 days...", "similarity": 0.92, "uuid": "doc-1" }
  ]
}
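
A minimal sketch of a custom knowledge-base handler that scores an in-memory document list against the user's last message. The word-overlap score is a stand-in; real deployments would use embedding search.

interface Doc {
  uuid: string;
  content: string;
}

const DOCS: Doc[] = [{ uuid: "doc-1", content: "Return policy is 30 days..." }];

// Crude similarity: fraction of query words present in the document.
function overlapScore(query: string, content: string): number {
  const queryWords = query.toLowerCase().split(/\W+/).filter(Boolean);
  const docWords = new Set(content.toLowerCase().split(/\W+/));
  if (queryWords.length === 0) return 0;
  const hits = queryWords.filter((w) => docWords.has(w)).length;
  return hits / queryWords.length;
}

function handleKnowledgeBaseRequest(msg: {
  messagesOpenAIFormatted: { role: string; content: string }[];
}) {
  const lastUser = [...msg.messagesOpenAIFormatted].reverse().find((m) => m.role === "user");
  const query = lastUser?.content ?? "";
  const documents = DOCS.map((d) => ({
    content: d.content,
    similarity: overlapScore(query, d.content),
    uuid: d.uuid,
  }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 3);
  return { documents };
}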

Voice Input (Custom Voice Providers)

Sent with the text being dispatched to the voice provider for synthesis.

{
  "message": {
    "type": "voice-input",
    "input": "Hello, world!"
  }
}

Voice Request (Custom Voice Server)

Sent to assistant.voice.server.url. Respond with raw 1-channel 16-bit PCM audio at the requested sample rate (not JSON).

{
  "message": {
    "type": "voice-request",
    "text": "Hello, world!",
    "sampleRate": 24000
  }
}
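
A sketch of a custom voice server honoring that contract: it answers with raw 16-bit mono PCM at the requested sample rate. Here it returns half a second of silence; a real server would synthesize message.text.

import express from "express";

const app = express();
app.use(express.json());

app.post("/voice", (req, res) => {
  const { type, sampleRate } = req.body.message;
  if (type !== "voice-request") return res.status(200).end();

  // 16-bit mono: sampleRate samples/sec * 2 bytes/sample * 0.5 sec.
  // Buffer.alloc zero-fills, and all-zero samples are silence.
  const pcm = Buffer.alloc(Math.floor(sampleRate * 2 * 0.5));
  res.setHeader("Content-Type", "application/octet-stream");
  res.send(pcm);
});

app.listen(3001);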

Call Endpointing Request (Custom Endpointing Server)

Sent to assistant.startSpeakingPlan.smartEndpointingPlan.server.url.

{
  "message": {
    "type": "call.endpointing.request",
    "messagesOpenAIFormatted": [ /* openai-formatted messages */ ]
  }
}

Respond with the timeout before considering the user’s speech finished:

{ "timeoutSeconds": 0.5 }
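
A sketch of one possible endpointing rule: wait longer when the user's last utterance looks unfinished (trailing filler or conjunction). The regex and timeout values are illustrative.

function handleEndpointingRequest(msg: {
  messagesOpenAIFormatted: { role: string; content: string }[];
}): { timeoutSeconds: number } {
  const lastUser = [...msg.messagesOpenAIFormatted].reverse().find((m) => m.role === "user");
  const text = (lastUser?.content ?? "").trim().toLowerCase();
  // Trailing fillers/conjunctions suggest the user is mid-thought.
  const unfinished = /\b(and|but|so|um|uh)$|,$/.test(text);
  return { timeoutSeconds: unfinished ? 2.0 : 0.5 };
}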

Chat Events

  • chat.created: Sent when a new chat is created.
  • chat.deleted: Sent when a chat is deleted.

{ "message": { "type": "chat.created", "chat": { /* Chat */ } } }

Session Events

  • session.created: Sent when a session is created.
  • session.updated: Sent when a session is updated.
  • session.deleted: Sent when a session is deleted.

{ "message": { "type": "session.created", "session": { /* Session */ } } }