The handoff tool enables seamless call transfers between assistants in a multi-agent system. This guide covers all configuration patterns, destination types, context management, and advanced features.
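The handoff tool transfers calls between assistants during a conversation. You can hand off to a fixed assistant, let the model choose among several destinations, resolve the destination at runtime through a webhook, or hand off to an entire squad. You can also control which conversation context and extracted variables carry over to the next assistant.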
When using the handoff tool, add this to your system prompt for optimal agent coordination (adapted from the OpenAI Agents Handoff Prompt):
Each assistant destination also supports assistantOverrides to override settings on the destination assistant, and an inline assistant property to create a transient assistant without saving it first. See the API reference for all available properties.
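Here is a minimal sketch of two assistant destinations, written as TypeScript object literals that serialize to the JSON configuration:

- The first references a saved assistant and overrides one setting with assistantOverrides.
- The second defines a transient assistant inline.

The IDs, the description field, the firstMessage override and the model settings are illustrative assumptions. Confirm every property against the API reference.

```typescript
// Loose types for illustration; the real schema has many more fields.
interface AssistantDestination {
  type: "assistant";
  assistantId?: string;
  assistant?: Record<string, unknown>; // inline, transient assistant
  assistantOverrides?: Record<string, unknown>;
  description?: string; // assumed: tells the model when to pick this destination
}

interface HandoffTool {
  type: "handoff";
  destinations: AssistantDestination[];
}

// Saved assistant, with one setting overridden for this handoff only.
const toSavedAssistant: HandoffTool = {
  type: "handoff",
  destinations: [
    {
      type: "assistant",
      assistantId: "your-booking-assistant-id", // placeholder
      description: "Caller wants to make a booking",
      assistantOverrides: {
        firstMessage: "Hi, I can help you with your booking.", // assumed override
      },
    },
  ],
};

// Transient assistant defined inline instead of saved first.
const toInlineAssistant: HandoffTool = {
  type: "handoff",
  destinations: [
    {
      type: "assistant",
      description: "Caller has a billing question",
      assistant: {
        name: "Billing Assistant",
        model: {
          provider: "openai",
          model: "gpt-4o",
          messages: [{ role: "system", content: "You handle billing questions." }],
        },
      },
    },
  ],
};

console.log(JSON.stringify([toSavedAssistant, toInlineAssistant], null, 2));
```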
Best for OpenAI models — creates separate tool definitions for each destination:
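Here is a sketch of the multiple-tools pattern, using hypothetical assistant IDs and descriptions. Each destination gets its own handoff tool, so the model sees one tool definition per destination.

```typescript
// Hypothetical destinations: assistant ID -> when to transfer there.
const destinations: Record<string, string> = {
  "sales-assistant-id": "Caller wants to buy or upgrade a plan",
  "support-assistant-id": "Caller needs help with an existing order",
};

// Build one handoff tool per destination.
const tools = Object.entries(destinations).map(([assistantId, description]) => ({
  type: "handoff" as const,
  destinations: [{ type: "assistant" as const, assistantId, description }],
}));

console.log(JSON.stringify(tools, null, 2));
```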
Best for Anthropic models — single tool with multiple destination options:
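The same two hypothetical destinations can instead be expressed as options on a single tool. The IDs and descriptions are placeholders.

```typescript
// One tool; the model chooses among its destinations.
const handoffTool = {
  type: "handoff",
  destinations: [
    {
      type: "assistant",
      assistantId: "sales-assistant-id",
      description: "Caller wants to buy or upgrade a plan",
    },
    {
      type: "assistant",
      assistantId: "support-assistant-id",
      description: "Caller needs help with an existing order",
    },
  ],
};

const tools = [handoffTool]; // a single tool carries every destination
console.log(JSON.stringify(tools, null, 2));
```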
The destination is determined at runtime via the handoff-destination-request webhook:
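Here is a sketch of a dynamic destination. The server.url field name and the URL itself are assumptions. Point the URL at the endpoint that answers the handoff-destination-request webhook.

```typescript
const dynamicHandoffTool = {
  type: "handoff",
  destinations: [
    {
      type: "dynamic",
      // Placeholder endpoint that answers handoff-destination-request webhooks.
      server: { url: "https://example.com/handoff-destination" },
    },
  ],
};

console.log(JSON.stringify(dynamicHandoffTool, null, 2));
```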
Your server must respond with a single destination. You can return an assistantId, assistantName (if using squads), or a transient assistant. For example:
If the handoff should not execute, either respond with an empty destination, or provide a custom error. The custom error is added to the message history.
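The routing decision can be sketched as a pure function, independent of any web framework. The sketch makes two assumptions about the response shape:

- A chosen destination is wrapped in a top-level destination field.
- A declined handoff returns an empty destination plus an error string.

The department input, the IDs and the Support name are hypothetical. Wire the function into whatever HTTP server receives the webhook.

```typescript
// Assumed response shapes; verify field names against the API reference.
type Destination =
  | { type: "assistant"; assistantId: string }
  | { type: "assistant"; assistantName: string }; // name lookup applies within squads

type HandoffDestinationResponse =
  | { destination: Destination }
  | { destination: Record<string, never>; error?: string };

// Hypothetical routing rule keyed on a department hint from the request.
function respondToHandoffRequest(department?: string): HandoffDestinationResponse {
  switch (department) {
    case "billing":
      return { destination: { type: "assistant", assistantId: "billing-assistant-id" } };
    case "support":
      return { destination: { type: "assistant", assistantName: "Support" } };
    default:
      // Decline the handoff; the error text is added to the message history.
      return {
        destination: {},
        error: `No destination configured for department "${department ?? "unknown"}"`,
      };
  }
}

console.log(JSON.stringify(respondToHandoffRequest("billing")));
console.log(JSON.stringify(respondToHandoffRequest("sales")));
```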
Pass additional context to your webhook for intelligent routing:
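Here is a sketch that adds routing parameters by overriding the tool's function definition. The function name, the parameter names customerAreaCode and customerIntent, and the URL are hypothetical. The model fills these parameters in when it calls the tool, and they are sent to your server with the handoff-destination-request.

```typescript
const dynamicHandoffWithContext = {
  type: "handoff",
  function: {
    name: "route_to_department", // hypothetical name
    parameters: {
      type: "object",
      properties: {
        customerAreaCode: {
          type: "number",
          description: "Area code of the caller's phone number",
        },
        customerIntent: {
          type: "string",
          enum: ["new_customer", "existing_customer"],
          description: "Whether the caller is a new or an existing customer",
        },
      },
      required: ["customerAreaCode", "customerIntent"],
    },
  },
  destinations: [
    { type: "dynamic", server: { url: "https://example.com/handoff-destination" } },
  ],
};

console.log(JSON.stringify(dynamicHandoffWithContext, null, 2));
```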
In addition to assistant and dynamic destinations, you can hand off a call to an entire squad. This transfers the caller into a new multi-agent system where the squad’s own routing logic takes over.
Reference a saved squad by its ID:
Define the squad inline without saving it first:
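Here are two sketches of squad destinations. They assume a destination type of squad with either a squadId or an inline squad whose members list the assistants. All IDs are placeholders, and the field names are assumptions to verify against the API reference.

```typescript
// Saved squad, referenced by ID.
const toSavedSquad = {
  type: "handoff",
  destinations: [
    {
      type: "squad",
      squadId: "your-squad-id",
      description: "Caller needs the onboarding team",
    },
  ],
};

// Transient squad, defined inline without saving it first.
const toInlineSquad = {
  type: "handoff",
  destinations: [
    {
      type: "squad",
      squad: {
        members: [
          { assistantId: "intake-assistant-id" },
          { assistantId: "specialist-assistant-id" },
        ],
      },
    },
  ],
};

console.log(JSON.stringify([toSavedSquad, toInlineSquad], null, 2));
```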
For the full schema, see the API reference.
Control what conversation history transfers to the next assistant or squad. Set contextEngineeringPlan on any destination.
Transfers the entire conversation history:
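Here is a sketch of where the plan sits on a destination, written as a TypeScript object literal that mirrors the JSON configuration. The type string all is an assumed name for this mode, and the assistant ID is a placeholder.

```typescript
const destinationWithFullHistory = {
  type: "assistant",
  assistantId: "next-assistant-id",
  // Assumed type name for "transfer the entire conversation history".
  contextEngineeringPlan: { type: "all" },
};

console.log(JSON.stringify(destinationWithFullHistory));
```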
Transfers only the most recent N messages. Use this to limit context size for performance:
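Here is a sketch of the plan, assuming the count is set with a maxMessages field, followed by a local model of the trimming behavior. The helper keepLastN is hypothetical and only illustrates the documented effect. It is not Vapi's implementation.

```typescript
interface Message {
  role: string;
  content: string;
}

// Assumed field name for N; verify against the API reference.
const lastFivePlan = { type: "lastNMessages", maxMessages: 5 };

// Local illustration: keep only the most recent `maxMessages` entries.
function keepLastN(history: Message[], maxMessages: number): Message[] {
  if (maxMessages <= 0) return []; // slice(-0) would return everything
  return history.slice(-maxMessages);
}

const history: Message[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello! How can I help?" },
  { role: "user", content: "I need to change my booking." },
  { role: "assistant", content: "Sure, let me transfer you." },
];

console.log(keepLastN(history, lastFivePlan.maxMessages).length); // 4
console.log(keepLastN(history, 2).map((m) => m.content));
```

When the history is shorter than N, everything is forwarded. The limit only caps context size.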
Transfers only user and assistant messages, filtering out system messages, tool calls, and tool results. This gives the next assistant a clean view of the conversation without internal implementation details:
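Here is a local illustration of this filter. It uses a simplified, hypothetical message model in which tool calls and tool results are separate entries; Vapi's actual message format may differ.

```typescript
// Simplified, hypothetical message model for illustration.
type Kind = "system" | "user" | "assistant" | "toolCall" | "toolResult";

interface HistoryEntry {
  kind: Kind;
  text: string;
}

// Keep only what the caller and the assistant actually said.
function userAndAssistantOnly(history: HistoryEntry[]): HistoryEntry[] {
  return history.filter((e) => e.kind === "user" || e.kind === "assistant");
}

const history: HistoryEntry[] = [
  { kind: "system", text: "You are the intake assistant." },
  { kind: "user", text: "My order never arrived." },
  { kind: "toolCall", text: "lookupOrder(orderId=123)" },
  { kind: "toolResult", text: '{"status":"lost"}' },
  { kind: "assistant", text: "I'm sorry, let me connect you with support." },
];

console.log(userAndAssistantOnly(history).map((e) => e.kind));
```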
Use userAndAssistantMessages when the destination assistant does not need to see tool call history or system prompts from the previous assistant. This produces a cleaner context and reduces token usage.
Transfers only the conversation history from before the current assistant’s session. This excludes the current assistant’s own messages, tool calls, and tool results entirely, forwarding only the context that existed when the current assistant first received the call.
This mode is particularly useful when the current assistant handles sensitive data (such as payment card numbers in a PCI-compliant flow). By excluding the current assistant’s session from the forwarded context, you prevent sensitive tool call results from reaching the next assistant.
Use previousAssistantMessages when handing off from a sensitive assistant (e.g., one collecting payment data) to a non-sensitive assistant. It preserves useful conversation context from earlier in the call while ensuring the sensitive assistant’s tool call data is not forwarded. See the PCI Compliance - Handoff Context Configuration guide for a complete walkthrough.
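Here is a local illustration of which messages this mode forwards. It uses a hypothetical model where each entry records the assistant that was active when it was added. The function keeps everything before the current assistant's trailing run of entries. In this sketch, the payments assistant's messages, tool calls and tool results are never forwarded. It models the documented behavior only and is not Vapi's implementation.

```typescript
// Hypothetical entry model: `activeAssistant` is the assistant that owned
// the conversation when the entry was added.
interface Entry {
  activeAssistant: string;
  kind: "user" | "assistant" | "toolCall" | "toolResult";
  text: string;
}

// Forward only the history that existed before the current assistant's
// session began; the current session is dropped entirely.
function previousAssistantContext(history: Entry[]): Entry[] {
  if (history.length === 0) return [];
  const current = history[history.length - 1].activeAssistant;
  let sessionStart = history.length;
  while (sessionStart > 0 && history[sessionStart - 1].activeAssistant === current) {
    sessionStart--;
  }
  return history.slice(0, sessionStart);
}

const history: Entry[] = [
  { activeAssistant: "intake", kind: "user", text: "I want to renew my plan." },
  { activeAssistant: "intake", kind: "assistant", text: "Transferring you to payments." },
  { activeAssistant: "payments", kind: "user", text: "Here is my card number." },
  { activeAssistant: "payments", kind: "toolCall", text: "chargeCard(...)" },
  { activeAssistant: "payments", kind: "toolResult", text: '{"approved":true}' },
  { activeAssistant: "payments", kind: "assistant", text: "Done. Handing you back." },
];

console.log(previousAssistantContext(history).map((e) => e.text));
```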
Starts the next assistant with a blank conversation:
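Here is a sketch of a destination that starts the next assistant with no history. The type string none is an assumed name for this mode, and the ID is a placeholder.

```typescript
const destinationWithBlankContext = {
  type: "assistant",
  assistantId: "next-assistant-id",
  // Assumed type name for "start with a blank conversation".
  contextEngineeringPlan: { type: "none" },
};

console.log(JSON.stringify(destinationWithBlankContext));
```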
Extract and pass structured data during handoff. Variables extracted by the handoff tool are available to all subsequent assistants in the conversation chain. When a handoff extracts a variable with the same name as an existing one, the new value replaces the previous value.
variableExtractionPlan in destinations
This extraction method makes an OpenAI structured output request to extract variables. Use it when you have multiple destinations, each with different variables that need to be extracted.
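Here is a sketch of a destination-level extraction plan, assuming the JSON Schema goes in a schema field of variableExtractionPlan. The name and age properties match the schema example in this section. The assistant ID and descriptions are placeholders.

```typescript
const bookingHandoff = {
  type: "handoff",
  destinations: [
    {
      type: "assistant",
      assistantId: "booking-assistant-id", // placeholder
      description: "Caller wants to book an appointment",
      variableExtractionPlan: {
        // JSON Schema describing what to extract before handing off.
        schema: {
          type: "object",
          properties: {
            name: { type: "string", description: "Caller's full name" },
            age: { type: "number", description: "Caller's age in years" },
          },
          required: ["name"],
        },
      },
    },
  ],
};

console.log(JSON.stringify(bookingHandoff, null, 2));
```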
Once extracted, variables are accessible using Liquid template syntax ({{variableName}}). The access pattern depends on the schema structure:
Top-level object properties are extracted as direct global variables. For example, a schema with properties name and age produces {{name}} and {{age}} — not {{root.name}}.
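Here is a local illustration of two rules from this section:

- Top-level properties become variables directly.
- A later extraction with the same name replaces the earlier value.

The helper renderPlaceholders is a deliberately tiny, hypothetical stand-in that handles only plain {{variable}} references. It is not a Liquid engine and ignores filters and tags.

```typescript
type VariableValues = Record<string, unknown>;

// Top-level properties of the extracted object become variables directly,
// and a later extraction with the same name replaces the earlier value.
function mergeExtracted(existing: VariableValues, extracted: VariableValues): VariableValues {
  return { ...existing, ...extracted };
}

// Tiny stand-in for plain {{variable}} references only. NOT a Liquid engine:
// no filters, tags, or dotted paths. Unknown names render as empty strings,
// matching Liquid's default for undefined variables.
function renderPlaceholders(template: string, vars: VariableValues): string {
  return template.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g,
    (_match: string, key: string) =>
      Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : "",
  );
}

// First handoff extracts { name, age }; a later handoff re-extracts name.
const afterFirst = mergeExtracted({}, { name: "Ada", age: 36 });
const afterSecond = mergeExtracted(afterFirst, { name: "Ada Lovelace" });

console.log(renderPlaceholders("Hello {{name}}, age {{age}}", afterSecond)); // Hello Ada Lovelace, age 36
```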
Use aliases to create additional variables derived from extracted values. Aliases support Liquid template syntax for transformations and compositions.
Each alias creates a new variable accessible as {{key}} during the call and stored in call.artifact.variableValues after the call. Alias keys must start with a letter and contain only letters, numbers, or underscores (max 40 characters).
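Here is a sketch of aliases next to a schema, plus a small check for the key rules stated above. The key and value field names are assumptions, and so is reading "letters" as ASCII letters. The Liquid templates use the standard capitalize filter. invalidAliasKeys is a hypothetical helper.

```typescript
interface Alias {
  key: string; // new variable name, available as {{key}}
  value: string; // Liquid template built from extracted variables
}

// Assumed shape: aliases sit next to the schema in variableExtractionPlan.
const variableExtractionPlan = {
  schema: {
    type: "object",
    properties: {
      firstName: { type: "string" },
      lastName: { type: "string" },
    },
  },
  aliases: [
    { key: "fullName", value: "{{firstName}} {{lastName}}" },
    { key: "greetingName", value: "{{firstName | capitalize}}" },
  ],
};

// Key rules: starts with a letter; letters, digits, or underscores only;
// at most 40 characters (1 leading letter + up to 39 more).
const ALIAS_KEY_PATTERN = /^[A-Za-z][A-Za-z0-9_]{0,39}$/;

function invalidAliasKeys(aliases: Alias[]): string[] {
  return aliases.filter((a) => !ALIAS_KEY_PATTERN.test(a.key)).map((a) => a.key);
}

console.log(invalidAliasKeys(variableExtractionPlan.aliases).length); // 0
```

Running a check like this before deploying catches keys that would otherwise be rejected.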
tool.function
You can also extract variables through the LLM tool call parameters (in addition to sending these parameters to your server in a handoff-destination-request for dynamic handoffs). Include the destination parameter with the assistant names or IDs in its enum; Vapi uses this to determine where to hand off the call. The destination parameter itself is not extracted as a variable. Add destination and all other required variables to the schema’s required array.
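Here is a sketch of extraction through tool call parameters. The function name, the customerName and issueSummary parameters, and the IDs are hypothetical. The destination enum lists the assistant IDs of the configured destinations, and every parameter is marked required. A small check confirms that each enum value names a configured destination.

```typescript
const handoffWithFunctionExtraction = {
  type: "handoff",
  function: {
    name: "handoff_to_specialist", // hypothetical
    parameters: {
      type: "object",
      properties: {
        destination: {
          type: "string",
          enum: ["billing-assistant-id", "support-assistant-id"], // placeholder IDs
          description: "The assistant to hand the call off to",
        },
        customerName: { type: "string", description: "The caller's name" },
        issueSummary: { type: "string", description: "One-sentence summary of the issue" },
      },
      required: ["destination", "customerName", "issueSummary"],
    },
  },
  destinations: [
    { type: "assistant", assistantId: "billing-assistant-id", description: "Billing questions" },
    { type: "assistant", assistantId: "support-assistant-id", description: "Technical problems" },
  ],
};

// Sanity check: every enum value should name a configured destination.
const configuredIds = new Set(handoffWithFunctionExtraction.destinations.map((d) => d.assistantId));
const enumValues = handoffWithFunctionExtraction.function.parameters.properties.destination.enum;
console.log(enumValues.every((id) => configuredIds.has(id))); // true
```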
Configure what the assistant says during each phase of the handoff. Add a messages array to the handoff tool to control the spoken responses.
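Here is a sketch of a messages array that covers each phase listed in this section. Only the type and content fields are used. The wording and the assistant ID are hypothetical.

```typescript
const handoffWithMessages = {
  type: "handoff",
  destinations: [{ type: "assistant", assistantId: "billing-assistant-id" }],
  messages: [
    { type: "request-start", content: "One moment while I connect you with our billing team." },
    { type: "request-complete", content: "You're all set." },
    { type: "request-failed", content: "Sorry, I couldn't complete the transfer. Let's keep going here." },
    { type: "request-response-delayed", content: "This is taking a little longer than usual." },
  ],
};

console.log(JSON.stringify(handoffWithMessages, null, 2));
```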
request-start
false): When true, the tool call waits until the message finishes speaking before executing.
content when provided.
request-complete
"assistant" | "system", default: "assistant"): When "assistant", the content is spoken aloud. When "system", the content is passed as a system message hint to the model.
false): When true, the call ends after this message is spoken.
request-failed
false): When true, the call ends after this message.
request-response-delayed
For the full schema, see the API reference.
Use rejectionPlan to prevent a handoff from executing based on conversation state. When all conditions in the plan match, the tool call is rejected and the rejection message is added to the conversation.
Match against message content using regular expressions:
This rejects the handoff if the user’s most recent message contains “cancel”, “stop”, or “nevermind” (case-insensitive).
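Here is a sketch of this rejection plan attached to a handoff tool. It is followed by a local check of the same rule with a JavaScript RegExp. The regex and target field names, and support for the inline (?i) flag, are assumptions to confirm in the API reference. latestUserMessageMatches is a hypothetical helper.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const handoffTool = {
  type: "handoff",
  destinations: [{ type: "assistant", assistantId: "billing-assistant-id" }],
  rejectionPlan: {
    conditions: [
      {
        type: "regex",
        // The TS string "\\b" holds \b; JSON.stringify escapes it back to "\\b".
        regex: "(?i)\\b(cancel|stop|nevermind)\\b",
        target: { role: "user", position: -1 }, // most recent user message
      },
    ],
  },
};

// Local equivalent: JavaScript expresses case-insensitivity with the `i` flag.
const CANCEL_PATTERN = /\b(cancel|stop|nevermind)\b/i;

function latestUserMessageMatches(messages: ChatMessage[], pattern: RegExp): boolean {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return pattern.test(messages[i].content);
  }
  return false; // no user message yet, so nothing to match
}

console.log(latestUserMessageMatches([{ role: "user", content: "Actually, NEVERMIND." }], CANCEL_PATTERN)); // true
console.log(JSON.stringify(handoffTool.rejectionPlan));
```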
Use Liquid templates for more complex logic. The template must return exactly "true" or "false":
This rejects the handoff if fewer than 3 user messages exist in the conversation. Available Liquid variables include messages (array of recent messages), now (current timestamp), and any assistant variable values.
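Here is a sketch of the Liquid condition, assuming the template goes in a field named liquid. It uses the standard Liquid where filter and size property. The template is kept on one line so that it outputs exactly "true" or "false" with no surrounding whitespace. hasFewerThanNUserMessages is a hypothetical local equivalent for reasoning about expected results.

```typescript
// Config sketch; the `liquid` field name is an assumption.
const tooEarlyCondition = {
  type: "liquid",
  liquid:
    '{% assign userMessages = messages | where: "role", "user" %}' +
    "{% if userMessages.size < 3 %}true{% else %}false{% endif %}",
};

// Local equivalent of the template's rule.
function hasFewerThanNUserMessages(messages: { role: string }[], n = 3): boolean {
  return messages.filter((m) => m.role === "user").length < n;
}

console.log(hasFewerThanNUserMessages([{ role: "user" }, { role: "assistant" }])); // true
console.log(tooEarlyCondition.liquid);
```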
Combine multiple conditions with AND or OR logic:
By default, all top-level conditions in the conditions array use AND logic — all must match for the rejection to trigger. Use a group condition with operator: "OR" to reject when any single condition matches.
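Here is a sketch of an OR group, followed by a simplified local model of how top-level AND and group operators combine. The config field names are assumptions. The evaluator is hypothetical and is not Vapi's code. Treating an empty condition list as "never reject" is also an assumption of this sketch.

```typescript
// Config sketch: reject if the caller said "cancel" OR fewer than three
// user messages exist.
const rejectionPlan = {
  conditions: [
    {
      type: "group",
      operator: "OR",
      conditions: [
        { type: "regex", regex: "(?i)\\bcancel\\b", target: { role: "user", position: -1 } },
        {
          type: "liquid",
          liquid:
            '{% assign u = messages | where: "role", "user" %}{% if u.size < 3 %}true{% else %}false{% endif %}',
        },
      ],
    },
  ],
};

// Simplified local model of the combination rules.
interface Msg {
  role: "user" | "assistant";
  content: string;
}

type Condition =
  | { kind: "check"; test: (messages: Msg[]) => boolean } // stands in for regex/liquid
  | { kind: "group"; operator: "AND" | "OR"; conditions: Condition[] };

function matches(condition: Condition, messages: Msg[]): boolean {
  if (condition.kind === "check") return condition.test(messages);
  return condition.operator === "AND"
    ? condition.conditions.every((c) => matches(c, messages))
    : condition.conditions.some((c) => matches(c, messages));
}

// Top-level conditions combine with AND.
function shouldReject(conditions: Condition[], messages: Msg[]): boolean {
  return conditions.length > 0 && conditions.every((c) => matches(c, messages));
}

const saidCancel: Condition = {
  kind: "check",
  test: (ms) => /\bcancel\b/i.test([...ms].reverse().find((m) => m.role === "user")?.content ?? ""),
};
const tooEarly: Condition = {
  kind: "check",
  test: (ms) => ms.filter((m) => m.role === "user").length < 3,
};

const conversation: Msg[] = [{ role: "user", content: "Transfer me please" }];
console.log(shouldReject([{ kind: "group", operator: "OR", conditions: [saidCancel, tooEarly] }], conversation)); // true
console.log(shouldReject([saidCancel, tooEarly], conversation)); // false
console.log(JSON.stringify(rejectionPlan));
```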
For the full schema, see the API reference.
Override the default function definition for more control. You can overwrite the function name for each tool to reference in the system prompt, or pass custom parameters in a dynamic handoff request.
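Here is a sketch of overriding only the function name so the system prompt can refer to it directly. The name transfer_to_billing, the prompt wording and the ID are hypothetical.

```typescript
const handoffWithCustomName = {
  type: "handoff",
  function: {
    name: "transfer_to_billing", // hypothetical; reference this exact name in the prompt
  },
  destinations: [
    {
      type: "assistant",
      assistantId: "billing-assistant-id",
      description: "Caller has a question about invoices or payments",
    },
  ],
};

// Build the prompt from the configured name so the two cannot drift apart.
const systemPrompt =
  "You are the front-desk assistant. If the caller asks about invoices or payments, " +
  `call ${handoffWithCustomName.function.name} to hand off the call.`;

console.log(systemPrompt);
```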
Use tool.function.name to customize the name of the function to reference in your prompt.
Use lastNMessages or userAndAssistantMessages to limit context size for performance.
Use request-start messages to set caller expectations during transfers.
Enable artifactPlan.fullMessageHistoryEnabled to capture the complete message history across all handoffs in your artifacts. See squad artifact behavior for details.
\\ in JSON)