Handoff tool
The handoff tool enables seamless call transfers between assistants in a multi-agent system. This guide covers all configuration patterns, destination types, context management, and advanced features.
Table of contents
- Overview
- System prompt best practices
- Basic configuration
- Multiple destinations
- Dynamic handoffs
- Squad destinations
- Context engineering
- Variable extraction
- Tool messages
- Rejection plan
- Custom function definitions
Overview
The handoff tool transfers calls between assistants during a conversation. You can:
- Transfer to a specific assistant by ID or by name (within a squad)
- Transfer to an entire squad with a designated entry assistant
- Support multiple destination options for the AI to choose from
- Determine the destination dynamically at runtime via a webhook
- Control what conversation history the next assistant receives
- Extract structured variables from the conversation for downstream use
- Configure spoken messages for each phase of the handoff
- Reject handoff attempts based on conversation state
System prompt best practices
When using the handoff tool, add this to your system prompt for optimal agent coordination (adapted from the OpenAI Agents Handoff Prompt):
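The original prompt text is not reproduced here; a hedged adaptation in the spirit of the OpenAI Agents handoff prompt (wording is illustrative, adjust it to your own agents):

```
# System context
You are part of a multi-agent system. Call transfers between assistants are
achieved by calling a handoff tool. Handoffs are handled seamlessly in the
background; do not mention or draw attention to these transfers in your
conversation with the user.
```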
Basic configuration
Single destination handoff
Using assistant ID
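A minimal sketch of a handoff tool with a single assistant destination referenced by ID (field names follow this guide's terminology; verify exact shapes in the API reference):

```json
{
  "type": "handoff",
  "destinations": [
    {
      "type": "assistant",
      "assistantId": "YOUR_ASSISTANT_ID",
      "description": "Transfer to the billing assistant for payment and invoice questions"
    }
  ]
}
```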
Using assistant name (for squad members)
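When the call runs inside a squad, a destination can instead reference a squad member by name. A sketch, assuming the same destination shape:

```json
{
  "type": "handoff",
  "destinations": [
    {
      "type": "assistant",
      "assistantName": "BillingAssistant",
      "description": "Transfer to the billing assistant for payment and invoice questions"
    }
  ]
}
```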
Each assistant destination also supports assistantOverrides to override settings on the destination assistant, and an inline assistant property to create a transient assistant without saving it first. See the API reference for all available properties.
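For instance, a destination might override the destination assistant's opening line. A sketch (`firstMessage` is a standard assistant property used here for illustration):

```json
{
  "type": "assistant",
  "assistantId": "YOUR_ASSISTANT_ID",
  "assistantOverrides": {
    "firstMessage": "Hi, I'm the billing specialist. I already have your account details."
  }
}
```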
Multiple destinations
Multiple tools pattern (OpenAI recommended)
Best for OpenAI models — creates separate tool definitions for each destination:
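A sketch of the multiple-tools pattern: one handoff tool per destination, each with its own function name via the `tool.function.name` override this guide describes under custom function definitions:

```json
{
  "tools": [
    {
      "type": "handoff",
      "destinations": [{ "type": "assistant", "assistantName": "Billing" }],
      "function": { "name": "transfer_to_billing" }
    },
    {
      "type": "handoff",
      "destinations": [{ "type": "assistant", "assistantName": "Support" }],
      "function": { "name": "transfer_to_support" }
    }
  ]
}
```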
Single tool pattern (Anthropic recommended)
Best for Anthropic models — single tool with multiple destination options:
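A sketch of the single-tool pattern, with one handoff tool listing every destination (field names follow this guide's terminology):

```json
{
  "type": "handoff",
  "destinations": [
    {
      "type": "assistant",
      "assistantName": "Billing",
      "description": "Payment and invoice questions"
    },
    {
      "type": "assistant",
      "assistantName": "Support",
      "description": "Technical issues and troubleshooting"
    }
  ]
}
```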
Dynamic handoffs
Basic dynamic handoff
The destination is determined at runtime via the handoff-destination-request webhook:
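A sketch, assuming a `dynamic` destination type that points at your webhook server (confirm the exact type and server field names against the API reference):

```json
{
  "type": "handoff",
  "destinations": [
    {
      "type": "dynamic",
      "server": { "url": "https://example.com/handoff-destination" }
    }
  ]
}
```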
Your server must respond with a single destination. You can return an assistantId, assistantName (if using squads), or a transient assistant. For example:
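A sketch of a success response (the `destination` wrapper is an assumption based on the webhook's purpose; verify the exact response schema):

```json
{
  "destination": {
    "type": "assistant",
    "assistantId": "YOUR_ASSISTANT_ID"
  }
}
```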
If the handoff should not execute, either respond with an empty destination, or provide a custom error. The custom error is added to the message history.
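A sketch of a rejection response with a custom error (the `error` field name is an assumption):

```json
{
  "error": "No specialist is available for this request right now."
}
```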
Dynamic handoff with custom parameters
Pass additional context to your webhook for intelligent routing:
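A sketch: custom `function.parameters` collected by the LLM are sent to your server with the handoff-destination-request, so the webhook can route on them (parameter names here are illustrative):

```json
{
  "type": "handoff",
  "destinations": [
    { "type": "dynamic", "server": { "url": "https://example.com/handoff-destination" } }
  ],
  "function": {
    "name": "route_call",
    "parameters": {
      "type": "object",
      "properties": {
        "department": {
          "type": "string",
          "enum": ["billing", "support", "sales"],
          "description": "Department the caller should be routed to"
        },
        "urgency": { "type": "string", "enum": ["low", "high"] }
      },
      "required": ["department"]
    }
  }
}
```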
Squad destinations
In addition to assistant and dynamic destinations, you can hand off a call to an entire squad. This transfers the caller into a new multi-agent system where the squad’s own routing logic takes over.
Using squad ID
Reference a saved squad by its ID:
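A sketch, assuming a `squad` destination type with a `squadId` field:

```json
{
  "type": "handoff",
  "destinations": [
    { "type": "squad", "squadId": "YOUR_SQUAD_ID" }
  ]
}
```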
Using a transient squad
Define the squad inline without saving it first:
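A sketch of a transient squad destination; the member shape (one `assistantId` per member) is an assumption, so check the squad schema in the API reference:

```json
{
  "type": "squad",
  "squad": {
    "members": [
      { "assistantId": "TRIAGE_ASSISTANT_ID" },
      { "assistantId": "BILLING_ASSISTANT_ID" }
    ]
  }
}
```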
Squad destination properties
For the full schema, see the API reference.
Context engineering
Control what conversation history transfers to the next assistant or squad. Set contextEngineeringPlan on any destination.
All messages (default)
Transfers the entire conversation history:
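A sketch on an assistant destination (field names follow this guide's terminology):

```json
{
  "type": "assistant",
  "assistantId": "YOUR_ASSISTANT_ID",
  "contextEngineeringPlan": { "type": "all" }
}
```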
Last N messages
Transfers only the most recent N messages. Use this to limit context size for performance:
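A sketch; the name of the count field (`maxMessages` here) is an assumption:

```json
{
  "contextEngineeringPlan": {
    "type": "lastNMessages",
    "maxMessages": 10
  }
}
```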
User and assistant messages only
Transfers only user and assistant messages, filtering out system messages, tool calls, and tool results. This gives the next assistant a clean view of the conversation without internal implementation details:
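A sketch:

```json
{
  "contextEngineeringPlan": { "type": "userAndAssistantMessages" }
}
```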
Use userAndAssistantMessages when the destination assistant does not need to see tool call history or system prompts from the previous assistant. This produces a cleaner context and reduces token usage.
No context
Starts the next assistant with a blank conversation:
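A sketch:

```json
{
  "contextEngineeringPlan": { "type": "none" }
}
```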
Variable extraction
Extract and pass structured data during handoff. Variables extracted by the handoff tool are available to all subsequent assistants in the conversation chain. When a handoff extracts a variable with the same name as an existing one, the new value replaces the previous value.
Extraction via variableExtractionPlan in destinations
This extraction method makes an OpenAI structured output request to extract variables. Use this when you have multiple destinations, each with different variables that need to be extracted.
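A sketch of a destination with a `variableExtractionPlan`, using the `name`/`age` schema this guide references in its access-pattern discussion (schema layout is an assumption; see the API reference):

```json
{
  "type": "assistant",
  "assistantId": "YOUR_ASSISTANT_ID",
  "variableExtractionPlan": {
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string", "description": "The caller's full name" },
        "age": { "type": "number", "description": "The caller's age" }
      }
    }
  }
}
```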
Variable access patterns
Once extracted, variables are accessible using Liquid template syntax ({{variableName}}). The access pattern depends on the schema structure:
Top-level object properties are extracted as direct global variables. For example, a schema with properties name and age produces {{name}} and {{age}} — not {{root.name}}.
Variable aliases
Use aliases to create additional variables derived from extracted values. Aliases support Liquid template syntax for transformations and compositions.
Each alias creates a new variable accessible as {{key}} during the call and stored in call.artifact.variableValues after the call. Alias keys must start with a letter and contain only letters, numbers, or underscores (max 40 characters).
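A sketch of aliases composing extracted values with Liquid (the `value` field name on alias entries is an assumption):

```json
{
  "variableExtractionPlan": {
    "schema": {
      "type": "object",
      "properties": {
        "firstName": { "type": "string" },
        "lastName": { "type": "string" }
      }
    },
    "aliases": [
      { "key": "fullName", "value": "{{firstName}} {{lastName}}" },
      { "key": "greetingName", "value": "{{firstName | capitalize}}" }
    ]
  }
}
```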
Extraction via tool.function
You can also extract variables through the LLM tool call parameters (in addition to sending these parameters to your server in a handoff-destination-request for dynamic handoffs). Include the destination parameter with the assistant names or IDs in enum — Vapi uses this to determine where to hand off the call. The destination parameter itself is not extracted as a variable. Add destination and all other required variables to the schema’s required array.
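A sketch: `destination` routes the call and is not stored as a variable, while `customerName` is extracted (names here are illustrative):

```json
{
  "type": "handoff",
  "destinations": [
    { "type": "assistant", "assistantName": "Billing" },
    { "type": "assistant", "assistantName": "Support" }
  ],
  "function": {
    "name": "handoff_call",
    "parameters": {
      "type": "object",
      "properties": {
        "destination": {
          "type": "string",
          "enum": ["Billing", "Support"],
          "description": "Which assistant to hand the call to"
        },
        "customerName": { "type": "string", "description": "The caller's name" }
      },
      "required": ["destination", "customerName"]
    }
  }
}
```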
Tool messages
Configure what the assistant says during each phase of the handoff. Add a messages array to the handoff tool to control the spoken responses.
Message types
Example configuration
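A sketch of a handoff tool with messages for several phases (message `type` values follow this guide; other fields should be verified against the API reference):

```json
{
  "type": "handoff",
  "destinations": [{ "type": "assistant", "assistantName": "Billing" }],
  "messages": [
    {
      "type": "request-start",
      "content": "One moment while I transfer you to our billing specialist.",
      "blocking": true
    },
    {
      "type": "request-failed",
      "content": "I wasn't able to complete the transfer. Let's continue together."
    },
    {
      "type": "request-response-delayed",
      "content": "Thanks for your patience, the transfer is still in progress.",
      "timingMilliseconds": 5000
    }
  ]
}
```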
Message properties
request-start
- content (string) — The text the assistant speaks when the handoff begins.
- blocking (boolean, default: false) — When true, the tool call waits until the message finishes speaking before executing.
- conditions (array) — Optional conditions that must match for this message to trigger.
- contents (array) — Multilingual variants of the content. Overrides content when provided.
request-complete
- content (string) — The text the assistant speaks when the handoff completes.
- role ("assistant" | "system", default: "assistant") — When "assistant", the content is spoken aloud. When "system", the content is passed as a system message hint to the model.
- endCallAfterSpokenEnabled (boolean, default: false) — When true, the call ends after this message is spoken.
- conditions (array) — Optional conditions for triggering this message.
- contents (array) — Multilingual variants.
request-failed
- content (string) — The text the assistant speaks when the handoff fails.
- endCallAfterSpokenEnabled (boolean, default: false) — When true, the call ends after this message.
- conditions (array) — Optional conditions for triggering.
- contents (array) — Multilingual variants.
request-response-delayed
- content (string) — The text the assistant speaks when the handoff is taking longer than expected.
- timingMilliseconds (number, 100-120000) — Milliseconds to wait before triggering this message.
- conditions (array) — Optional conditions for triggering.
- contents (array) — Multilingual variants.
For the full schema, see the API reference.
Rejection plan
Use rejectionPlan to prevent a handoff from executing based on conversation state. When all conditions in the plan match, the tool call is rejected and the rejection message is added to the conversation.
Regex condition
Match against message content using regular expressions:
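A sketch (condition field names and the rejection `message` field are assumptions; note the double-escaped backslashes JSON requires):

```json
{
  "rejectionPlan": {
    "conditions": [
      {
        "type": "regex",
        "regex": "(?i)\\b(cancel|stop|nevermind)\\b"
      }
    ],
    "message": "It sounds like you'd rather not transfer. How else can I help?"
  }
}
```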
This rejects the handoff if the user’s most recent message contains “cancel”, “stop”, or “nevermind” (case-insensitive).
Liquid condition
Use Liquid templates for more complex logic. The template must return exactly "true" or "false":
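A sketch that counts user messages with Liquid's `where` and `size` filters (the `liquid` field name is an assumption):

```json
{
  "rejectionPlan": {
    "conditions": [
      {
        "type": "liquid",
        "liquid": "{% assign userMessages = messages | where: 'role', 'user' %}{% if userMessages.size < 3 %}true{% else %}false{% endif %}"
      }
    ]
  }
}
```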
This rejects the handoff if fewer than 3 user messages exist in the conversation. Available Liquid variables include messages (array of recent messages), now (current timestamp), and any assistant variable values.
Group condition
Combine multiple conditions with AND or OR logic:
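A sketch of a group condition with OR logic (nested condition shapes are assumptions):

```json
{
  "rejectionPlan": {
    "conditions": [
      {
        "type": "group",
        "operator": "OR",
        "conditions": [
          { "type": "regex", "regex": "(?i)cancel" },
          { "type": "liquid", "liquid": "{% if messages.size < 2 %}true{% else %}false{% endif %}" }
        ]
      }
    ]
  }
}
```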
By default, all top-level conditions in the conditions array use AND logic — all must match for the rejection to trigger. Use a group condition with operator: "OR" to reject when any single condition matches.
For the full schema, see the API reference.
Custom function definitions
Override the default function definition for more control. You can customize each tool's function name so it can be referenced in the system prompt, or define custom parameters to pass to your server in a dynamic handoff request.
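A sketch of a handoff tool with a fully custom function definition (the name, description, and `reason` parameter are illustrative):

```json
{
  "type": "handoff",
  "destinations": [{ "type": "assistant", "assistantName": "Billing" }],
  "function": {
    "name": "transfer_to_billing",
    "description": "Hand the call to the billing assistant. Use when the caller asks about payments or invoices.",
    "parameters": {
      "type": "object",
      "properties": {
        "reason": { "type": "string", "description": "Why the call is being transferred" }
      },
      "required": ["reason"]
    }
  }
}
```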
Best practices
- Clear descriptions: Write specific, actionable descriptions for each destination in your system prompt. Use tool.function.name to customize the name of the function to reference in your prompt.
- Context management: Use lastNMessages or userAndAssistantMessages to limit context size for performance.
- Model optimization: Use the multiple-tools pattern for OpenAI models and the single-tool pattern for Anthropic models.
- Variable extraction: Extract key data before handoff to maintain context across assistants.
- Tool messages: Add custom request-start messages to set caller expectations during transfers.
- Testing: Test handoff scenarios thoroughly, including edge cases and rejection conditions.
- Monitoring and analysis: Enable artifactPlan.fullMessageHistoryEnabled to capture the complete message history across all handoffs in your artifacts. See squad artifact behavior for details.
Troubleshooting
- Ensure assistant IDs are valid and accessible
- Verify webhook server URLs are reachable and return the proper format
- Check that required parameters in custom functions match destinations
- Monitor context size to avoid token limits
- Test variable extraction schemas with sample data
- Validate that assistant names exist in the same squad
- Verify rejection plan conditions use correct regex syntax (remember to double-escape \\ in JSON)