Create call
Retrieve your API Key from the Dashboard.
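For orientation, here is a minimal sketch of a create-call request in TypeScript. The endpoint URL, Bearer auth scheme, and exact field names are assumptions based on the descriptions below, so verify them against the API reference before relying on them.

```typescript
// Hypothetical sketch of creating a call with an existing assistant.
// Endpoint, auth header, and field names are assumptions -- check the API reference.
const response = await fetch("https://api.vapi.ai/call", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY", // API Key from the Dashboard
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Follow-up call",           // just for your own reference
    assistantId: "YOUR_ASSISTANT_ID", // or pass a transient `assistant` object instead
  }),
});
const call = await response.json();
```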
This is the name of the call. This is just for your own reference.
This is the assistant that will be used for the call. To use a transient assistant, use assistant instead.
This is the assistant that will be used for the call. To use an existing assistant, use assistantId instead.
These are the options for the assistant's transcriber.
This is the transcription provider that will be used.
This is the Deepgram model that will be used. A list of models can be found here: https://developers.deepgram.com/docs/models-languages-overview
This is the language that will be set for the transcription. The list of languages Deepgram supports can be found here: https://developers.deepgram.com/docs/models-languages-overview
This enables the Smart Format option provided by Deepgram. It is disabled by default because it can sometimes format numbers as times, but it's getting better.
These keywords are passed to the transcription model to help it pick up use-case specific words. Anything that may not be a common word, like your company name, should be added here.
This is the timeout after which Deepgram will send transcription on user silence. You can read in-depth documentation here: https://developers.deepgram.com/docs/endpointing.
Here are the most important bits:
- Defaults to 10. This is recommended for most use cases to optimize for latency.
- 10 can cause some missing transcriptions because of the shorter context. This mostly happens for one-word utterances. For those use cases, it's recommended to try 300. It will add a bit of latency, but the quality and reliability of the experience will be better.
- If neither 10 nor 300 work, contact [email protected] and we'll find another solution.
@default 10
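As a rough illustration, a transient assistant's transcriber options might look like the sketch below; the property names are inferred from the descriptions above and may not match the schema exactly.

```typescript
// Sketch of Deepgram transcriber options (property names are assumptions).
const transcriber = {
  provider: "deepgram",
  model: "nova-2",             // any model from Deepgram's models overview
  language: "en",
  smartFormat: false,          // disabled by default; can format numbers as times
  keywords: ["Vapi", "Acme"],  // use-case specific words, e.g. your company name
  endpointing: 300,            // try 300 if one-word utterances are getting dropped
};
```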
These are the options for the assistant's LLM.
This is the starting state for the conversation.
These are the tools that the assistant can use during the call. To use existing tools, use toolIds.
Both tools and toolIds can be used together.
These are the tools that the assistant can use during the call. To use transient tools, use tools.
Both tools and toolIds can be used together.
This is the name of the model. Ex. cognitivecomputations/dolphin-mixtral-8x7b
This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency.
These are the options for the knowledge base.
This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250.
This determines whether we detect the user's emotion while they speak and send it as additional information to the model.
Default false because the model is usually good at understanding the user's emotion from text.
@default false
This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. For example, gpt-3.5-turbo if the provider is openai.
Default is 0.
@default 0
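Putting the model options together, a sketch might look like this; the property names are assumptions based on the descriptions above, and the tool shown is only an illustration.

```typescript
// Sketch of the assistant's LLM options (property names and the tool are assumptions).
const model = {
  provider: "openai",
  model: "gpt-4o",
  messages: [{ role: "system", content: "You are a helpful scheduling assistant." }],
  toolIds: ["EXISTING_TOOL_ID"],    // existing tools...
  tools: [{ type: "endCall" }],     // ...can be combined with transient tools
  temperature: 0,                   // default 0 to leverage caching for lower latency
  maxTokens: 250,                   // cap on tokens generated per turn
  emotionRecognitionEnabled: false,
  numFastTurns: 0,                  // turns served by a smaller, faster model first
};
```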
These are the options for the assistant's voice.
This determines whether fillers are injected into the model output before inputting it into the voice provider.
Default false because you can achieve better results by prompting the model.
This is the voice provider that will be used.
This is the provider-specific ID that will be used.
This is the speed multiplier that will be used.
This is the plan for chunking the model output before it is sent to the voice provider.
This determines whether the model output is chunked before being sent to the voice provider. Default true.
Usage:
- To rely on the voice provider's audio generation logic, set this to false.
- If seeing issues with quality, set this to true.
If disabled, Vapi-provided audio control tokens like <flush /> will not work.
@default true
This is the minimum number of characters in a chunk.
Usage:
- To increase quality, set this to a higher value.
- To decrease latency, set this to a lower value.
@default 30
These are the punctuations that are considered valid boundaries for a chunk to be created.
Usage:
- To increase quality, constrain to fewer boundaries.
- To decrease latency, enable all.
Default is automatically set to balance the trade-off between quality and latency based on the provider.
This is the plan for formatting the chunk before it is sent to the voice provider.
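A sketch of the voice options with a chunking plan, assuming these property names; check the schema for the exact keys and the provider-specific defaults.

```typescript
// Sketch of voice options with a chunk plan (property names are assumptions).
const voice = {
  provider: "11labs",
  voiceId: "YOUR_VOICE_ID",        // provider-specific ID
  speed: 1.0,                      // speed multiplier
  fillerInjectionEnabled: false,
  chunkPlan: {
    enabled: true,                 // disable to rely on the provider's own logic
    minCharacters: 30,             // higher favors quality, lower favors latency
    punctuationBoundaries: [".", "!", "?", ";"],
  },
};
```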
This is the mode for the first message. Default is 'assistant-speaks-first'.
Use:
- 'assistant-speaks-first' to have the assistant speak first.
- 'assistant-waits-for-user' to have the assistant wait for the user to speak first.
- 'assistant-speaks-first-with-model-generated-message' to have the assistant speak first with a message generated by the model based on the conversation state (assistant.model.messages at call start, call.messages at squad transfer points).
@default 'assistant-speaks-first'
This sets whether the assistant's calls are recorded. Defaults to true.
When this is enabled, no logs, recordings, or transcriptions will be stored. At the end of the call, you will still receive an end-of-call-report message to store on your server. Defaults to false.
These are the messages that will be sent to your Client SDKs. Default is conversation-update,function-call,hang,model-output,speech-update,status-update,transcript,tool-calls,user-interrupted,voice-input. You can check the shape of the messages in ClientMessage schema.
These are the messages that will be sent to your Server URL. Default is conversation-update,end-of-call-report,function-call,hang,speech-update,status-update,tool-calls,transfer-destination-request,user-interrupted. You can check the shape of the messages in ServerMessage schema.
How many seconds of silence to wait before ending the call. Defaults to 30.
@default 30
This is the maximum number of seconds that the call will last. When the call reaches this duration, it will be ended.
@default 600 (10 minutes)
This is the background sound in the call. Default for phone calls is 'office' and default for web calls is 'off'.
This determines whether the model says 'mhmm', 'ahem', etc. while the user is speaking.
Default false while in beta.
@default false
This enables filtering of noise and background speech while the user is talking.
Default false while in beta.
@default false
This determines whether the model's output is used in the conversation history rather than the transcription of the assistant's speech.
Default false while in beta.
@default false
These are the configurations to be passed to the transport providers of assistant's calls, like Twilio. You can store multiple configurations for different transport providers. For a call, only the configuration matching the call transport provider is used.
This is the name of the assistant.
This is required when you want to transfer between assistants in a call.
This is the first message that the assistant will say. This can also be a URL to a containerized audio file (mp3, wav, etc.).
If unspecified, assistant will wait for user to speak and use the model to respond once they speak.
These are the settings to configure or disable voicemail detection. Alternatively, voicemail detection can be configured using the model.tools=[VoicemailTool]. This uses Twilio's built-in detection while the VoicemailTool relies on the model to detect if a voicemail was reached. You can use neither of them, one of them, or both of them. By default, Twilio built-in detection is enabled while VoicemailTool is not.
This is the provider to use for voicemail detection.
These are the AMD messages from Twilio that are considered as voicemail. Default is ['machine_end_beep', 'machine_end_silence'].
@default {Array} ['machine_end_beep', 'machine_end_silence']
This sets whether the assistant should detect voicemail. Defaults to true.
@default true
The number of seconds that Twilio should attempt to perform answering machine detection before timing out and returning AnsweredBy as unknown. Default is 30 seconds.
Increasing this value will provide the engine more time to make a determination. This can be useful when DetectMessageEnd is provided in the MachineDetection parameter and there is an expectation of long answering machine greetings that can exceed 30 seconds.
Decreasing this value will reduce the amount of time the engine has to make a determination. This can be particularly useful when the Enable option is provided in the MachineDetection parameter and you want to limit the time for initial detection.
Check the Twilio docs for more info.
@default 30
The number of milliseconds that is used as the measuring stick for the length of the speech activity. Durations lower than this value will be interpreted as a human, longer as a machine. Default is 2400 milliseconds.
Increasing this value will reduce the chance of a False Machine (detected machine, actually human) for a long human greeting (e.g., a business greeting) but increase the time it takes to detect a machine.
Decreasing this value will reduce the chances of a False Human (detected human, actually machine) for short voicemail greetings. The value of this parameter may need to be reduced by more than 1000ms to detect very short voicemail greetings. A reduction of that significance can result in increased False Machine detections. Adjusting the MachineDetectionSpeechEndThreshold is likely the better approach for short voicemails. Decreasing MachineDetectionSpeechThreshold will also reduce the time it takes to detect a machine.
Check the Twilio docs for more info.
@default 2400
The number of milliseconds of silence after speech activity at which point the speech activity is considered complete. Default is 1200 milliseconds.
Increasing this value will typically be used to better address the short voicemail greeting scenarios. For short voicemails, there is typically 1000-2000ms of audio followed by 1200-2400ms of silence and then additional audio before the beep. Increasing the MachineDetectionSpeechEndThreshold to ~2500ms will treat the 1200-2400ms of silence as a gap in the greeting but not the end of the greeting and will result in a machine detection. The downsides of such a change include:
- Increasing the delay for human detection by the amount you increase this parameter, e.g., a change of 1200ms to 2500ms increases human detection delay by 1300ms.
- Cases where a human has two utterances separated by a period of silence (e.g. a "Hello", then 2000ms of silence, and another "Hello") may be interpreted as a machine.
Decreasing this value will result in faster human detection. The consequence is that it can lead to increased False Human (detected human, actually machine) detections because a silence gap in a voicemail greeting (not necessarily just in short voicemail scenarios) can be incorrectly interpreted as the end of speech.
Check the Twilio docs for more info.
@default 1200
The number of milliseconds of initial silence after which an unknown AnsweredBy result will be returned. Default is 5000 milliseconds.
Increasing this value will result in waiting for a longer period of initial silence before returning an 'unknown' AMD result.
Decreasing this value will result in waiting for a shorter period of initial silence before returning an 'unknown' AMD result.
Check the Twilio docs for more info.
@default 5000
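For example, a Twilio-based voicemail detection configuration might look like the sketch below; the property names mirror the Twilio AMD parameters described above but are assumptions on the Vapi side.

```typescript
// Sketch of voicemail detection settings (property names are assumptions).
const voicemailDetection = {
  provider: "twilio",
  enabled: true,
  voicemailDetectionTypes: ["machine_end_beep", "machine_end_silence"],
  machineDetectionTimeout: 30,              // seconds before AnsweredBy comes back "unknown"
  machineDetectionSpeechThreshold: 2400,    // ms; shorter activity = human, longer = machine
  machineDetectionSpeechEndThreshold: 1200, // ms of silence that ends the greeting
  machineDetectionSilenceTimeout: 5000,     // ms of initial silence before "unknown"
};
```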
This is the message that the assistant will say if the call is forwarded to voicemail.
If unspecified, it will hang up.
This is the message that the assistant will say if it ends the call.
If unspecified, it will hang up without saying anything.
This list contains phrases that, if spoken by the assistant, will trigger the call to be hung up. Case insensitive.
This is for metadata you want to store on the assistant.
This is the URL Vapi will communicate with via HTTP GET and POST Requests. This is used for retrieving context, function calling, and end-of-call reports.
All requests will be sent with the call object among other things relevant to that message. You can find more details in the Server URL documentation.
This overrides the serverUrl set on the org and the phoneNumber. Order of precedence: tool.server.url > assistant.serverUrl > phoneNumber.serverUrl > org.serverUrl
This is the secret you can set that Vapi will send with every request to your server. Will be sent as a header called x-vapi-secret.
Same precedence logic as serverUrl.
This is the plan for analysis of assistant's calls. Stored in call.analysis.
This is the prompt that's used to summarize the call. The output is stored in call.analysis.summary.
Default is "You are an expert note-taker. You will be given a transcript of a call. Summarize the call in 2-3 sentences. DO NOT return anything except the summary.".
Set to '' or 'off' to disable.
This is how long the request is tried before giving up. When the request times out, call.analysis.summary will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is how long the request is tried before giving up. When the request times out, call.analysis.structuredData will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is the prompt that's used to evaluate if the call was successful. The output is stored in call.analysis.successEvaluation.
Default is "You are an expert call evaluator. You will be given a transcript of a call and the system prompt of the AI participant. Determine if the call was successful based on the objectives inferred from the system prompt. DO NOT return anything except the result.".
Set to '' or 'off' to disable.
You can use this standalone or in combination with successEvaluationRubric. If both are provided, they are concatenated into appropriate instructions.
This enforces the rubric of the evaluation. The output is stored in call.analysis.successEvaluation.
Options include:
- 'NumericScale': A scale of 1 to 10.
- 'DescriptiveScale': A scale of Excellent, Good, Fair, Poor.
- 'Checklist': A checklist of criteria and their status.
- 'Matrix': A grid that evaluates multiple criteria across different performance levels.
- 'PercentageScale': A scale of 0% to 100%.
- 'LikertScale': A scale of Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
- 'AutomaticRubric': Automatically break down evaluation into several criteria, each with its own score.
- 'PassFail': A simple 'true' if call passed, 'false' if not.
For 'Checklist' and 'Matrix', provide the criteria in successEvaluationPrompt.
Default is 'PassFail' if successEvaluationPrompt is not provided, and null if successEvaluationPrompt is provided.
You can use this standalone or in combination with successEvaluationPrompt. If both are provided, they are concatenated into appropriate instructions.
This is how long the request is tried before giving up. When the request times out, call.analysis.successEvaluation will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
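A sketch of the summary and success-evaluation parts of the analysis plan, assuming these property names; the summary prompt shown is the default quoted above, and the success-evaluation prompt is a hypothetical objective.

```typescript
// Sketch of an analysis plan (property names are assumptions).
const analysisPlan = {
  summaryPrompt:
    "You are an expert note-taker. You will be given a transcript of a call. " +
    "Summarize the call in 2-3 sentences. DO NOT return anything except the summary.",
  summaryRequestTimeoutSeconds: 5,
  successEvaluationPrompt: "Did the caller book an appointment?", // hypothetical objective
  successEvaluationRubric: "PassFail",                            // one of the rubric options above
  successEvaluationRequestTimeoutSeconds: 5,
};
```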
This is the prompt that's used to extract structured data from the call. The output is stored in call.analysis.structuredData.
Disabled by default.
You can use this standalone or in combination with structuredDataSchema. If both are provided, they are concatenated into appropriate instructions.
This enforces the schema of the structured data. This output is stored in call.analysis.structuredData.
Complete guide on JSON Schema can be found here.
Disabled by default.
You can use this standalone or in combination with structuredDataPrompt. If both are provided, they are concatenated into appropriate instructions.
This is the type of output you'd like.
string, number, integer, boolean are the primitive types and should be obvious.
array and object are more interesting and quite powerful. They allow you to define nested structures.
For array, you can define the schema of the items in the array using the items property.
For object, you can define the properties of the object using the properties property.
This is required if the type is "array". This is the schema of the items in the array.
This is of type JsonSchema. However, Swagger doesn't support circular references.
This is required if the type is "object". This specifies the properties of the object.
This is a map of string to JsonSchema. However, Swagger doesn't support circular references.
This is the description to help the model understand what it needs to output.
This is a list of properties that are required.
This only makes sense if the type is "object".
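To make the structured data options concrete, here is a sketch of a prompt plus schema using the JsonSchema types described above; the top-level property names are assumptions, and the schema fields are illustrative.

```typescript
// Sketch of structured data extraction with a JsonSchema (top-level names are assumptions).
const structuredDataOptions = {
  structuredDataPrompt: "Extract the caller's details from the transcript.",
  structuredDataSchema: {
    type: "object",
    description: "Details collected during the call",
    properties: {
      callerName: { type: "string", description: "Full name of the caller" },
      callbackRequested: { type: "boolean" },
      topics: {
        type: "array",
        items: { type: "string", description: "A topic the caller mentioned" },
      },
    },
    required: ["callerName"],
  },
};
```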
This is the plan for artifacts generated during assistant's calls. Stored in call.artifact.
Note: recordingEnabled is currently at the root level. It will be moved to artifactPlan in the future, but will remain backwards compatible.
This determines whether the video is recorded during the call. Default is false. Only relevant for webCall type.
This is the S3 path prefix for the audio recording. This is only used if you have provided S3 credentials. Check the Providers page in the Dashboard.
If credential.s3PathPrefix is set, this will append to it.
Example: /my-prefix. Default is /.
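A small sketch of an artifact plan; both property names here are assumptions based on the descriptions above.

```typescript
// Sketch of an artifact plan (property names are assumptions).
const artifactPlan = {
  videoRecordingEnabled: false, // only relevant for webCall type
  recordingPath: "/my-prefix",  // S3 path prefix; appended to credential.s3PathPrefix if set
};
```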
This is the plan for static messages that can be spoken by the assistant during the call, like idleMessages.
Note: firstMessage, voicemailMessage, and endCallMessage are currently at the root level. They will be moved to messagePlan in the future, but will remain backwards compatible.
These are the messages that the assistant will speak when the user hasn't responded for idleTimeoutSeconds. Each time the timeout is triggered, a random message will be chosen from this array.
@default null (no idle message is spoken)
This determines the maximum number of times idleMessages can be spoken during the call.
@default 3
This is the timeout in seconds before a message from idleMessages is spoken. The clock starts when the assistant finishes speaking and remains active until the user speaks.
@default 10
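A sketch of a message plan for idle handling, assuming these property names.

```typescript
// Sketch of a message plan (property names are assumptions).
const messagePlan = {
  idleMessages: ["Are you still there?", "Can you hear me okay?"], // one is chosen at random
  idleMessageMaxSpokenCount: 3,
  idleTimeoutSeconds: 10, // clock starts when the assistant finishes speaking
};
```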
This is the plan for when the assistant should start talking.
You should configure this if you're running into these issues:
- The assistant is too slow to start talking after the customer is done speaking.
- The assistant is too fast to start talking after the customer is done speaking.
- The assistant is so fast that it's actually interrupting the customer.
This is how long the assistant waits before speaking. Defaults to 0.4.
This is the minimum it will wait, but if there is latency in the pipeline, this minimum will be exceeded. This is really a stopgap in case the pipeline is moving too fast.
Example:
- Even if the model generates tokens and the voice generates bytes within 100ms, the pipeline still waits the configured minimum (400ms by default) before outputting speech.
Usage:
- If the customer is taking long pauses, set this to a higher value.
- If the assistant is accidentally jumping in too much, set this to a higher value.
@default 0.4
This determines if the customer's speech is considered done (endpointing) using the VAP model on the customer's speech. This is good for middle-of-thought detection.
Once an endpoint is triggered, the request is sent to assistant.model.
Default false since experimental.
@default false
This determines how the customer's speech is considered done (endpointing) using the transcription of the customer's speech.
Once an endpoint is triggered, the request is sent to assistant.model.
The minimum number of seconds to wait after transcription ending with punctuation before sending a request to the model. Defaults to 0.1.
This setting exists because the transcriber punctuates the transcription when it's more confident that the customer has completed a thought.
@default 0.1
The minimum number of seconds to wait after transcription ending without punctuation before sending a request to the model. Defaults to 1.5.
This setting exists to catch the cases where the transcriber was not confident enough to punctuate the transcription, but the customer is done and has been silent for a long time.
@default 1.5
The minimum number of seconds to wait after transcription ending with a number before sending a request to the model. Defaults to 0.5.
This setting exists because the transcriber will sometimes punctuate the transcription ending with a number, even though the customer hasn't uttered the full number. This happens commonly for long numbers when the customer reads the number in chunks.
@default 0.5
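A sketch of a start speaking plan, assuming these property names; the nested endpointing values mirror the defaults above.

```typescript
// Sketch of a start speaking plan (property names are assumptions).
const startSpeakingPlan = {
  waitSeconds: 0.4,               // raise this if customers take long pauses
  smartEndpointingEnabled: false, // experimental VAP-based endpointing
  transcriptionEndpointingPlan: {
    onPunctuationSeconds: 0.1,
    onNoPunctuationSeconds: 1.5,
    onNumberSeconds: 0.5,
  },
};
```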
This is the plan for when assistant should stop talking on customer interruption.
You should configure this if you're running into these issues:
- The assistant is too slow to recognize customer's interruption.
- The assistant is too fast to recognize customer's interruption.
- The assistant is getting interrupted by phrases that are just acknowledgments.
- The assistant is getting interrupted by background noises.
- The assistant is not properly stopping -- it starts talking right after getting interrupted.
This is the number of words that the customer has to say before the assistant will stop talking.
Words like "stop", "actually", "no", etc. will always interrupt immediately regardless of this value.
Words like "okay", "yeah", "right" will never interrupt.
When set to 0, voiceSeconds is used in addition to the transcriptions to determine whether the customer has started speaking.
Defaults to 0.
@default 0
This is the number of seconds the customer has to speak before the assistant stops talking. This uses the VAD (Voice Activity Detection) spike to determine if the customer has started speaking.
Considerations:
- A lower value might be more responsive but could potentially pick up non-speech sounds.
- A higher value reduces false positives but might slightly delay the detection of speech onset.
This is only used if numWords is set to 0.
Defaults to 0.2.
@default 0.2
This is the number of seconds to wait before the assistant will start talking again after being interrupted.
Defaults to 1.
@default 1
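A sketch of a stop speaking plan, assuming these property names.

```typescript
// Sketch of a stop speaking plan (property names are assumptions).
const stopSpeakingPlan = {
  numWords: 0,       // 0 means voiceSeconds is used alongside transcriptions
  voiceSeconds: 0.2, // lower is more responsive; higher avoids non-speech triggers
  backoffSeconds: 1, // how long to stay quiet after being interrupted
};
```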
These are the credentials that will be used for the assistant calls. By default, all the credentials are available for use in the call but you can provide a subset using this.
These are the overrides for the assistant or assistantId's settings and template variables.
These are the options for the assistant's transcriber.
This is the transcription provider that will be used.
This is the Deepgram model that will be used. A list of models can be found here: https://developers.deepgram.com/docs/models-languages-overview
This is the language that will be set for the transcription. The list of languages Deepgram supports can be found here: https://developers.deepgram.com/docs/models-languages-overview
This enables the Smart Format option provided by Deepgram. It is disabled by default because it can sometimes format numbers as times, but it's getting better.
These keywords are passed to the transcription model to help it pick up use-case specific words. Anything that may not be a common word, like your company name, should be added here.
This is the timeout after which Deepgram will send transcription on user silence. You can read in-depth documentation here: https://developers.deepgram.com/docs/endpointing.
Here are the most important bits:
- Defaults to 10. This is recommended for most use cases to optimize for latency.
- 10 can cause some missing transcriptions because of the shorter context. This mostly happens for one-word utterances. For those use cases, it's recommended to try 300. It will add a bit of latency, but the quality and reliability of the experience will be better.
- If neither 10 nor 300 work, contact [email protected] and we'll find another solution.
@default 10
These are the options for the assistant's LLM.
This is the starting state for the conversation.
These are the tools that the assistant can use during the call. To use existing tools, use toolIds.
Both tools and toolIds can be used together.
These are the tools that the assistant can use during the call. To use transient tools, use tools.
Both tools and toolIds can be used together.
This is the name of the model. Ex. cognitivecomputations/dolphin-mixtral-8x7b
This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency.
These are the options for the knowledge base.
This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250.
This determines whether we detect the user's emotion while they speak and send it as additional information to the model.
Default false because the model is usually good at understanding the user's emotion from text.
@default false
This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. For example, gpt-3.5-turbo if the provider is openai.
Default is 0.
@default 0
These are the options for the assistant's voice.
This determines whether fillers are injected into the model output before inputting it into the voice provider.
Default false because you can achieve better results by prompting the model.
This is the voice provider that will be used.
This is the provider-specific ID that will be used.
This is the speed multiplier that will be used.
This is the plan for chunking the model output before it is sent to the voice provider.
This determines whether the model output is chunked before being sent to the voice provider. Default true.
Usage:
- To rely on the voice provider's audio generation logic, set this to false.
- If seeing issues with quality, set this to true.
If disabled, Vapi-provided audio control tokens like <flush /> will not work.
@default true
This is the minimum number of characters in a chunk.
Usage:
- To increase quality, set this to a higher value.
- To decrease latency, set this to a lower value.
@default 30
These are the punctuations that are considered valid boundaries for a chunk to be created.
Usage:
- To increase quality, constrain to fewer boundaries.
- To decrease latency, enable all.
Default is automatically set to balance the trade-off between quality and latency based on the provider.
This is the plan for formatting the chunk before it is sent to the voice provider.
This is the mode for the first message. Default is 'assistant-speaks-first'.
Use:
- 'assistant-speaks-first' to have the assistant speak first.
- 'assistant-waits-for-user' to have the assistant wait for the user to speak first.
- 'assistant-speaks-first-with-model-generated-message' to have the assistant speak first with a message generated by the model based on the conversation state (assistant.model.messages at call start, call.messages at squad transfer points).
@default 'assistant-speaks-first'
This sets whether the assistant's calls are recorded. Defaults to true.
When this is enabled, no logs, recordings, or transcriptions will be stored. At the end of the call, you will still receive an end-of-call-report message to store on your server. Defaults to false.
These are the messages that will be sent to your Client SDKs. Default is conversation-update,function-call,hang,model-output,speech-update,status-update,transcript,tool-calls,user-interrupted,voice-input. You can check the shape of the messages in ClientMessage schema.
These are the messages that will be sent to your Server URL. Default is conversation-update,end-of-call-report,function-call,hang,speech-update,status-update,tool-calls,transfer-destination-request,user-interrupted. You can check the shape of the messages in ServerMessage schema.
How many seconds of silence to wait before ending the call. Defaults to 30.
@default 30
This is the maximum number of seconds that the call will last. When the call reaches this duration, it will be ended.
@default 600 (10 minutes)
This is the background sound in the call. Default for phone calls is 'office' and default for web calls is 'off'.
This determines whether the model says 'mhmm', 'ahem', etc. while the user is speaking.
Default false while in beta.
@default false
This enables filtering of noise and background speech while the user is talking.
Default false while in beta.
@default false
This determines whether the model's output is used in the conversation history rather than the transcription of the assistant's speech.
Default false while in beta.
@default false
These are the configurations to be passed to the transport providers of assistant's calls, like Twilio. You can store multiple configurations for different transport providers. For a call, only the configuration matching the call transport provider is used.
These are values that will be used to replace the template variables in the assistant messages and other text-based fields.
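For example, overrides with template variable values might look like the sketch below; the variableValues name and the {{customerName}} placeholder are assumptions for illustration.

```typescript
// Sketch of assistant overrides with template variable values (names are assumptions).
const assistantOverrides = {
  firstMessage: "Hi {{customerName}}, this is a quick follow-up call.",
  variableValues: {
    customerName: "Alex", // replaces {{customerName}} in messages and other text fields
  },
};
```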
This is the name of the assistant.
This is required when you want to transfer between assistants in a call.
This is the first message that the assistant will say. This can also be a URL to a containerized audio file (mp3, wav, etc.).
If unspecified, assistant will wait for user to speak and use the model to respond once they speak.
These are the settings to configure or disable voicemail detection. Alternatively, voicemail detection can be configured using the model.tools=[VoicemailTool]. This uses Twilio's built-in detection while the VoicemailTool relies on the model to detect if a voicemail was reached. You can use neither of them, one of them, or both of them. By default, Twilio built-in detection is enabled while VoicemailTool is not.
This is the provider to use for voicemail detection.
These are the AMD messages from Twilio that are considered as voicemail. Default is ['machine_end_beep', 'machine_end_silence'].
@default {Array} ['machine_end_beep', 'machine_end_silence']
This sets whether the assistant should detect voicemail. Defaults to true.
@default true
The number of seconds that Twilio should attempt to perform answering machine detection before timing out and returning AnsweredBy as unknown. Default is 30 seconds.
Increasing this value will provide the engine more time to make a determination. This can be useful when DetectMessageEnd is provided in the MachineDetection parameter and there is an expectation of long answering machine greetings that can exceed 30 seconds.
Decreasing this value will reduce the amount of time the engine has to make a determination. This can be particularly useful when the Enable option is provided in the MachineDetection parameter and you want to limit the time for initial detection.
Check the Twilio docs for more info.
@default 30
The number of milliseconds that is used as the measuring stick for the length of the speech activity. Durations lower than this value will be interpreted as a human, longer as a machine. Default is 2400 milliseconds.
Increasing this value will reduce the chance of a False Machine (detected machine, actually human) for a long human greeting (e.g., a business greeting) but increase the time it takes to detect a machine.
Decreasing this value will reduce the chances of a False Human (detected human, actually machine) for short voicemail greetings. The value of this parameter may need to be reduced by more than 1000ms to detect very short voicemail greetings. A reduction of that significance can result in increased False Machine detections. Adjusting the MachineDetectionSpeechEndThreshold is likely the better approach for short voicemails. Decreasing MachineDetectionSpeechThreshold will also reduce the time it takes to detect a machine.
Check the Twilio docs for more info.
@default 2400
The number of milliseconds of silence after speech activity at which point the speech activity is considered complete. Default is 1200 milliseconds.
Increasing this value will typically be used to better address the short voicemail greeting scenarios. For short voicemails, there is typically 1000-2000ms of audio followed by 1200-2400ms of silence and then additional audio before the beep. Increasing the MachineDetectionSpeechEndThreshold to ~2500ms will treat the 1200-2400ms of silence as a gap in the greeting but not the end of the greeting and will result in a machine detection. The downsides of such a change include:
- Increasing the delay for human detection by the amount you increase this parameter, e.g., a change of 1200ms to 2500ms increases human detection delay by 1300ms.
- Cases where a human has two utterances separated by a period of silence (e.g. a "Hello", then 2000ms of silence, and another "Hello") may be interpreted as a machine.
Decreasing this value will result in faster human detection. The consequence is that it can lead to increased False Human (detected human, actually machine) detections because a silence gap in a voicemail greeting (not necessarily just in short voicemail scenarios) can be incorrectly interpreted as the end of speech.
Check the Twilio docs for more info.
@default 1200
The number of milliseconds of initial silence after which an unknown AnsweredBy result will be returned. Default is 5000 milliseconds.
Increasing this value will result in waiting for a longer period of initial silence before returning an 'unknown' AMD result.
Decreasing this value will result in waiting for a shorter period of initial silence before returning an 'unknown' AMD result.
Check the Twilio docs for more info.
@default 5000
This is the message that the assistant will say if the call is forwarded to voicemail.
If unspecified, it will hang up.
This is the message that the assistant will say if it ends the call.
If unspecified, it will hang up without saying anything.
This list contains phrases that, if spoken by the assistant, will trigger the call to be hung up. Case insensitive.
This is for metadata you want to store on the assistant.
This is the URL Vapi will communicate with via HTTP GET and POST Requests. This is used for retrieving context, function calling, and end-of-call reports.
All requests will be sent with the call object among other things relevant to that message. You can find more details in the Server URL documentation.
This overrides the serverUrl set on the org and the phoneNumber. Order of precedence: tool.server.url > assistant.serverUrl > phoneNumber.serverUrl > org.serverUrl
This is the secret you can set that Vapi will send with every request to your server. Will be sent as a header called x-vapi-secret.
Same precedence logic as serverUrl.
This is the plan for analysis of assistant's calls. Stored in call.analysis.
This is the prompt that's used to summarize the call. The output is stored in call.analysis.summary.
Default is "You are an expert note-taker. You will be given a transcript of a call. Summarize the call in 2-3 sentences. DO NOT return anything except the summary.".
Set to '' or 'off' to disable.
This is how long the request is tried before giving up. When the request times out, call.analysis.summary will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is how long the request is tried before giving up. When the request times out, call.analysis.structuredData will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is the prompt that's used to evaluate if the call was successful. The output is stored in call.analysis.successEvaluation.
Default is "You are an expert call evaluator. You will be given a transcript of a call and the system prompt of the AI participant. Determine if the call was successful based on the objectives inferred from the system prompt. DO NOT return anything except the result.".
Set to '' or 'off' to disable.
You can use this standalone or in combination with successEvaluationRubric. If both are provided, they are concatenated into appropriate instructions.
This enforces the rubric of the evaluation. The output is stored in call.analysis.successEvaluation.
Options include:
- 'NumericScale': A scale of 1 to 10.
- 'DescriptiveScale': A scale of Excellent, Good, Fair, Poor.
- 'Checklist': A checklist of criteria and their status.
- 'Matrix': A grid that evaluates multiple criteria across different performance levels.
- 'PercentageScale': A scale of 0% to 100%.
- 'LikertScale': A scale of Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
- 'AutomaticRubric': Automatically break down evaluation into several criteria, each with its own score.
- 'PassFail': A simple 'true' if call passed, 'false' if not.
For 'Checklist' and 'Matrix', provide the criteria in successEvaluationPrompt.
Default is 'PassFail' if successEvaluationPrompt is not provided, and null if successEvaluationPrompt is provided.
You can use this standalone or in combination with successEvaluationPrompt. If both are provided, they are concatenated into appropriate instructions.
This is how long the request is tried before giving up. When the request times out, call.analysis.successEvaluation will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is the prompt that's used to extract structured data from the call. The output is stored in call.analysis.structuredData.
Disabled by default.
You can use this standalone or in combination with structuredDataSchema. If both are provided, they are concatenated into appropriate instructions.
This enforces the schema of the structured data. This output is stored in call.analysis.structuredData.
Complete guide on JSON Schema can be found here.
Disabled by default.
You can use this standalone or in combination with structuredDataPrompt. If both are provided, they are concatenated into appropriate instructions.
This is the type of output you'd like.
string, number, integer, boolean are the primitive types and should be obvious.
array and object are more interesting and quite powerful. They allow you to define nested structures.
For array, you can define the schema of the items in the array using the items property.
For object, you can define the properties of the object using the properties property.
This is required if the type is "array". This is the schema of the items in the array.
This is of type JsonSchema. However, Swagger doesn't support circular references.
This is required if the type is "object". This specifies the properties of the object.
This is a map of string to JsonSchema. However, Swagger doesn't support circular references.
This is the description to help the model understand what it needs to output.
This is a list of properties that are required.
This only makes sense if the type is "object".
This is the plan for artifacts generated during assistant's calls. Stored in call.artifact.
Note: recordingEnabled is currently at the root level. It will be moved to artifactPlan in the future, but will remain backwards compatible.
This determines whether the video is recorded during the call. Default is false. Only relevant for webCall type.
This is the S3 path prefix for the audio recording. This is only used if you have provided S3 credentials. Check the Providers page in the Dashboard.
If credential.s3PathPrefix is set, this will append to it.
Example: /my-prefix. Default is /.
This is the plan for static messages that can be spoken by the assistant during the call, like idleMessages.
Note: firstMessage, voicemailMessage, and endCallMessage are currently at the root level. They will be moved to messagePlan in the future, but will remain backwards compatible.
These are the messages that the assistant will speak when the user hasn't responded for idleTimeoutSeconds. Each time the timeout is triggered, a random message will be chosen from this array.
@default null (no idle message is spoken)
This determines the maximum number of times idleMessages can be spoken during the call.
@default 3
This is the timeout in seconds before a message from idleMessages is spoken. The clock starts when the assistant finishes speaking and remains active until the user speaks.
@default 10
This is the plan for when the assistant should start talking.
You should configure this if you're running into these issues:
- The assistant is too slow to start talking after the customer is done speaking.
- The assistant is too fast to start talking after the customer is done speaking.
- The assistant is so fast that it's actually interrupting the customer.
This is how long the assistant waits before speaking. Defaults to 0.4.
This is the minimum it will wait, but if there is latency in the pipeline, this minimum will be exceeded. This is really a stopgap in case the pipeline is moving too fast.
Example:
- Even if the model generates tokens and the voice generates bytes within 100ms, the pipeline still waits the configured minimum (400ms by default) before outputting speech.
Usage:
- If the customer is taking long pauses, set this to a higher value.
- If the assistant is accidentally jumping in too much, set this to a higher value.
@default 0.4
This determines if the customer's speech is considered done (endpointing) using the VAP model on the customer's speech. This is good for middle-of-thought detection.
Once an endpoint is triggered, the request is sent to assistant.model.
Default false since experimental.
@default false
This determines how the customer's speech is considered done (endpointing) using the transcription of the customer's speech.
Once an endpoint is triggered, the request is sent to assistant.model.
The minimum number of seconds to wait after transcription ending with punctuation before sending a request to the model. Defaults to 0.1.
This setting exists because the transcriber punctuates the transcription when it's more confident that the customer has completed a thought.
@default 0.1
The minimum number of seconds to wait after transcription ending without punctuation before sending a request to the model. Defaults to 1.5.
This setting exists to catch the cases where the transcriber was not confident enough to punctuate the transcription, but the customer is done and has been silent for a long time.
@default 1.5
The minimum number of seconds to wait after transcription ending with a number before sending a request to the model. Defaults to 0.5.
This setting exists because the transcriber will sometimes punctuate the transcription ending with a number, even though the customer hasn't uttered the full number. This happens commonly for long numbers when the customer reads the number in chunks.
@default 0.5
This is the plan for when assistant should stop talking on customer interruption.
You should configure this if you're running into these issues:
- The assistant is too slow to recognize customer's interruption.
- The assistant is too fast to recognize customer's interruption.
- The assistant is getting interrupted by phrases that are just acknowledgments.
- The assistant is getting interrupted by background noises.
- The assistant is not properly stopping -- it starts talking right after getting interrupted.
This is the number of words that the customer has to say before the assistant will stop talking.
Words like "stop", "actually", "no", etc. will always interrupt immediately regardless of this value.
Words like "okay", "yeah", "right" will never interrupt.
When set to 0, voiceSeconds is used in addition to the transcriptions to determine whether the customer has started speaking.
Defaults to 0.
@default 0
This is the number of seconds the customer has to speak before the assistant stops talking. This uses the VAD (Voice Activity Detection) spike to determine if the customer has started speaking.
Considerations:
- A lower value might be more responsive but could potentially pick up non-speech sounds.
- A higher value reduces false positives but might slightly delay the detection of speech onset.
This is only used if numWords is set to 0.
Defaults to 0.2.
@default 0.2
This is the number of seconds to wait before the assistant will start talking again after being interrupted.
Defaults to 1.
@default 1
These are the credentials that will be used for the assistant calls. By default, all the credentials are available for use in the call but you can provide a subset using this.
This is the squad that will be used for the call. To use a transient squad, use squad instead.
This is a squad that will be used for the call. To use an existing squad, use squadId instead.
This is the name of the squad.
This is the list of assistants that make up the squad.
The call will start with the first assistant in the list.
This can be used to override all the assistants' settings and provide values for their template variables.
Both membersOverrides and members[n].assistantOverrides can be used together. First, members[n].assistantOverrides is applied. Then, membersOverrides is applied as a global override.
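A sketch of a transient squad, assuming these property names; the call starts with the first member in the list, and membersOverrides is applied on top of each member's own assistantOverrides.

```typescript
// Sketch of a transient squad (property names are assumptions).
const squad = {
  name: "Support squad",
  members: [
    {
      assistantId: "TRIAGE_ASSISTANT_ID",
      assistantOverrides: { firstMessage: "Hi, how can I help today?" },
    },
    { assistantId: "BILLING_ASSISTANT_ID" },
  ],
  membersOverrides: {
    voice: { provider: "11labs", voiceId: "SHARED_VOICE_ID" }, // global override for every member
  },
};
```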
These are the options for the assistant's transcriber.
This is the transcription provider that will be used.
This is the Deepgram model that will be used. A list of models can be found here: https://developers.deepgram.com/docs/models-languages-overview
This is the language that will be set for the transcription. The list of languages Deepgram supports can be found here: https://developers.deepgram.com/docs/models-languages-overview
This enables the Smart Format option provided by Deepgram. It is disabled by default because it can sometimes format numbers as times, but it's getting better.
These keywords are passed to the transcription model to help it pick up use-case specific words. Anything that may not be a common word, like your company name, should be added here.
This is the timeout after which Deepgram will send transcription on user silence. You can read in-depth documentation here: https://developers.deepgram.com/docs/endpointing.
Here are the most important bits:
- Defaults to 10. This is recommended for most use cases to optimize for latency.
- 10 can cause some missing transcriptions because of the shorter context. This mostly happens for one-word utterances. For those use cases, it's recommended to try 300. It will add a bit of latency, but the quality and reliability of the experience will be better.
- If neither 10 nor 300 work, contact [email protected] and we'll find another solution.
@default 10
These are the options for the assistant's LLM.
This is the starting state for the conversation.
These are the tools that the assistant can use during the call. To use existing tools, use toolIds.
Both tools and toolIds can be used together.
These are the tools that the assistant can use during the call. To use transient tools, use tools.
Both tools and toolIds can be used together.
This is the name of the model. Ex. cognitivecomputations/dolphin-mixtral-8x7b
This is the temperature that will be used for calls. Default is 0 to leverage caching for lower latency.
These are the options for the knowledge base.
This is the max number of tokens that the assistant will be allowed to generate in each turn of the conversation. Default is 250.
This determines whether we detect the user's emotion while they speak and send it as additional information to the model.
Default false because the model is usually good at understanding the user's emotion from text.
@default false
This sets how many turns at the start of the conversation to use a smaller, faster model from the same provider before switching to the primary model. For example, gpt-3.5-turbo if the provider is openai.
Default is 0.
@default 0
These are the options for the assistant's voice.
This determines whether fillers are injected into the model output before inputting it into the voice provider.
Default false because you can achieve better results by prompting the model.
This is the voice provider that will be used.
This is the provider-specific ID that will be used.
This is the speed multiplier that will be used.
This is the plan for chunking the model output before it is sent to the voice provider.
This is the mode for the first message. Default is 'assistant-speaks-first'.
Use:
- 'assistant-speaks-first' to have the assistant speak first.
- 'assistant-waits-for-user' to have the assistant wait for the user to speak first.
- 'assistant-speaks-first-with-model-generated-message' to have the assistant speak first with a message generated by the model based on the conversation state (assistant.model.messages at call start, call.messages at squad transfer points).
@default 'assistant-speaks-first'
This sets whether the assistant's calls are recorded. Defaults to true.
When this is enabled, no logs, recordings, or transcriptions will be stored. At the end of the call, you will still receive an end-of-call-report message to store on your server. Defaults to false.
These are the messages that will be sent to your Client SDKs. Default is conversation-update,function-call,hang,model-output,speech-update,status-update,transcript,tool-calls,user-interrupted,voice-input. You can check the shape of the messages in ClientMessage schema.
These are the messages that will be sent to your Server URL. Default is conversation-update,end-of-call-report,function-call,hang,speech-update,status-update,tool-calls,transfer-destination-request,user-interrupted. You can check the shape of the messages in ServerMessage schema.
How many seconds of silence to wait before ending the call. Defaults to 30.
@default 30
This is the maximum number of seconds that the call will last. When the call reaches this duration, it will be ended.
@default 600 (10 minutes)
This is the background sound in the call. Default for phone calls is 'office' and default for web calls is 'off'.
This determines whether the model says 'mhmm', 'ahem', etc. while the user is speaking.
Default false while in beta.
@default false
This enables filtering of noise and background speech while the user is talking.
Default false while in beta.
@default false
This determines whether the model's output is used in the conversation history rather than the transcription of the assistant's speech.
Default false while in beta.
@default false
These are the configurations to be passed to the transport providers of assistant's calls, like Twilio. You can store multiple configurations for different transport providers. For a call, only the configuration matching the call transport provider is used.
These are values that will be used to replace the template variables in the assistant messages and other text-based fields.
This is the name of the assistant.
This is required when you want to transfer between assistants in a call.
This is the first message that the assistant will say. This can also be a URL to a containerized audio file (mp3, wav, etc.).
If unspecified, assistant will wait for user to speak and use the model to respond once they speak.
These are the settings to configure or disable voicemail detection. Alternatively, voicemail detection can be configured using the model.tools=[VoicemailTool]. This uses Twilio's built-in detection while the VoicemailTool relies on the model to detect if a voicemail was reached. You can use neither of them, one of them, or both of them. By default, Twilio built-in detection is enabled while VoicemailTool is not.
This is the provider to use for voicemail detection.
These are the AMD messages from Twilio that are considered as voicemail. Default is ['machine_end_beep', 'machine_end_silence'].
@default {Array} ['machine_end_beep', 'machine_end_silence']
This sets whether the assistant should detect voicemail. Defaults to true.
@default true
The number of seconds that Twilio should attempt to perform answering machine detection before timing out and returning AnsweredBy as unknown. Default is 30 seconds.
Increasing this value will provide the engine more time to make a determination. This can be useful when DetectMessageEnd is provided in the MachineDetection parameter and there is an expectation of long answering machine greetings that can exceed 30 seconds.
Decreasing this value will reduce the amount of time the engine has to make a determination. This can be particularly useful when the Enable option is provided in the MachineDetection parameter and you want to limit the time for initial detection.
Check the Twilio docs for more info.
@default 30
The number of milliseconds that is used as the measuring stick for the length of the speech activity. Durations lower than this value will be interpreted as a human, longer as a machine. Default is 2400 milliseconds.
Increasing this value will reduce the chance of a False Machine (detected machine, actually human) for a long human greeting (e.g., a business greeting) but increase the time it takes to detect a machine.
Decreasing this value will reduce the chances of a False Human (detected human, actually machine) for short voicemail greetings. The value of this parameter may need to be reduced by more than 1000ms to detect very short voicemail greetings. A reduction of that significance can result in increased False Machine detections. Adjusting the MachineDetectionSpeechEndThreshold is likely the better approach for short voicemails. Decreasing MachineDetectionSpeechThreshold will also reduce the time it takes to detect a machine.
Check the Twilio docs for more info.
@default 2400
The number of milliseconds of silence after speech activity at which point the speech activity is considered complete. Default is 1200 milliseconds.
Increasing this value will typically be used to better address the short voicemail greeting scenarios. For short voicemails, there is typically 1000-2000ms of audio followed by 1200-2400ms of silence and then additional audio before the beep. Increasing the MachineDetectionSpeechEndThreshold to ~2500ms will treat the 1200-2400ms of silence as a gap in the greeting but not the end of the greeting and will result in a machine detection. The downsides of such a change include:
- Increasing the delay for human detection by the amount you increase this parameter, e.g., a change of 1200ms to 2500ms increases human detection delay by 1300ms.
- Cases where a human has two utterances separated by a period of silence (e.g. a "Hello", then 2000ms of silence, and another "Hello") may be interpreted as a machine.
Decreasing this value will result in faster human detection. The consequence is that it can lead to increased False Human (detected human, actually machine) detections because a silence gap in a voicemail greeting (not necessarily just in short voicemail scenarios) can be incorrectly interpreted as the end of speech.
Check the Twilio docs for more info.
@default 1200
The number of milliseconds of initial silence after which an unknown AnsweredBy result will be returned. Default is 5000 milliseconds.
Increasing this value will result in waiting for a longer period of initial silence before returning an 'unknown' AMD result.
Decreasing this value will result in waiting for a shorter period of initial silence before returning an 'unknown' AMD result.
Check the Twilio docs for more info.
@default 5000
This is the message that the assistant will say if the call is forwarded to voicemail.
If unspecified, it will hang up.
This is the message that the assistant will say if it ends the call.
If unspecified, it will hang up without saying anything.
This list contains phrases that, if spoken by the assistant, will trigger the call to be hung up. Case insensitive.
This is for metadata you want to store on the assistant.
This is the URL Vapi will communicate with via HTTP GET and POST Requests. This is used for retrieving context, function calling, and end-of-call reports.
All requests will be sent with the call object among other things relevant to that message. You can find more details in the Server URL documentation.
This overrides the serverUrl set on the org and the phoneNumber. Order of precedence: tool.server.url > assistant.serverUrl > phoneNumber.serverUrl > org.serverUrl
This is the secret you can set that Vapi will send with every request to your server. Will be sent as a header called x-vapi-secret.
Same precedence logic as serverUrl.
This is the plan for analysis of assistant's calls. Stored in call.analysis.
This is the prompt that's used to summarize the call. The output is stored in call.analysis.summary.
Default is "You are an expert note-taker. You will be given a transcript of a call. Summarize the call in 2-3 sentences. DO NOT return anything except the summary.".
Set to '' or 'off' to disable.
This is how long the request is tried before giving up. When the request times out, call.analysis.summary will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is how long the request is tried before giving up. When the request times out, call.analysis.structuredData will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is the prompt that's used to evaluate if the call was successful. The output is stored in call.analysis.successEvaluation.
Default is "You are an expert call evaluator. You will be given a transcript of a call and the system prompt of the AI participant. Determine if the call was successful based on the objectives inferred from the system prompt. DO NOT return anything except the result.".
Set to '' or 'off' to disable.
You can use this standalone or in combination with successEvaluationRubric. If both are provided, they are concatenated into appropriate instructions.
This enforces the rubric of the evaluation. The output is stored in call.analysis.successEvaluation.
Options include:
- 'NumericScale': A scale of 1 to 10.
- 'DescriptiveScale': A scale of Excellent, Good, Fair, Poor.
- 'Checklist': A checklist of criteria and their status.
- 'Matrix': A grid that evaluates multiple criteria across different performance levels.
- 'PercentageScale': A scale of 0% to 100%.
- 'LikertScale': A scale of Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
- 'AutomaticRubric': Automatically break down evaluation into several criteria, each with its own score.
- 'PassFail': A simple 'true' if call passed, 'false' if not.
For 'Checklist' and 'Matrix', provide the criteria in successEvaluationPrompt.
Default is 'PassFail' if successEvaluationPrompt is not provided, and null if successEvaluationPrompt is provided.
You can use this standalone or in combination with successEvaluationPrompt. If both are provided, they are concatenated into appropriate instructions.
This is how long the request is tried before giving up. When the request times out, call.analysis.successEvaluation will be empty. Increasing this timeout will delay the end-of-call report.
Default is 5 seconds.
This is the prompt that's used to extract structured data from the call. The output is stored in call.analysis.structuredData.
Disabled by default.
You can use this standalone or in combination with structuredDataSchema. If both are provided, they are concatenated into appropriate instructions.
This enforces the schema of the structured data. This output is stored in call.analysis.structuredData.
Complete guide on JSON Schema can be found here.
Disabled by default.
You can use this standalone or in combination with structuredDataPrompt. If both are provided, they are concatenated into appropriate instructions.
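A minimal analysis plan combining the prompts, rubric, schema, and timeouts described above might look like the sketch below. The successEvaluationPrompt, successEvaluationRubric, structuredDataPrompt, and structuredDataSchema names come from the descriptions above; summaryPrompt and the *RequestTimeoutSeconds names are assumptions.

```typescript
// Hedged sketch of an analysisPlan; summaryPrompt and timeout field names are assumptions.
const analysisPlan = {
  summaryPrompt: "Summarize the call in 2-3 sentences.",
  summaryRequestTimeoutSeconds: 5,
  successEvaluationPrompt: "Did the caller agree to book a demo?",
  successEvaluationRubric: "PassFail",           // one of the rubric options listed above
  successEvaluationRequestTimeoutSeconds: 5,
  structuredDataPrompt: "Extract the caller's name and preferred callback time.",
  structuredDataSchema: {
    type: "object",
    properties: {
      callerName: { type: "string" },
      callbackTime: { type: "string" },
    },
  },
  structuredDataRequestTimeoutSeconds: 5,
};
```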
This is the plan for artifacts generated during the assistant's calls. Stored in call.artifact.
Note: recordingEnabled
is currently at the root level. It will be moved to artifactPlan
in the future, but will remain backwards compatible.
This determines whether the video is recorded during the call. Default is false. Only relevant for webCall
type.
This is the S3 path prefix for the audio recording. This is only used if you have provided S3 credentials. Check the Providers page in the Dashboard.
If credential.s3PathPrefix is set, this will append to it.
Example: /my-prefix. Default is /.
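For example, an artifact plan enabling video recording (only relevant for webCall) and customizing the S3 prefix could look like the sketch below; both field names are assumptions based on the descriptions above.

```typescript
// Hedged sketch of an artifactPlan; field names are assumptions.
const artifactPlan = {
  videoRecordingEnabled: true, // only relevant for webCall type
  recordingPath: "/my-prefix", // appended to credential.s3PathPrefix if that is set
};
```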
This is the plan for static messages that can be spoken by the assistant during the call, like idleMessages.
Note: firstMessage
, voicemailMessage
, and endCallMessage
are currently at the root level. They will be moved to messagePlan
in the future, but will remain backwards compatible.
These are the messages that the assistant will speak when the user hasn't responded for idleTimeoutSeconds. Each time the timeout is triggered, a random message will be chosen from this array.
@default null (no idle message is spoken)
This determines the maximum number of times idleMessages
can be spoken during the call.
@default 3
This is the timeout in seconds before a message from idleMessages
is spoken. The clock starts when the assistant finishes speaking and remains active until the user speaks.
@default 10
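Taken together, a message plan using the three settings above might look like this sketch; idleMessages and idleTimeoutSeconds are named above, while idleMessageMaxSpokenCount is an assumed name for the maximum-repetition setting.

```typescript
// Hedged sketch of a messagePlan; idleMessageMaxSpokenCount is an assumed field name.
const messagePlan = {
  idleMessages: ["Are you still there?", "Can you hear me okay?"],
  idleMessageMaxSpokenCount: 3, // stop nudging after three idle messages
  idleTimeoutSeconds: 10,       // clock starts when the assistant finishes speaking
};
```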
This is the plan for when the assistant should start talking.
You should configure this if you're running into these issues:
- The assistant is too slow to start talking after the customer is done speaking.
- The assistant is too fast to start talking after the customer is done speaking.
- The assistant is so fast that it's actually interrupting the customer.
This is how long the assistant waits before speaking. Defaults to 0.4.
This is the minimum it will wait, but if there is latency in the pipeline, this minimum will be exceeded. This is really a stopgap in case the pipeline is moving too fast.
Example:
- If model generates tokens and voice generates bytes within 100ms, the pipeline still waits 300ms before outputting speech.
Usage:
- If the customer is taking long pauses, set this to a higher value.
- If the assistant is accidentally jumping in too much, set this to a higher value.
@default 0.4
This determines if the customer's speech is considered done (endpointing) using the VAP model on the customer's speech. This is good for middle-of-thought detection.
Once an endpoint is triggered, the request is sent to assistant.model.
Default false
since experimental.
@default false
This determines how the customer's speech is considered done (endpointing) using the transcription of the customer's speech.
Once an endpoint is triggered, the request is sent to assistant.model.
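As a rough sketch, a start speaking plan addressing the issues listed above could look like the following; the waitSeconds, smartEndpointingEnabled, and transcriptionEndpointingPlan names are assumptions inferred from the descriptions.

```typescript
// Hedged sketch of a startSpeakingPlan; field names are assumptions.
const startSpeakingPlan = {
  waitSeconds: 0.6,               // raise if customers take long pauses, lower if too slow
  smartEndpointingEnabled: false, // experimental model-based endpointing, off by default
  transcriptionEndpointingPlan: {}, // transcription-based endpointing tuning (sub-fields omitted)
};
```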
This is the plan for when assistant should stop talking on customer interruption.
You should configure this if you're running into these issues:
- The assistant is too slow to recognize customer's interruption.
- The assistant is too fast to recognize customer's interruption.
- The assistant is getting interrupted by phrases that are just acknowledgments.
- The assistant is getting interrupted by background noises.
- The assistant is not properly stopping -- it starts talking right after getting interrupted.
This is the number of words that the customer has to say before the assistant will stop talking.
Words like "stop", "actually", "no", etc. will always interrupt immediately regardless of this value.
Words like "okay", "yeah", "right" will never interrupt.
When set to 0, voiceSeconds is used in addition to the transcriptions to determine if the customer has started speaking.
Defaults to 0.
@default 0
This is the number of seconds the customer has to speak before the assistant stops talking. This uses the VAD (Voice Activity Detection) spike to determine if the customer has started speaking.
Considerations:
- A lower value might be more responsive but could potentially pick up non-speech sounds.
- A higher value reduces false positives but might slightly delay the detection of speech onset.
This is only used if numWords
is set to 0.
Defaults to 0.2
@default 0.2
This is the number of seconds to wait before the assistant will start talking again after being interrupted.
Defaults to 1.
@default 1
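Combining the settings above, a stop speaking plan might be sketched as follows; numWords and voiceSeconds are named above, while backoffSeconds is an assumed name for the post-interruption wait.

```typescript
// Hedged sketch of a stopSpeakingPlan; backoffSeconds is an assumed field name.
const stopSpeakingPlan = {
  numWords: 0,       // 0 => rely on voiceSeconds (VAD) plus transcriptions
  voiceSeconds: 0.2, // lower is more responsive but may pick up non-speech sounds
  backoffSeconds: 1, // how long to wait before speaking again after an interruption
};
```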
These are the credentials that will be used for the assistant calls. By default, all the credentials are available for use in the call but you can provide a subset using this.
This is the phone number that will be used for the call. To use a transient number, use phoneNumber
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the phone number that will be used for the call. To use an existing number, use phoneNumberId
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the fallback destination an inbound call will be transferred to if:
- assistantId is not set
- squadId is not set
- and, the assistant-request message to the serverUrl fails
If this is not set and the above conditions are met, the inbound call is hung up with an error message.
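As a sketch, an inbound phone number falling back to a human line when the conditions above are met might be configured like this; the fallbackDestination shape shown is an assumption.

```typescript
// Hedged sketch: fall back to a support line when no assistant or squad is set
// and the assistant-request to the serverUrl fails.
const phoneNumber = {
  fallbackDestination: {
    type: "number",
    number: "+14155551234",
    message: "Connecting you to our support line.",
  },
};
```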
This is the flag to toggle the E164 check for the number field. This is an advanced property which should be used if you know your use case requires it.
Use cases:
- false: To allow non-E164 numbers like +001234567890, 1234, or abc. This is useful for dialing out to non-E164 numbers on your SIP trunks.
- true (default): To allow only E164 numbers like +14155551234. This is for most standard PSTN calls.
If false, the number is still required to only contain alphanumeric characters (regex: /^\+?[a-zA-Z0-9]+$/).
@default true (E164 check is enabled)
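For instance, dialing a non-E164 trunk extension by disabling the check could be sketched as below; numberE164CheckEnabled is an assumed name for the flag described above.

```typescript
// Hedged sketch: allow a non-E164 destination such as a SIP trunk extension.
const destination = {
  number: "1234",                // would fail the default E164 check
  numberE164CheckEnabled: false, // assumed flag name for the toggle described above
};
```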
This is the phone number to transfer the call to.
This is the extension to dial after transferring the call to the number.
This is the message to say before transferring the call to the destination.
If this is not provided and transfer tool messages are not provided, the default is "Transferring the call now".
If set to "", nothing is spoken. This is useful when you want to silently transfer. This is especially useful when transferring between assistants in a squad. In this scenario, you likely also want to set assistant.firstMessageMode=assistant-speaks-first-with-model-generated-message
for the destination assistant.
This is the description of the destination, used by the AI to choose when and how to transfer the call.
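A phone-number transfer destination using the fields above might be sketched like this; the values are placeholders.

```typescript
// Hedged sketch of a phone-number transfer destination.
const transferDestination = {
  type: "number",
  number: "+14155551234",
  extension: "101", // dialed after the call is transferred to the number
  message: "",      // empty string => silent transfer (useful between squad assistants)
  description: "Billing team, for questions about invoices or refunds.",
};
```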
These are the digits of the phone number you own on your Twilio account.
This is your Twilio Account SID that will be used to handle this phone number.
This is the Twilio Auth Token that will be used to handle this phone number.
This is the name of the phone number. This is just for your own reference.
This is the assistant that will be used for incoming calls to this phone number.
If neither assistantId
nor squadId
is set, assistant-request
will be sent to your Server URL. Check ServerMessage
and ServerMessageResponse
for the shape of the message and response that is expected.
This is the squad that will be used for incoming calls to this phone number.
If neither assistantId
nor squadId
is set, assistant-request
will be sent to your Server URL. Check ServerMessage
and ServerMessageResponse
for the shape of the message and response that is expected.
This is the server URL where messages will be sent for calls on this number. This includes the assistant-request
message.
You can see the shape of the messages sent in ServerMessage.
This overrides the org.serverUrl. Order of precedence: tool.server.url > assistant.serverUrl > phoneNumber.serverUrl > org.serverUrl.
This is the secret Vapi will send with every message to your server. It's sent as a header called x-vapi-secret.
Same precedence logic as serverUrl.
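Bringing the Twilio phone number fields above together, a configuration might look roughly like the sketch below; the camelCase field names (twilioAccountSid, twilioAuthToken, serverUrlSecret) are assumptions and the values are placeholders.

```typescript
// Hedged sketch of a Twilio phone number configuration using the fields above.
const twilioPhoneNumber = {
  provider: "twilio",
  number: "+14155550100",
  twilioAccountSid: "AC...",              // assumed field name for the Account SID
  twilioAuthToken: "<auth token>",        // assumed field name for the Auth Token
  name: "Support line",                   // for your own reference
  assistantId: "<assistant id>",          // or squadId; otherwise assistant-request is sent
  serverUrl: "https://example.com/vapi/webhook",
  serverUrlSecret: "my-shared-secret",    // assumed field name; sent as x-vapi-secret
};
```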
This is the customer that will be called. To call a transient customer, use customer
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the customer that will be called. To call an existing customer, use customerId
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the flag to toggle the E164 check for the number field. This is an advanced property which should be used if you know your use case requires it.
Use cases:
- false: To allow non-E164 numbers like +001234567890, 1234, or abc. This is useful for dialing out to non-E164 numbers on your SIP trunks.
- true (default): To allow only E164 numbers like +14155551234. This is for most standard PSTN calls.
If false, the number is still required to only contain alphanumeric characters (regex: /^\+?[a-zA-Z0-9]+$/).
@default true (E164 check is enabled)
This is the extension that will be dialed after the call is answered.
This is the number of the customer.
This is the SIP URI of the customer.
This is the name of the customer. This is just for your own reference.
For SIP inbound calls, this is extracted from the From SIP header with format "Display Name" <sip:username@domain>.
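A customer object using the fields above might be sketched as follows; numberE164CheckEnabled is an assumed flag name and the values are placeholders.

```typescript
// Hedged sketch of a customer object using the fields above.
const customer = {
  number: "+14155550199",
  extension: "22",              // dialed after the call is answered
  name: "Jane Doe",             // for your own reference
  // sipUri: "sip:jane@example.com", // alternative to number for SIP calls
  numberE164CheckEnabled: true, // assumed flag name; see the E164 notes above
};
```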
Authorizations
Retrieve your API Key from Dashboard.
Body
This is the name of the call. This is just for your own reference.
This is the assistant that will be used for the call. To use a transient assistant, use assistant
instead.
This is the assistant that will be used for the call. To use an existing assistant, use assistantId
instead.
These are the overrides for the assistant
or assistantId
's settings and template variables.
This is the squad that will be used for the call. To use a transient squad, use squad
instead.
This is a squad that will be used for the call. To use an existing squad, use squadId
instead.
This is the phone number that will be used for the call. To use a transient number, use phoneNumber
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the phone number that will be used for the call. To use an existing number, use phoneNumberId
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the customer that will be called. To call a transient customer, use customer
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the customer that will be called. To call an existing customer, use customerId
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
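Tying the body fields above together, a minimal outbound phone call request might look like the sketch below, assuming the standard api.vapi.ai base URL; the IDs and values are placeholders.

```typescript
// Hedged sketch: create an outbound phone call with an existing assistant and phone number.
const response = await fetch("https://api.vapi.ai/call", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Follow-up with Jane",          // for your own reference
    assistantId: "<assistant id>",        // or pass a transient assistant instead
    phoneNumberId: "<phone number id>",   // or a transient phoneNumber
    customer: { number: "+14155550199" }, // or an existing customerId
  }),
});
const call = await response.json();
```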
Response
This is the type of call.
inboundPhoneCall
, outboundPhoneCall
, webCall
These are the messages that were spoken during the call.
This is the provider of the call.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
twilio
, vonage
, vapi
This is the transport of the phone call.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
sip
, pstn
This is the status of the call.
queued
, ringing
, in-progress
, forwarding
, ended
This is the explanation for how the call ended.
assistant-error
, assistant-not-found
, db-error
, no-server-available
, license-check-failed
, pipeline-error-openai-llm-failed
, pipeline-error-azure-openai-llm-failed
, pipeline-error-groq-llm-failed
, pipeline-error-openai-voice-failed
, pipeline-error-cartesia-voice-failed
, pipeline-error-deepgram-transcriber-failed
, pipeline-error-deepgram-voice-failed
, pipeline-error-gladia-transcriber-failed
, pipeline-error-eleven-labs-voice-failed
, pipeline-error-playht-voice-failed
, pipeline-error-lmnt-voice-failed
, pipeline-error-azure-voice-failed
, pipeline-error-rime-ai-voice-failed
, pipeline-error-neets-voice-failed
, pipeline-no-available-model
, worker-shutdown
, twilio-failed-to-connect-call
, unknown-error
, vonage-disconnected
, vonage-failed-to-connect-call
, phone-call-provider-bypass-enabled-but-no-call-received
, vapi-error-phone-call-worker-setup-socket-error
, vapi-error-phone-call-worker-worker-setup-socket-timeout
, vapi-error-phone-call-worker-could-not-find-call
, vapi-error-phone-call-worker-call-never-connected
, vapi-error-web-call-worker-setup-failed
, assistant-not-invalid
, assistant-not-provided
, call-start-error-neither-assistant-nor-server-set
, assistant-request-failed
, assistant-request-returned-error
, assistant-request-returned-unspeakable-error
, assistant-request-returned-invalid-assistant
, assistant-request-returned-no-assistant
, assistant-request-returned-forwarding-phone-number
, assistant-ended-call
, assistant-said-end-call-phrase
, assistant-forwarded-call
, assistant-join-timed-out
, customer-busy
, customer-ended-call
, customer-did-not-answer
, customer-did-not-give-microphone-permission
, assistant-said-message-with-end-call-enabled
, exceeded-max-duration
, manually-canceled
, phone-call-provider-closed-websocket
, pipeline-error-anthropic-llm-failed
, pipeline-error-together-ai-llm-failed
, pipeline-error-anyscale-llm-failed
, pipeline-error-openrouter-llm-failed
, pipeline-error-perplexity-ai-llm-failed
, pipeline-error-deepinfra-llm-failed
, pipeline-error-runpod-llm-failed
, pipeline-error-custom-llm-llm-failed
, pipeline-error-eleven-labs-voice-not-found
, pipeline-error-eleven-labs-quota-exceeded
, pipeline-error-eleven-labs-unauthorized-access
, pipeline-error-eleven-labs-unauthorized-to-access-model
, pipeline-error-eleven-labs-professional-voices-only-for-creator-plus
, pipeline-error-eleven-labs-blocked-free-plan-and-requested-upgrade
, pipeline-error-eleven-labs-blocked-concurrent-requests-and-requested-upgrade
, pipeline-error-eleven-labs-blocked-using-instant-voice-clone-and-requested-upgrade
, pipeline-error-eleven-labs-system-busy-and-requested-upgrade
, pipeline-error-eleven-labs-voice-not-fine-tuned
, pipeline-error-eleven-labs-invalid-api-key
, pipeline-error-eleven-labs-invalid-voice-samples
, pipeline-error-eleven-labs-voice-disabled-by-owner
, pipeline-error-eleven-labs-blocked-account-in-probation
, pipeline-error-eleven-labs-blocked-content-against-their-policy
, pipeline-error-eleven-labs-missing-samples-for-voice-clone
, pipeline-error-playht-request-timed-out
, pipeline-error-playht-invalid-voice
, pipeline-error-playht-unexpected-error
, pipeline-error-playht-out-of-credits
, pipeline-error-playht-rate-limit-exceeded
, pipeline-error-playht-502-gateway-error
, pipeline-error-playht-504-gateway-error
, pipeline-error-gladia-transcriber-failed
, sip-gateway-failed-to-connect-call
, silence-timed-out
, voicemail
, vonage-rejected
This is the destination where the call ended up being transferred to. If the call was not transferred, this will be empty.
This is the unique identifier for the call.
This is the unique identifier for the org that this call belongs to.
This is the ISO 8601 date-time string of when the call was created.
This is the ISO 8601 date-time string of when the call was last updated.
This is the ISO 8601 date-time string of when the call was started.
This is the ISO 8601 date-time string of when the call was ended.
This is the cost of the call in USD.
This is the cost of the call in USD.
These are the costs of individual components of the call in USD.
This is the transcript of the call.
This is the URL of the recording of the call.
This is the URL of the recording of the call in two channels.
This stores artifacts of the call. Customize what artifacts are created by configuring assistant.artifactPlan.
This is a copy of the assistant's artifact plan. This isn't actually stored on the call but rather just returned in POST /call/web to enable client-side artifact creation.
This is the analysis of the call. Customize the analysis by configuring assistant.analysisPlan.
The ID of the call as provided by the phone number service. callSid in Twilio. conversationUuid in Vonage.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
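After the call has ended, the response fields above can be read back, for example by retrieving the call and inspecting its status, endedReason, cost, and analysis. A rough sketch, assuming the call can be fetched by its id:

```typescript
// Hedged sketch: retrieve a call and read the response fields described above.
const res = await fetch(`https://api.vapi.ai/call/<call id>`, {
  headers: { Authorization: `Bearer ${process.env.VAPI_API_KEY}` },
});
const ended = await res.json();
if (ended.status === "ended") {
  console.log(ended.endedReason);       // e.g. "customer-ended-call"
  console.log(ended.cost, "USD");       // total cost of the call
  console.log(ended.analysis?.summary); // populated per assistant.analysisPlan
  console.log(ended.recordingUrl);      // URL of the call recording
}
```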
This is the assistant that will be used for the call. To use a transient assistant, use assistant
instead.
This is the assistant that will be used for the call. To use an existing assistant, use assistantId
instead.
These are the overrides for the assistant
or assistantId
's settings and template variables.
This is the squad that will be used for the call. To use a transient squad, use squad
instead.
This is a squad that will be used for the call. To use an existing squad, use squadId
instead.
This is the phone number that will be used for the call. To use a transient number, use phoneNumber
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the phone number that will be used for the call. To use an existing number, use phoneNumberId
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the customer that will be called. To call a transient customer, use customer
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the customer that will be called. To call an existing customer, use customerId
instead.
Only relevant for outboundPhoneCall
and inboundPhoneCall
type.
This is the name of the call. This is just for your own reference.