Simulations quickstart

Pre-release

Test your AI assistants with realistic AI-powered callers

Overview

This quickstart guide will help you test your AI assistants and squads using realistic, AI-powered callers. In just a few minutes, you’ll create test scenarios, define evaluation criteria, and validate that your agents work correctly under different conditions.

What are Simulations?

Simulations is Vapi’s voice agent testing framework. It enables you to systematically test assistants and squads using AI-powered callers that follow defined instructions and evaluate outcomes using structured outputs. Instead of relying on manual testing or rigid scripts, a simulation recreates a real conversation and measures whether your assistant behaves correctly. Test your agents by:

  1. Creating personalities - Define a full assistant configuration for the AI tester (voice, model, system prompt)
  2. Defining scenarios - Specify instructions for the tester and evaluations using structured outputs
  3. Creating simulations - Pair scenarios with personalities
  4. Running simulations - Execute tests against your assistant or squad in voice or chat mode
  5. Reviewing results - Analyze pass/fail outcomes based on structured output evaluations

When are Simulations useful?

Simulations help you maintain quality and catch issues early:

  • Pre-deployment testing - Validate new assistant configurations before going live
  • Regression testing - Ensure prompt or tool changes don’t break existing behaviors
  • Conversation flow validation - Test multi-turn interactions and complex scenarios
  • Personality-based testing - Verify your agent handles different caller types appropriately
  • Squad handoff testing - Ensure smooth transitions between squad members
  • Performance monitoring - Track success rates over time and identify regressions

Voice vs Chat mode

Simulations support two transport modes:

Voice mode
  • Full voice simulation with audio
  • Realistic end-to-end testing
  • Tests speech recognition and synthesis
  • Produces call recordings
Chat mode
  • Text-based chat simulation
  • Faster execution
  • Lower cost (no audio processing)
  • Ideal for rapid iteration

Use chat mode during development for quick iteration, then switch to voice mode for final validation before deployment.

What you’ll build

A simulation suite for an appointment booking assistant that tests:

  • Different caller personalities (confused user, impatient customer)
  • Evaluation criteria using structured outputs with comparators
  • Real-time monitoring of test runs
  • Both voice and chat mode execution

Prerequisites

  • A Vapi account
  • Your API key, available under API Keys in the dashboard sidebar

You’ll also need an existing assistant or squad to test. You can create one in the Dashboard or use the API.
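If you don’t have an assistant yet, a single API call creates a minimal one to test against. The sketch below (TypeScript, Node 18+) uses Vapi’s assistant-creation endpoint; the name and system prompt are placeholder values.

// Minimal sketch: create an assistant to test against.
// POST https://api.vapi.ai/assistant is Vapi's assistant-creation endpoint;
// the name and prompt below are placeholders.
const res = await fetch("https://api.vapi.ai/assistant", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Appointment Booking Assistant",
    model: {
      provider: "openai",
      model: "gpt-4o",
      messages: [
        {
          role: "system",
          content: "You help callers book appointments and always provide a confirmation number.",
        },
      ],
    },
  }),
});
const assistant = await res.json();
console.log("Assistant ID:", assistant.id); // use this ID as the simulation target later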

Step 1: Create a personality

Personalities define how the AI tester behaves during a simulation. A personality is a full assistant configuration that controls the tester’s voice, model, and behavior via system prompt.


Create a personality

  1. Click Create Personality
  2. Name: Enter “Impatient Customer”
  3. Assistant Configuration: Configure the tester assistant:
    • Model: Select your preferred LLM (e.g., GPT-4o)
    • System Prompt: Define the personality behavior:
      You are an impatient customer who wants quick answers.
      You speak directly and may interrupt if responses are too long.
      You expect immediate solutions to your problems.
    • Voice: Select a voice for the tester (optional for chat mode)
  4. Click Save

Start with the built-in default personalities to get familiar with the system before creating custom ones.

Personality types: Consider creating personalities for different customer types you encounter: decisive buyers, confused users, detail-oriented customers, or frustrated callers.
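If you prefer to script this step, the same personality could be expressed as an API payload. Simulations is pre-release, so the POST /personality route below is an assumed endpoint name; the fields mirror the dashboard form above.

// Hedged sketch: create a tester personality via the API.
// NOTE: /personality is an assumed route name for this pre-release feature;
// the payload mirrors the dashboard fields (name, model, system prompt, voice).
const personality = await fetch("https://api.vapi.ai/personality", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Impatient Customer",
    assistant: {
      model: {
        provider: "openai",
        model: "gpt-4o",
        messages: [
          {
            role: "system",
            content:
              "You are an impatient customer who wants quick answers. " +
              "You speak directly and may interrupt if responses are too long. " +
              "You expect immediate solutions to your problems.",
          },
        ],
      },
      // voice omitted here: it is optional for chat mode
    },
  }),
}).then((r) => r.json());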

Step 2: Create a scenario

Scenarios define what the test is evaluating. A scenario contains:

  • Instructions: What the tester should do during the call
  • Evaluations: Structured outputs with expected values to validate outcomes

Configure the scenario

  1. Name: Enter “Book Appointment”
  2. Instructions: Define what the tester should do:
    You are calling to book an appointment for next Monday at 2pm.
    Confirm your identity when asked and provide any required information.
    End the call once you receive a confirmation number.

Add evaluations

Evaluations use structured outputs to extract data from the conversation and compare against expected values.

  1. Click Add Evaluation
  2. Create or select a structured output:
    • Name: “appointment_booked”
    • Schema Type: boolean
  3. Set the Comparator: =
  4. Set the Expected Value: true
  5. Mark as Required: Yes
  6. Add another evaluation for confirmation number:
    • Name: “confirmation_provided”
    • Schema Type: boolean
    • Comparator: =
    • Expected Value: true
  7. Click Save Scenario

Evaluation structure

Each evaluation consists of:

  • structuredOutputId - Reference to an existing structured output (mutually exclusive with structuredOutput)
  • structuredOutput - Inline structured output definition (mutually exclusive with structuredOutputId)
  • comparator - Comparison operator: =, !=, >, <, >=, <=
  • value - Expected value (string, number, or boolean)
  • required - Whether this evaluation must pass for the simulation to pass (default: true)

Schema type restrictions: Evaluations only support primitive schema types: string, number, integer, boolean. Objects and arrays are not supported.

Comparator options

  • = - Equals (string, number, integer, boolean)
  • != - Not equals (string, number, integer, boolean)
  • > - Greater than (number, integer)
  • < - Less than (number, integer)
  • >= - Greater than or equal (number, integer)
  • <= - Less than or equal (number, integer)

Evaluation tips: Use boolean structured outputs for pass/fail checks like “appointment_booked” or “issue_resolved”. Use numeric outputs with comparators for metrics like “satisfaction_score >= 4”.
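To make these shapes concrete, here is a hedged sketch of a scenario’s evaluations as data. The field names (structuredOutputId, structuredOutput, comparator, value, required) come from the tables above; the inline-definition shape and the IDs are illustrative assumptions.

// Hedged sketch of scenario evaluations using the documented fields.
const evaluations = [
  {
    // Reference an existing structured output by ID (hypothetical ID).
    structuredOutputId: "SO_APPOINTMENT_BOOKED_ID",
    comparator: "=",
    value: true, // boolean pass/fail check
    required: true, // must pass for the simulation to pass
  },
  {
    // Inline definition (mutually exclusive with structuredOutputId);
    // the exact inline shape here is an assumption.
    structuredOutput: { name: "satisfaction_score", schema: { type: "number" } },
    comparator: ">=",
    value: 4, // numeric threshold, e.g. "satisfaction_score >= 4"
    required: false, // informational; does not fail the run by itself
  },
];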

Step 3: Create a simulation

Simulations pair a scenario with a personality. The target assistant or squad is specified when you run the simulation.


Configure the simulation

  1. Name: Enter “Appointment Booking - Impatient Customer” (optional)
  2. Scenario: Select “Book Appointment” from the dropdown
  3. Personality: Select “Impatient Customer” from the dropdown
  4. Click Save Simulation

Multiple simulations: Create several simulations with different personality and scenario combinations to thoroughly test your assistant across various conditions.
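In API terms, a simulation is just this pairing. A hedged sketch, assuming a POST /simulation route and placeholder IDs:

// Hedged sketch: pair a scenario with a personality (route name assumed).
const simulation = await fetch("https://api.vapi.ai/simulation", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Appointment Booking - Impatient Customer", // optional
    scenarioId: "SCENARIO_ID", // from Step 2
    personalityId: "PERSONALITY_ID", // from Step 1
  }),
}).then((r) => r.json());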

Step 4: Create a simulation suite (optional)

Simulation suites group multiple simulations into a single batch that runs together.


Configure the suite

  1. Name: Enter “Appointment Booking Regression Suite”
  2. Click Add Simulations
  3. Select the simulations you want to include:
    • “Appointment Booking - Impatient Customer”
    • “Appointment Booking - Confused User”
    • “Appointment Booking - Decisive Customer”
  4. Click Save Suite

Suite organization: Group related simulations together. For example, create separate suites for “Booking Tests”, “Cancellation Tests”, and “Rescheduling Tests”.
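Scripted, a suite is a named list of simulation IDs. Again a hedged sketch, with an assumed route name and placeholder IDs:

// Hedged sketch: group simulations into a suite (route name assumed).
const suite = await fetch("https://api.vapi.ai/simulation-suite", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Appointment Booking Regression Suite",
    simulationIds: ["SIM_ID_1", "SIM_ID_2", "SIM_ID_3"], // placeholders
  }),
}).then((r) => r.json());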

Step 5: Run a simulation

Execute simulations against your assistant or squad. You can run individual simulations or entire suites.


Start a run

  1. Navigate to your simulation or suite
  2. Click Run
  3. Select the Target:
    • Choose Assistant or Squad
    • Select from the dropdown
  4. Configure Transport (optional):
    • Voice: vapi.websocket (default)
    • Chat: vapi.webchat (faster, no audio)
  5. Set Iterations (optional): Number of times to run each simulation
  6. Click Start Run
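Runs can also be started programmatically. The target and transport fields below match the options documented on this page (target.type and target.squadId also appear in the FAQ); the route itself is an assumption for this pre-release feature.

// Hedged sketch: start a run in chat mode (route name assumed).
const run = await fetch("https://api.vapi.ai/simulation/SIMULATION_ID/run", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    target: { type: "assistant", assistantId: "ASSISTANT_ID" }, // or { type: "squad", squadId: "..." }
    transport: { provider: "vapi.webchat" }, // chat mode; use "vapi.websocket" for voice
    iterations: 3, // optional: run the simulation three times
  }),
}).then((r) => r.json());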

Monitor progress

  1. Click the Runs tab to see live status updates
  2. Watch as each simulation progresses:
    • Queued - Waiting to start
    • Running - Test in progress
    • Ended - Test finished
  3. For voice mode, click Listen on any running test to hear the call live

Step 6: Review results

Analyze the results of your simulation runs to understand how your assistant performed.

Successful run

When all evaluations pass, you’ll see:

{
  "id": "550e8400-e29b-41d4-a716-446655440007",
  "status": "ended",
  "itemCounts": {
    "total": 3,
    "passed": 3,
    "failed": 0,
    "running": 0,
    "queued": 0,
    "canceled": 0
  },
  "startedAt": "2024-01-15T09:50:05Z",
  "endedAt": "2024-01-15T09:52:30Z"
}

Pass criteria:

  • status is “ended”
  • itemCounts.passed equals itemCounts.total
  • All required evaluations show passed: true
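These criteria translate directly into a check you can drop into CI. The status and itemCounts fields come from the example run payload shown above.

// Pass/fail check over the run fields shown in the example payload above.
type RunResult = {
  status: string;
  itemCounts: { total: number; passed: number; failed: number };
};

function runPassed(run: RunResult): boolean {
  // A run passes when it has ended and every item passed.
  return run.status === "ended" && run.itemCounts.passed === run.itemCounts.total;
}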

Failed run

When an evaluation fails, you’ll see details about what went wrong:

{
  "id": "550e8400-e29b-41d4-a716-446655440008",
  "status": "ended",
  "itemCounts": {
    "total": 3,
    "passed": 2,
    "failed": 1,
    "running": 0,
    "queued": 0,
    "canceled": 0
  }
}

Failure indicators:

  • itemCounts.failed > 0
  • Individual run items show which evaluations failed and why

View run results

  1. Navigate to the Runs tab
  2. Click on a completed run to see details
  3. View the summary showing pass/fail counts

Investigate failures

  1. Click on any failed simulation
  2. Review the Conversation to see the full transcript
  3. Check which evaluations failed and their actual vs expected values
  4. For voice mode, click Listen to Recording to hear the full call

Track performance over time

  1. Go to the main Simulations page
  2. View historical runs and their pass rates
  3. Monitor trends to identify regressions

Full conversation transcripts are available for all simulation runs, making it easy to understand exactly what happened during each test.


Tips for success

Best practices for effective simulation testing:

  • Start with chat mode - Use vapi.webchat for rapid iteration, then validate with voice
  • Use realistic personalities - Model your test callers after actual customer types
  • Define clear evaluations - Use specific, measurable structured outputs
  • Group related tests - Organize suites by feature or user flow
  • Monitor trends - Track pass rates over time to catch regressions early
  • Test after changes - Run your simulation suites after updating prompts or tools
  • Listen to recordings - Audio recordings reveal issues that metrics alone miss
  • Iterate on failures - Use failed tests to improve both your assistant and test design

Frequently asked questions

How many simulations can run at the same time?

Simulation concurrency follows your organization’s call concurrency limits. Each voice simulation uses 2 concurrent call slots (one for the AI tester, one for your assistant being tested), so a concurrency limit of 10 allows at most 5 voice simulations at once. Chat mode simulations are more efficient since they don’t require audio processing. If you need higher concurrency limits, contact support.

How do Simulations differ from Evals?

Simulations use AI-powered testers that have actual conversations with your assistant, producing real call recordings and transcripts. Evals use mock conversations with predefined messages and judge the responses. Use Simulations for realistic end-to-end testing; use Evals for faster, more controlled validation.

Can I reuse existing structured outputs in evaluations?

Yes! You can either define inline structured outputs in your scenario evaluations, or reference existing structured outputs by ID using the structuredOutputId field.

How do I test a squad?

Run a simulation against a squad instead of an assistant by setting the target.type: "squad" and target.squadId fields when creating a run.
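For example, the run request’s target block would look like this (field names from the answer above; the ID is a placeholder):

// Target a squad instead of an assistant when starting a run.
const target = { type: "squad", squadId: "SQUAD_ID" }; // SQUAD_ID is a placeholder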

Get help

Need assistance? We’re here to help. Reach out to Vapi support with any questions.