Overview
This guide covers advanced simulation strategies, testing patterns, and best practices for building robust test suites that ensure your AI voice agents work reliably in production.
You’ll learn:
Advanced scenario configuration (tool mocks, hooks)
Strategic testing approaches (smoke, regression, edge cases)
Performance optimization techniques
CI/CD integration strategies
Maintenance and troubleshooting methods
Advanced scenario configuration
Tool mocks
Mock tool call responses at the scenario level to test specific paths without calling real APIs. This is useful for:
Testing error handling paths
Simulating unavailable services
Deterministic test results
Faster test execution (no real API calls)
1. Navigate to the scenario
Go to Simulations → Scenarios
Open the scenario you want to configure
Common tool mock patterns:
Success response
Error response
Timeout/unavailable
Partial success
Tool mock tips:
Mock tool names must exactly match the function name configured in your assistant’s tools
Use realistic error responses that match your actual API error formats
Create separate scenarios for success paths and error paths
Disable mocks (enabled: false) to test against real APIs
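The name-matching tip above can be checked before a run. The following is a minimal sketch, not the platform's actual schema: the dictionary keys `name` and `response`, the tool names, and the helper `find_unmatched_mocks` are hypothetical, and only the `enabled` flag is mentioned in this guide.

```python
from typing import Iterable

def find_unmatched_mocks(tool_names: Iterable[str], mocks: list[dict]) -> list[str]:
    """Return the names of enabled mocks that match no configured tool.

    Matching is exact and case-sensitive, mirroring the tip above.
    The keys "name" and "enabled" are illustrative assumptions.
    """
    known = set(tool_names)
    return [
        m["name"]
        for m in mocks
        if m.get("enabled", True) and m["name"] not in known
    ]

# Hypothetical assistant tool names and scenario-level mocks.
assistant_tools = ["bookAppointment", "lookupCustomer"]
scenario_mocks = [
    # Error-path mock: name matches a configured tool.
    {"name": "bookAppointment", "enabled": True,
     "response": {"error": "SLOT_UNAVAILABLE"}},
    # Misspelled name: this mock would never be applied.
    {"name": "lookup_customer", "enabled": True,
     "response": {"found": False}},
    # Disabled mock: skipped by the check.
    {"name": "sendEmail", "enabled": False},
]

unmatched = find_unmatched_mocks(assistant_tools, scenario_mocks)
print(unmatched)
```

Running a check like this in CI is one way to catch a renamed tool before it silently turns a mocked scenario into a real API call.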
Simulation hooks
Trigger actions on simulation lifecycle events. Hooks are useful for:
Notifying external systems when tests start/end
Logging test execution to your own systems
Triggering follow-up workflows
Custom analytics and reporting
Hooks are supported only in voice mode: they require the vapi.websocket transport and will not trigger with vapi.webchat (chat mode).
1. Add hooks to the scenario
Go to Simulations → Scenarios
Open your scenario
Scroll to the Hooks section
Click Add Hook
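On the receiving side, a hook target can start as a very small HTTP endpoint. The sketch below uses only the Python standard library and deliberately assumes nothing about the payload's field names: it logs whichever top-level keys arrive. The handler class, the helper `parse_hook_body`, and port 8080 are all illustrative choices, not part of the platform.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_hook_body(raw: bytes) -> dict:
    """Decode a webhook body as JSON; return {} for an empty or invalid body."""
    try:
        data = json.loads(raw.decode("utf-8")) if raw else {}
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}
    # Wrap non-object JSON so callers always receive a dict.
    return data if isinstance(data, dict) else {"payload": data}

class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = parse_hook_body(self.rfile.read(length))
        # Log top-level keys only; real field names depend on the payload.
        print("hook received, keys:", sorted(event))
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()

def serve(port: int = 8080) -> None:
    """Start the receiver; blocks until the process is interrupted."""
    HTTPServer(("", port), HookHandler).serve_forever()
```

Calling `serve()` from a script starts the listener. Once you have seen a few real payloads, the key-logging line can be replaced with forwarding to your own logging or analytics system.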
Webhook payload examples:
Using existing structured outputs
Instead of defining inline structured outputs in each scenario, you can reference structured outputs you’ve already created. This provides:
Reusability across multiple scenarios
Centralized management of evaluation criteria
Consistency in how data is extracted
Go to Structured Outputs in the sidebar
Create a new structured output or find an existing one
Copy the ID
In your scenario, select Use Existing when adding an evaluation
Paste the structured output ID
When to use existing vs inline:
Existing (by ID): When the same evaluation criteria are used across multiple scenarios
Inline: For scenario-specific evaluations that won’t be reused
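One way to decide which inline evaluations deserve promotion to a shared structured output is to look for identical definitions repeated across scenarios. The sketch below assumes you have exported your scenarios into plain dictionaries. The keys `name` and `inline_evaluations` and the evaluation fields are hypothetical, not the platform's schema.

```python
import json
from collections import defaultdict

def find_shared_inline_evals(scenarios: list[dict]) -> dict[str, list[str]]:
    """Map each inline evaluation used by 2+ scenarios to those scenario names.

    Evaluations are compared by their canonical JSON form, so key order
    inside a definition does not matter.
    """
    seen = defaultdict(list)
    for scenario in scenarios:
        for evaluation in scenario.get("inline_evaluations", []):
            key = json.dumps(evaluation, sort_keys=True)
            seen[key].append(scenario["name"])
    return {key: names for key, names in seen.items() if len(names) > 1}

# Hypothetical exported scenarios.
booked = {"name": "appointment_booked", "type": "boolean"}
scenarios = [
    {"name": "Booking - Happy Path", "inline_evaluations": [booked]},
    {"name": "Booking - Handles Unavailable Slot",
     "inline_evaluations": [{"type": "boolean", "name": "appointment_booked"}]},
    {"name": "Cancellation",
     "inline_evaluations": [{"name": "cancelled", "type": "boolean"}]},
]

shared = find_shared_inline_evals(scenarios)
for definition, names in shared.items():
    print(definition, "is repeated in", names)
```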
Testing strategies
Smoke tests
Quick validation that core functionality works. Run these first to catch obvious issues.
Purpose: Verify your assistant responds and basic conversation flow works before running comprehensive tests.
Characteristics:
Minimal evaluation criteria (just check for any response)
Fast execution (simple instructions)
Run before detailed tests
Use chat mode for speed
When to use:
Before running expensive voice test suites
After deploying configuration changes
As health checks in monitoring
Quick validation during development
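The "run these first" ordering can be enforced with a small staged runner that stops at the first failing stage. This is a sketch under stated assumptions: `smoke_suite` and `voice_suite` are stand-in functions, and in practice each would trigger a simulation run and report whether it passed.

```python
from typing import Callable

Stage = tuple[str, Callable[[], bool]]

def run_staged(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop after the first failure.

    Returns the names of the stages that actually ran.
    """
    ran = []
    for name, run in stages:
        ran.append(name)
        if not run():
            break  # skip the more expensive stages that follow
    return ran

# Stand-ins for real suite runs (hypothetical).
def smoke_suite() -> bool:
    return False  # pretend the assistant did not respond

def voice_suite() -> bool:
    return True

executed = run_staged([
    ("smoke (chat mode)", smoke_suite),
    ("full suite (voice mode)", voice_suite),
])
print(executed)
```

Because the smoke stage fails in this example, the voice suite never starts, which is the cost saving that smoke tests are meant to provide.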
Regression tests
Ensure fixes and updates don’t break existing functionality.
Purpose: Validate that known issues stay fixed and features keep working.
Name scenarios with “Regression: ” prefix
Include issue ticket number in the name
Add the exact scenario that previously failed
Document what was fixed
Example:
Name: “Regression: Appointment Parsing Bug #1234”
Instructions: Scenario that triggered the bug
Best practices:
Name tests after bugs they prevent
Include ticket/issue numbers
Add regression tests when fixing bugs
Run full regression suite before major releases
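The naming convention can be generated and checked programmatically so that every regression scenario carries its ticket number. The pattern below is inferred from the single example in this section ("Regression: " prefix, summary, then "#" and a numeric ticket). Treat it as an assumption and adjust it to your own ticket format.

```python
import re
from typing import Optional

# Assumed format: "Regression: <summary> #<numeric ticket>".
REGRESSION_NAME = re.compile(r"^Regression: (?P<summary>.+?) #(?P<ticket>\d+)$")

def regression_name(summary: str, ticket: int) -> str:
    """Build a scenario name that follows the convention."""
    return f"Regression: {summary.strip()} #{ticket}"

def parse_regression_name(name: str) -> Optional[tuple[str, int]]:
    """Return (summary, ticket) for a conforming name, else None."""
    match = REGRESSION_NAME.match(name)
    return (match["summary"], int(match["ticket"])) if match else None

name = regression_name("Appointment Parsing Bug", 1234)
print(name)
print(parse_regression_name(name))
print(parse_regression_name("Test 1"))
```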
Edge case testing
Test boundary conditions and unusual inputs your assistant might encounter.
Common edge cases to test:
Confused or unclear requests
Rapid topic changes
Interruptions (this edge case requires voice mode, vapi.websocket, to test actual audio interruptions)
Invalid data input
Edge case categories to cover:
Input boundaries: Empty, maximum length, special characters
Data formats: Invalid dates, malformed phone numbers, unusual names
Conversation patterns: Interruptions, topic changes, contradictions
Emotional scenarios: Frustrated caller, confused caller, impatient caller
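For the first two categories, it can help to keep a small catalog of sample inputs to paste into scenario instructions. The sketch below is illustrative only: the sample values and the 500-character maximum are assumptions, and the conversation-pattern and emotional categories are left out because they are better expressed through scenario instructions and personalities than through literal strings.

```python
from datetime import date

def edge_case_inputs(max_length: int = 500) -> dict[str, list[str]]:
    """Group sample caller inputs by category."""
    return {
        "input_boundaries": ["", "a" * max_length, "#$%^&*<>{}"],
        "invalid_dates": ["2024-02-30", "13/45/2024", "next Blursday"],
        "malformed_phone_numbers": ["555-12", "+1 (555) abc-defg"],
        "unusual_names": ["O'Brien-Smith", "Nguyễn Thị Minh Khai", "X"],
    }

def is_valid_iso_date(text: str) -> bool:
    """True only when text is a real calendar date in ISO format."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

inputs = edge_case_inputs()
for category, samples in inputs.items():
    print(f"{category}: {len(samples)} samples")

# Sanity check: none of the "invalid date" samples should parse.
invalid_dates_ok = not any(is_valid_iso_date(s) for s in inputs["invalid_dates"])
print("invalid dates rejected:", invalid_dates_ok)
```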
Best practices
Evaluation design principles
Single responsibility
Each evaluation should test one specific outcome.
✅ Good: “Was the appointment booked?”
❌ Bad: “Was the appointment booked, confirmed, and email sent?”
Clear naming
Use descriptive names that explain what’s being tested.
✅ Good: “Booking - Handles Unavailable Slot”
❌ Bad: “Test 1” or “Scenario ABC”
Realistic personalities
Model test personalities after actual customer types.
Consider decisive, confused, impatient, and detail-oriented callers, as well as non-native speakers.
Measurable criteria
Use boolean or numeric structured outputs that produce clear pass/fail results.
Avoid subjective criteria that are hard to evaluate consistently.
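To illustrate why boolean and numeric outputs are easy to grade, the sketch below checks each extracted value against one rule and yields an unambiguous pass or fail per evaluation. The output names and the rule format (a bool for an exact match, a `(low, high)` tuple for a numeric range) are hypothetical and are not the platform's evaluation schema.

```python
from typing import Any, Tuple, Union

Rule = Union[bool, Tuple[float, float]]

def passes(value: Any, rule: Rule) -> bool:
    """Check one extracted value against one rule.

    A bool rule requires an exact boolean match; a (low, high) rule
    requires a number inside the inclusive range.
    """
    if isinstance(rule, bool):
        return value is rule
    low, high = rule
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high

# Hypothetical structured outputs from one simulation run.
outputs = {
    "appointment_booked": True,
    "confirmation_sent": False,
    "turns_to_resolution": 6,
}
# One rule per evaluation, in line with the single-responsibility principle.
rules = {
    "appointment_booked": True,
    "confirmation_sent": True,
    "turns_to_resolution": (1, 8),
}

results = {name: passes(outputs.get(name), rule) for name, rule in rules.items()}
print(results)
```

Each entry fails or passes on its own, so a failed run points at one specific outcome instead of a combined "booked, confirmed, and emailed" check.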
Choosing voice vs chat mode
CI/CD integration
Automate simulation runs in your deployment pipeline.
Basic workflow
Advanced patterns
Staging validation before production
Run the full simulation suite against staging before promoting to production.
Scheduled nightly regression
Run the full regression suite nightly.
Quality gates
Block deployment if the pass rate falls below a threshold.
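A quality gate reduces to computing the pass rate and turning it into a process exit code that the pipeline can act on. The sketch below assumes you already have a list of per-run pass/fail results. How you obtain them, and the 90% threshold, are assumptions to replace with your own values.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of passing runs; an empty result list counts as 0.0."""
    return sum(results) / len(results) if results else 0.0

def gate(results: list[bool], threshold: float = 0.9) -> int:
    """Return an exit code: 0 if the pass rate meets the threshold, else 1."""
    rate = pass_rate(results)
    print(f"pass rate {rate:.0%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1

# Hypothetical results from a simulation suite: 9 passes, 1 failure.
exit_code = gate([True] * 9 + [False])
print("exit code:", exit_code)
```

In a CI script, passing the result to `sys.exit(...)` makes the job fail when the gate is not met, which blocks the deployment step that depends on it. Treating an empty result list as a failure avoids a silent pass when no simulations ran.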
Maintenance strategies
Regular review cycle
1. Weekly: Review failed tests
Investigate all failures. Update tests if requirements changed, or fix the assistant if behavior regressed.
2. Monthly: Audit test coverage
Review simulation suite completeness:
Are all critical user flows covered?
Do new features have tests?
Have tests for deprecated features been removed?
3. Quarterly: Refactor and optimize
Remove duplicate simulations
Update outdated scenarios
Optimize personalities for cost
Document test rationale
When to update simulations
Troubleshooting
Common issues
Debugging tips
2. Review individual run items
3. Check conversation transcript
In the Dashboard, click on a failed run item to see the full conversation transcript and evaluation results.
4. Test assistant manually
If simulations consistently fail, test your assistant manually in the Dashboard to verify it’s working correctly.
Getting help
Include these details when reporting issues:
Simulation run ID
Scenario and personality IDs
Transport mode used (voice/chat)
Expected vs actual behavior
Assistant configuration
Summary
Key takeaways for advanced simulation testing:
Configuration:
Use tool mocks to test error paths without real API calls
Use hooks for external notifications (voice mode only)
Reference existing structured outputs for consistency
Testing strategy:
Start with smoke tests, then regression, then edge cases
Use chat mode for speed, voice mode for final validation
Create personalities based on real customer types
CI/CD:
Automate smoke tests in PR pipelines
Run full regression before production deploys
Set quality gate thresholds
Maintenance:
Review failures weekly
Audit coverage monthly
Add regression tests when fixing bugs