September 5, 2025

Evaluation System Foundation

Evaluation Framework: You can now systematically test your Vapi voice assistants with the new Eval system. Create comprehensive test scenarios to validate assistant behavior, conversation flow, and tool usage through mock conversations.
Mock Conversation Builder: Design test conversations using Eval.messages with support for multiple message types:
- ChatEvalUserMessageMock: Simulate user inputs and questions
- ChatEvalSystemMessageMock: Inject system messages mid-conversation
- ChatEvalToolResponseMessageMock: Mock tool responses for consistent testing
- ChatEvalAssistantMessageEvaluation: Define evaluation checkpoints
Evaluation Types: Currently focused on chat.mockConversation type evaluations, with the framework designed to support additional evaluation methods in future releases.
Evaluation Management: Organize your tests with CreateEvalDTO and UpdateEvalDTO:
- name: Descriptive names up to 80 characters (e.g., “Customer Support Flow Validation”)
- description: Detailed descriptions up to 500 characters explaining the test purpose
- messages: The complete mock conversation flow
Evaluation Endpoints: Access your evaluations through the new /eval endpoint family:
- GET /eval: List all evaluations with pagination support
- POST /eval: Create new evaluations
- GET /eval/{id}: Retrieve specific evaluation details
- PUT /eval/{id}: Update existing evaluations
Judge Plan Architecture: Define how assistant responses are validated using AssistantMessageJudgePlan with three evaluation methods:
- Exact Match: AssistantMessageJudgePlanExact for precise content and tool call validation
- Regex Pattern: AssistantMessageJudgePlanRegex for flexible pattern-based evaluation
- AI Judge: AssistantMessageJudgePlanAI for intelligent evaluation using LLM-as-a-judge

This is the foundation release for the evaluation system. Evaluation execution and results processing will be available in upcoming releases. Start designing your test scenarios now to be ready for full evaluation capabilities.

Testing Capabilities

Mock Conversations

Create realistic test scenarios with user messages, system prompts, and expected assistant responses for comprehensive flow validation.

Tool Call Testing

Validate that your assistant calls the right tools with correct parameters using ChatEvalAssistantMessageMockToolCall.

Flexible Validation

Choose from exact matching, regex patterns, or AI-powered evaluation to suit different testing needs and complexity levels.

Evaluation Organization

Organize tests with descriptive names and detailed documentation to maintain clear testing workflows across your team.