Evaluation System Foundation
-
Evaluation Framework: You can now systematically test your Vapi voice assistants with the new
Eval
system. Create comprehensive test scenarios to validate assistant behavior, conversation flow, and tool usage through mock conversations. -
Mock Conversation Builder: Design test conversations using
Eval.messages
with support for multiple message types:ChatEvalUserMessageMock
: Simulate user inputs and questionsChatEvalSystemMessageMock
: Inject system messages mid-conversationChatEvalToolResponseMessageMock
: Mock tool responses for consistent testingChatEvalAssistantMessageEvaluation
: Define evaluation checkpoints
-
Evaluation Types: Currently focused on
chat.mockConversation
type evaluations, with the framework designed to support additional evaluation methods in future releases. -
Evaluation Management: Organize your tests with
CreateEvalDTO
andUpdateEvalDTO
:name
: Descriptive names up to 80 characters (e.g., “Customer Support Flow Validation”)description
: Detailed descriptions up to 500 characters explaining the test purposemessages
: The complete mock conversation flow
-
Evaluation Endpoints: Access your evaluations through the new
/eval
endpoint family:GET /eval
: List all evaluations with pagination supportPOST /eval
: Create new evaluationsGET /eval/{id}
: Retrieve specific evaluation detailsPUT /eval/{id}
: Update existing evaluations
-
Judge Plan Architecture: Define how assistant responses are validated using
AssistantMessageJudgePlan
with three evaluation methods:- Exact Match:
AssistantMessageJudgePlanExact
for precise content and tool call validation - Regex Pattern:
AssistantMessageJudgePlanRegex
for flexible pattern-based evaluation - AI Judge:
AssistantMessageJudgePlanAI
for intelligent evaluation using LLM-as-a-judge
- Exact Match:
This is the foundation release for the evaluation system. Evaluation execution and results processing will be available in upcoming releases. Start designing your test scenarios now to be ready for full evaluation capabilities.
Testing Capabilities
Create realistic test scenarios with user messages, system prompts, and expected assistant responses for comprehensive flow validation.
Validate that your assistant calls the right tools with correct parameters using ChatEvalAssistantMessageMockToolCall
.
Choose from exact matching, regex patterns, or AI-powered evaluation to suit different testing needs and complexity levels.
Organize tests with descriptive names and detailed documentation to maintain clear testing workflows across your team.