Evaluation Framework: You can now systematically test your Vapi voice assistants with the new Eval system. Create comprehensive test scenarios to validate assistant behavior, conversation flow, and tool usage through mock conversations.
Mock Conversation Builder: Design test conversations using Eval.messages with support for multiple message types:
ChatEvalUserMessageMock: Simulate user inputs and questionsChatEvalSystemMessageMock: Inject system messages mid-conversationChatEvalToolResponseMessageMock: Mock tool responses for consistent testingChatEvalAssistantMessageEvaluation: Define evaluation checkpointsEvaluation Types: Currently focused on chat.mockConversation type evaluations, with the framework designed to support additional evaluation methods in future releases.
Evaluation Management: Organize your tests with CreateEvalDTO and UpdateEvalDTO:
name: Descriptive names up to 80 characters (e.g., “Customer Support Flow Validation”)description: Detailed descriptions up to 500 characters explaining the test purposemessages: The complete mock conversation flowEvaluation Endpoints: Access your evaluations through the new /eval endpoint family:
GET /eval: List all evaluations with pagination supportPOST /eval: Create new evaluationsGET /eval/{id}: Retrieve specific evaluation detailsPUT /eval/{id}: Update existing evaluationsJudge Plan Architecture: Define how assistant responses are validated using AssistantMessageJudgePlan with three evaluation methods:
AssistantMessageJudgePlanExact for precise content and tool call validationAssistantMessageJudgePlanRegex for flexible pattern-based evaluationAssistantMessageJudgePlanAI for intelligent evaluation using LLM-as-a-judgeThis is the foundation release for the evaluation system. Evaluation execution and results processing will be available in upcoming releases. Start designing your test scenarios now to be ready for full evaluation capabilities.
Create realistic test scenarios with user messages, system prompts, and expected assistant responses for comprehensive flow validation.
Validate that your assistant calls the right tools with correct parameters using ChatEvalAssistantMessageMockToolCall.
Choose from exact matching, regex patterns, or AI-powered evaluation to suit different testing needs and complexity levels.
Organize tests with descriptive names and detailed documentation to maintain clear testing workflows across your team.