Custom Knowledge Base
Overview
Custom Knowledge Bases allow you to implement your own document retrieval server, giving you complete control over how your assistant searches and retrieves information. Instead of relying on Vapi’s built-in knowledge base providers, you can integrate your own search infrastructure, vector databases, or custom retrieval logic.
With Custom Knowledge Bases, you can:
- Use your own vector database or search infrastructure
- Implement custom retrieval algorithms and scoring
- Integrate with existing document management systems
- Apply custom business logic to document filtering
- Maintain full control over data security and privacy
How Custom Knowledge Bases Work
Custom Knowledge Bases operate through a webhook-style integration where Vapi forwards search requests to your server and expects structured responses containing relevant documents.
1. A user asks the assistant a question during a conversation.
2. Vapi sends a search request to your custom endpoint.
3. Your server returns relevant documents or a direct response.
Creating a Custom Knowledge Base
Step 1: Create the Knowledge Base
Use the Vapi API to create a custom knowledge base configuration:
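As a sketch, the creation request can be built like this in Python. The endpoint path, the `custom-knowledge-base` provider value, and the `server.url`/`server.timeoutSeconds` field names are assumptions based on how this page describes the feature; confirm them against the current Vapi API reference before use.

```python
import json

# Placeholder values -- substitute your own API key and endpoint URL.
payload = {
    "provider": "custom-knowledge-base",
    "server": {
        "url": "https://example.com/kb/search",  # your retrieval endpoint
        "timeoutSeconds": 10,                    # Vapi's maximum wait
    },
}

# The actual call would be an authenticated POST (shown without executing):
# requests.post(
#     "https://api.vapi.ai/knowledge-base",
#     headers={"Authorization": f"Bearer {VAPI_API_KEY}",
#              "Content-Type": "application/json"},
#     data=json.dumps(payload),
# )
print(json.dumps(payload, indent=2))
```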
Step 2: Attach to Your Assistant
Custom knowledge bases can only be attached to assistants via the API. This functionality is not available through the dashboard interface.
To attach a custom knowledge base to your assistant, update the assistant’s model configuration. The API replaces the entire model object and does not support partial updates for nested objects, so you must send the complete configuration, including all existing messages; anything you omit is lost.
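A hedged sketch of the update payload follows. The `knowledgeBaseId` field name and the shape of the model object are assumptions here; the key point from the docs is that the whole `model` object, messages included, must be resent.

```python
import json

model_update = {
    "model": {
        "provider": "openai",
        "model": "gpt-4o",
        # Include ALL existing messages -- the update replaces the whole
        # "model" object, so anything omitted here is dropped.
        "messages": [
            {"role": "system", "content": "You are a helpful support agent."},
        ],
        "knowledgeBaseId": "YOUR_KNOWLEDGE_BASE_ID",  # placeholder ID
    }
}

# The call itself would be a PATCH against the assistant (not executed here):
# requests.patch(f"https://api.vapi.ai/assistant/{assistant_id}", ...)
print(json.dumps(model_update, indent=2))
```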
Implementing the Custom Endpoint
Your custom knowledge base server must handle POST requests at the configured URL and return structured responses.
Request Structure
Vapi will send requests to your endpoint with the following structure:
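A minimal sketch of the incoming body is below. The exact field names (`message.type`, `message.messages`) are assumptions inferred from this page; log a real request from Vapi to confirm the shape before relying on it.

```python
# Assumed shape of the search request your endpoint receives.
example_request = {
    "message": {
        "type": "knowledge-base-request",
        "messages": [
            {"role": "user", "content": "What is your refund policy?"},
        ],
    }
}

def last_user_query(body: dict) -> str:
    """Pull the most recent user utterance out of the request body."""
    msgs = body.get("message", {}).get("messages", [])
    users = [m["content"] for m in msgs if m.get("role") == "user"]
    return users[-1] if users else ""

print(last_user_query(example_request))
```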
Response Options
Your endpoint can respond in two ways:
Option 1: Return Documents for AI Processing
Return an array of relevant documents that the AI will use to formulate a response:
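For example, a documents response might look like the following. The `documents`, `content`, and `similarity` field names are assumptions based on this page's description; verify them against the current Vapi reference.

```python
# Assumed Option 1 shape: an array of documents with content and an
# optional relevance score for the AI to synthesize an answer from.
documents_response = {
    "documents": [
        {"content": "Refunds are available within 30 days of purchase.",
         "similarity": 0.92},
        {"content": "Contact support to start a refund request.",
         "similarity": 0.85},
    ]
}
print(documents_response)
```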
Option 2: Return Direct Response
Return a complete response that the assistant will speak directly:
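A direct response might look like this sketch; the `message` envelope with `role`/`content` is an assumed shape, so confirm the exact fields against the current API reference.

```python
# Assumed Option 2 shape: a finished utterance the assistant speaks verbatim,
# bypassing AI synthesis over retrieved documents.
direct_response = {
    "message": {
        "role": "assistant",
        "content": "Refunds are available within 30 days of purchase.",
    }
}
print(direct_response)
```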
Implementation Examples
A complete server implementation ties these pieces together: parse the incoming request, run your retrieval logic, and return one of the response formats above.
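The sketch below shows the core handler as a framework-agnostic Python function, using an in-memory corpus and toy word-overlap scoring as stand-ins for real retrieval. The request and response field names follow the assumed shapes described earlier; wire `handle_search` into any HTTP framework (Flask, FastAPI, `http.server`) as the body of your POST handler.

```python
import json

# In-memory corpus for the sketch; in production this is your search index.
DOCS = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3-5 business days within the US.",
    "Support hours are 9am-5pm ET, Monday through Friday.",
]

def handle_search(body: dict) -> dict:
    """Handle one knowledge-base request and return a documents response."""
    msgs = body.get("message", {}).get("messages", [])
    query = next((m["content"] for m in reversed(msgs)
                  if m.get("role") == "user"), "")
    # Toy scoring: fraction of query words that appear in the document.
    q_words = set(query.lower().split())
    scored = []
    for doc in DOCS:
        overlap = len(q_words & set(doc.lower().split()))
        if overlap:
            scored.append({"content": doc,
                           "similarity": overlap / max(len(q_words), 1)})
    scored.sort(key=lambda d: d["similarity"], reverse=True)
    return {"documents": scored[:3]}

request_body = {"message": {"type": "knowledge-base-request",
                            "messages": [{"role": "user",
                                          "content": "How long do refunds take?"}]}}
print(json.dumps(handle_search(request_body)))
```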
Advanced Implementation Patterns
Vector Database Integration
For production use, integrate with a proper vector database:
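As a stand-in for a real vector store (Pinecone, Weaviate, pgvector, and similar), this sketch ranks documents by cosine similarity over precomputed embeddings. The tiny hand-written vectors are illustrative only; in production the embeddings come from an embedding model and the search runs inside the database.

```python
import math

# Toy index of (text, embedding) pairs; real embeddings are model-generated.
INDEX = [
    ("Refunds are available within 30 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 business days.",     [0.1, 0.9, 0.1]),
]

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_embedding: list, top_k: int = 1) -> list:
    """Return the top_k documents ranked by similarity to the query."""
    ranked = sorted(INDEX,
                    key=lambda item: cosine(query_embedding, item[1]),
                    reverse=True)
    return [{"content": text, "similarity": cosine(query_embedding, emb)}
            for text, emb in ranked[:top_k]]

print(vector_search([0.8, 0.2, 0.0]))
```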
Security and Best Practices
Performance Optimization
Response time is critical: your endpoint should respond within milliseconds (ideally under ~50ms) for the best user experience. Although Vapi allows a timeout of up to 10 seconds, slower responses noticeably degrade the assistant’s conversational flow and response quality.
Cache frequently requested documents and implement request timeouts to ensure fast response times. Consider using in-memory caches, CDNs, or pre-computed embeddings for faster retrieval.
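An in-memory cache can be as simple as the TTL sketch below (illustrative only; the cache key normalization and 300-second TTL are arbitrary choices, and a production service would bound the cache size as well).

```python
import time

# Minimal TTL cache keyed on normalized query strings.
_cache: dict = {}
TTL_SECONDS = 300

def cached_search(query: str, search_fn):
    """Return cached results for `query`, recomputing after TTL expiry."""
    key = query.strip().lower()
    now = time.monotonic()
    hit = _cache.get(key)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]
    results = search_fn(query)
    _cache[key] = (now, results)
    return results

calls = []
def slow_search(q):
    calls.append(q)  # track how often the underlying search actually runs
    return [{"content": f"result for {q}", "similarity": 1.0}]

cached_search("Refund policy?", slow_search)
cached_search("refund policy?", slow_search)  # normalized -> cache hit
print(len(calls))  # the underlying search ran once
```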
Error Handling
Always handle errors gracefully and return appropriate HTTP status codes:
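One way to structure this is a wrapper that maps failure modes to status codes, sketched below as a framework-agnostic function returning `(status, body)` pairs. The choice to answer a retrieval timeout with an empty 200 rather than an error is a design assumption: an empty result set lets the conversation continue, while a 5xx may surface as a failure to the caller.

```python
import json

def safe_handler(body, search_fn):
    """Wrap retrieval so failures map to sensible HTTP status codes."""
    try:
        if not isinstance(body, dict) or "message" not in body:
            return 400, json.dumps({"error": "malformed request"})
        docs = search_fn(body)
        return 200, json.dumps({"documents": docs})
    except TimeoutError:
        # Degrade gracefully: an empty result set beats a hard failure.
        return 200, json.dumps({"documents": []})
    except Exception:
        return 500, json.dumps({"error": "internal search error"})

status, _ = safe_handler({"not": "valid"}, lambda b: [])
print(status)  # 400
```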
Next Steps
Now that you have a custom knowledge base implementation:
- Query Tool Configuration: Learn advanced query tool configurations
- Vector Databases: Explore vector database integrations
- Assistant Configuration: Optimize your assistant’s use of knowledge bases
Custom Knowledge Bases require a webhook endpoint that’s publicly accessible. For production deployments, ensure your server can handle concurrent requests and has appropriate error handling and monitoring in place.