Custom Knowledge Bases allow you to implement your own document retrieval server, giving you complete control over how your assistant searches and retrieves information. Instead of relying on Vapi’s built-in knowledge base providers, you can integrate your own search infrastructure, vector databases, or custom retrieval logic.
With Custom Knowledge Bases, you can:
Custom Knowledge Bases operate through a webhook-style integration where Vapi forwards search requests to your server and expects structured responses containing relevant documents.
User asks assistant a question during conversation
Vapi sends search request to your custom endpoint
Your server returns relevant documents or direct response
Use the Vapi API to create a custom knowledge base configuration:
Custom knowledge bases can only be attached to assistants via the API. This functionality is not available through the dashboard interface.
To attach a custom knowledge base to your assistant, update the assistant’s model configuration. You must provide the complete model configuration including all existing messages, as partial patches are not supported for nested objects:
When updating an assistant’s model, you must include the complete model object including all existing messages and configuration. The API replaces the entire model object and doesn’t support partial updates for nested objects.
Your custom knowledge base server must handle POST requests at the configured URL and return structured responses.
Vapi will send requests to your endpoint with the following structure:
Your endpoint can respond in two ways:
Return an array of relevant documents that the AI will use to formulate a response:
Return a complete response that the assistant will speak directly:
Here are complete server implementations in different languages:
For production use, integrate with a proper vector database:
Response time is critical: Your endpoint should respond in milliseconds (ideally under ~50ms) for optimal user experience. While Vapi allows up to 10 seconds timeout, slower responses will significantly affect your assistant’s conversational flow and response quality.
Cache frequently requested documents and implement request timeouts to ensure fast response times. Consider using in-memory caches, CDNs, or pre-computed embeddings for faster retrieval.
Always handle errors gracefully and return appropriate HTTP status codes:
Now that you have a custom knowledge base implementation:
Custom Knowledge Bases require a webhook endpoint that’s publicly accessible. For production deployments, ensure your server can handle concurrent requests and has appropriate error handling and monitoring in place.