Custom Knowledge Base

Overview

Custom Knowledge Bases allow you to implement your own document retrieval server, giving you complete control over how your assistant searches and retrieves information. Instead of relying on Vapi’s built-in knowledge base providers, you can integrate your own search infrastructure, vector databases, or custom retrieval logic.

With Custom Knowledge Bases, you can:

Use your own vector database or search infrastructure
Implement custom retrieval algorithms and scoring
Integrate with existing document management systems
Apply custom business logic to document filtering
Maintain full control over data security and privacy

How Custom Knowledge Bases Work

Custom Knowledge Bases operate through a webhook-style integration where Vapi forwards search requests to your server and expects structured responses containing relevant documents.

User Query: The user asks the assistant a question during the conversation.
Search Request: Vapi sends a search request to your custom endpoint.
Document Response: Your server returns relevant documents or a direct response.

Creating a Custom Knowledge Base

Step 1: Create the Knowledge Base

Use the Vapi API to create a custom knowledge base configuration:

curl --location 'https://api.vapi.ai/knowledge-base' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer YOUR_VAPI_API_KEY' \
  --data '{
    "provider": "custom-knowledge-base",
    "server": {
        "url": "https://your-domain.com/kb/search",
        "secret": "your-webhook-secret"
    }
}'
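
The same request can be prepared from code. The following is a minimal sketch that only assembles the request without sending it; `buildCreateKnowledgeBaseRequest` and `PreparedRequest` are hypothetical names introduced for illustration, while the URL, headers, and body fields are taken from the curl command above.

```typescript
// Body fields as shown in the curl example above.
interface CreateKnowledgeBaseBody {
  provider: "custom-knowledge-base";
  server: { url: string; secret: string };
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles the POST request but does not send it.
function buildCreateKnowledgeBaseRequest(
  apiKey: string,
  serverUrl: string,
  secret: string
): PreparedRequest {
  const body: CreateKnowledgeBaseBody = {
    provider: "custom-knowledge-base",
    server: { url: serverUrl, secret },
  };
  return {
    url: "https://api.vapi.ai/knowledge-base",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

const prepared = buildCreateKnowledgeBaseRequest(
  "YOUR_VAPI_API_KEY",
  "https://your-domain.com/kb/search",
  "your-webhook-secret"
);
// To send it on a runtime with a global fetch (for example Node 18+):
//   const res = await fetch(prepared.url, prepared.init);
console.log(prepared.init.body);
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before any network call is made.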

Step 2: Attach to Your Assistant

Custom knowledge bases can only be attached to assistants via the API. This functionality is not available through the dashboard interface.

To attach a custom knowledge base to your assistant, update the assistant’s model configuration. You must provide the complete model configuration including all existing messages, as partial patches are not supported for nested objects:

curl --location --request PATCH 'https://api.vapi.ai/assistant/YOUR_ASSISTANT_ID' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer YOUR_VAPI_API_KEY' \
  --data '{
    "model": {
        "model": "gpt-4o",
        "provider": "openai",
        "messages": [
            {
                "role": "system",
                "content": "Your existing system prompt and instructions..."
            }
        ],
        "knowledgeBaseId": "YOUR_KNOWLEDGE_BASE_ID"
    }
}'

When updating an assistant’s model, include the complete model object with all existing messages and settings. The API replaces the entire model object rather than merging nested fields, so anything omitted from the request is removed.
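
One way to avoid dropping fields is to merge the knowledge base ID into the model object you already have. The sketch below assumes you have the assistant's current model configuration in hand; `ModelConfig` and `withKnowledgeBase` are hypothetical names, and the `temperature` field is only an illustrative extra setting.

```typescript
// Minimal model shape based on the PATCH example above. Real configurations
// may carry more fields, which the index signature preserves.
interface ModelConfig {
  provider: string;
  model: string;
  messages: { role: string; content: string }[];
  knowledgeBaseId?: string;
  [key: string]: unknown;
}

// Hypothetical helper: copies every existing field and adds the knowledge
// base ID, so nothing is lost when the whole model object is replaced.
function withKnowledgeBase(
  current: ModelConfig,
  knowledgeBaseId: string
): { model: ModelConfig } {
  return { model: { ...current, knowledgeBaseId } };
}

const existingModel: ModelConfig = {
  provider: "openai",
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Your existing system prompt and instructions..." },
  ],
  temperature: 0.3, // illustrative extra setting that must survive the update
};

// This object is the JSON body for the PATCH request shown above.
const patchBody = withKnowledgeBase(existingModel, "YOUR_KNOWLEDGE_BASE_ID");
console.log(JSON.stringify(patchBody, null, 2));
```

Because the helper spreads the existing object instead of mutating it, the original configuration stays available for comparison or rollback.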

Implementing the Custom Endpoint

Your custom knowledge base server must handle POST requests at the configured URL and return structured responses.

Request Structure

Vapi will send requests to your endpoint with the following structure:

Request Format

{
  "message": {
    "type": "knowledge-base-request",
    "messages": [
      {
        "role": "user",
        "content": "What is your return policy?"
      },
      {
        "role": "assistant",
        "content": "I'll help you with information about our return policy."
      },
      {
        "role": "user",
        "content": "How long do I have to return items?"
      }
    ]
    // Additional metadata fields about the call or chat will be included here
  }
}
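
To make the request shape concrete, here is a sketch of TypeScript types for the payload above, with a type guard and a helper that pulls out the most recent user message. The names `KnowledgeBaseRequest`, `isKnowledgeBaseRequest`, and `latestUserQuery` are assumptions made for illustration; only the `type`, `messages`, `role`, and `content` fields come from the documented format.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

interface KnowledgeBaseRequest {
  message: {
    type: "knowledge-base-request";
    messages: ChatMessage[];
    // Additional call or chat metadata may be present; it is left untyped here.
    [key: string]: unknown;
  };
}

// Type guard: checks only the fields documented in the request format.
function isKnowledgeBaseRequest(body: unknown): body is KnowledgeBaseRequest {
  if (typeof body !== "object" || body === null) return false;
  const message = (body as { message?: unknown }).message;
  if (typeof message !== "object" || message === null) return false;
  const { type, messages } = message as { type?: unknown; messages?: unknown };
  return type === "knowledge-base-request" && Array.isArray(messages);
}

// Returns the content of the most recent user message, or "" if there is none.
function latestUserQuery(request: KnowledgeBaseRequest): string {
  for (const msg of [...request.message.messages].reverse()) {
    if (msg.role === "user") return msg.content;
  }
  return "";
}

const sampleBody: unknown = {
  message: {
    type: "knowledge-base-request",
    messages: [
      { role: "user", content: "What is your return policy?" },
      { role: "assistant", content: "I'll help you with information about our return policy." },
      { role: "user", content: "How long do I have to return items?" },
    ],
  },
};

const query = isKnowledgeBaseRequest(sampleBody) ? latestUserQuery(sampleBody) : "";
console.log(query);
```

Searching on only the latest user message is the simplest choice. Earlier turns are available in the same array if your retrieval logic benefits from conversational context.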

Response Options

Your endpoint can respond in two ways:

Option 1: Return Documents for AI Processing

Return an array of relevant documents that the AI will use to formulate a response:

Document Response

{
  "documents": [
    {
      "content": "Our return policy allows customers to return items within 30 days of purchase for a full refund. Items must be in original condition with tags attached.",
      "similarity": 0.92,
      "uuid": "doc-return-policy-1" // optional
    },
    {
      "content": "Extended return periods apply during holiday seasons - customers have up to 60 days to return items purchased between November 1st and December 31st.",
      "similarity": 0.78,
      "uuid": "doc-return-policy-holiday" // optional
    }
  ]
}
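
A small helper can turn raw search hits into this response shape. The sketch below is one possible approach, not a required one: `ScoredCandidate`, `toDocumentResponse`, and the default threshold and limit are assumptions chosen for illustration.

```typescript
// Response document fields as documented above; uuid is optional.
interface KnowledgeBaseDocument {
  content: string;
  similarity: number;
  uuid?: string;
}

// Hypothetical shape of a hit coming back from your own search backend.
interface ScoredCandidate {
  id: string;
  text: string;
  score: number;
}

// Drops weak or invalid hits, orders the rest by score (highest first),
// and keeps at most `limit` documents.
function toDocumentResponse(
  candidates: ScoredCandidate[],
  minScore = 0.5,
  limit = 3
): { documents: KnowledgeBaseDocument[] } {
  const documents = candidates
    .filter((c) => Number.isFinite(c.score) && c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((c) => ({ content: c.text, similarity: c.score, uuid: c.id }));
  return { documents };
}

const hits: ScoredCandidate[] = [
  { id: "doc-return-policy-holiday", text: "Extended return periods apply during holiday seasons.", score: 0.78 },
  { id: "doc-return-policy-1", text: "Items can be returned within 30 days of purchase.", score: 0.92 },
  { id: "doc-shipping", text: "Standard shipping takes 3-5 business days.", score: 0.2 },
];

const documentResponse = toDocumentResponse(hits);
console.log(JSON.stringify(documentResponse, null, 2));
```

Filtering before sorting also means the input array is never reordered in place, since `filter` returns a new array.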

Option 2: Return Direct Response

Return a complete response that the assistant will speak directly:

Direct Response

{
  "message": {
    "role": "assistant",
    "content": "You have 30 days to return items for a full refund. Items must be in original condition with tags attached. During the holiday season (November 1st to December 31st), you get an extended 60-day return period."
  }
}
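
The two response options can be modeled as a union type so a handler returns exactly one of them. The following sketch assumes a hypothetical rule, a lookup table of pre-written answers, for deciding when to answer directly; `cannedAnswers` and `chooseResponse` are illustrative names, and your own criteria may differ.

```typescript
interface DocumentResponse {
  documents: { content: string; similarity: number; uuid?: string }[];
}

interface DirectResponse {
  message: { role: "assistant"; content: string };
}

type KnowledgeBaseResponse = DocumentResponse | DirectResponse;

// Hypothetical table of pre-approved answers, keyed by normalized query.
const cannedAnswers = new Map<string, string>([
  [
    "how long do i have to return items?",
    "You have 30 days to return items for a full refund.",
  ],
]);

// Returns a direct response when a pre-written answer exists;
// otherwise hands the retrieved documents to the AI.
function chooseResponse(
  query: string,
  documents: DocumentResponse["documents"]
): KnowledgeBaseResponse {
  const canned = cannedAnswers.get(query.trim().toLowerCase());
  if (canned !== undefined) {
    return { message: { role: "assistant", content: canned } };
  }
  return { documents };
}

const direct = chooseResponse("How long do I have to return items?", []);
const viaDocuments = chooseResponse("Do you ship abroad?", [
  { content: "We offer free shipping on orders over $50.", similarity: 0.4 },
]);
console.log(JSON.stringify(direct), JSON.stringify(viaDocuments));
```

A direct response suits answers that must be worded exactly, while returning documents leaves the phrasing to the assistant.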

Implementation Examples

Here is a complete server implementation in TypeScript using Express:

import express from 'express';
import crypto from 'crypto';

const app = express();
app.use(express.json());

// Your knowledge base data (replace with actual database/vector store)
const documents = [
  {
    id: "return-policy-1",
    content: "Our return policy allows customers to return items within 30 days of purchase for a full refund. Items must be in original condition with tags attached.",
    category: "returns"
  },
  {
    id: "shipping-info-1",
    content: "We offer free shipping on orders over $50. Standard shipping takes 3-5 business days.",
    category: "shipping"
  }
];

app.post('/kb/search', (req, res) => {
  try {
    // Verify webhook secret (recommended)
    const signature = req.headers['x-vapi-signature'];
    const secret = process.env.VAPI_WEBHOOK_SECRET;

    if (signature && secret) {
      // Note: this hashes the re-serialized JSON, which can differ
      // byte-for-byte from the raw body that was originally sent.
      const expectedSignature = crypto
        .createHmac('sha256', secret)
        .update(JSON.stringify(req.body))
        .digest('hex');

      if (signature !== `sha256=${expectedSignature}`) {
        return res.status(401).json({ error: 'Invalid signature' });
      }
    }

    const { message } = req.body ?? {};

    // Reject anything that is not a well-formed knowledge base request
    if (message?.type !== 'knowledge-base-request' || !Array.isArray(message.messages)) {
      return res.status(400).json({ error: 'Invalid request type' });
    }

    // Get the latest user message
    const userMessages = message.messages.filter(msg => msg.role === 'user');
    const latestQuery = userMessages[userMessages.length - 1]?.content || '';

    // Simple keyword-based search (replace with vector search)
    const relevantDocs = documents
      .map(doc => ({
        ...doc,
        similarity: calculateSimilarity(latestQuery, doc.content)
      }))
      .filter(doc => doc.similarity > 0.1)
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, 3);

    // Return documents for AI processing
    res.json({
      documents: relevantDocs.map(doc => ({
        content: doc.content,
        similarity: doc.similarity,
        uuid: doc.id
      }))
    });

  } catch (error) {
    console.error('Knowledge base search error:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

function calculateSimilarity(query: string, content: string): number {
  // Simple similarity calculation (replace with proper vector similarity)
  const queryWords = query.toLowerCase().split(/\s+/).filter(Boolean);
  const contentWords = content.toLowerCase().split(/\s+/).filter(Boolean);

  // An empty query would otherwise match every document
  if (queryWords.length === 0) return 0;

  const matches = queryWords.filter(word =>
    contentWords.some(cWord => cWord.includes(word))
  ).length;

  // Fraction of query words found in the content, between 0 and 1
  return matches / queryWords.length;
}

app.listen(3000, () => {
  console.log('Custom Knowledge Base server running on port 3000');
});
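
The signature check can be hardened by hashing the raw request body and comparing in constant time. The sketch below assumes the same scheme as the example above (an `x-vapi-signature` header carrying `sha256=` followed by a hex HMAC of the body), which you should confirm against Vapi's webhook documentation; `signBody` and `isValidSignature` are hypothetical helper names.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the expected header value for a raw request body.
function signBody(rawBody: string, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Compares the received header with the expected value in constant time.
function isValidSignature(
  rawBody: string,
  secret: string,
  header: string | undefined
): boolean {
  if (!header) return false;
  const expected = Buffer.from(signBody(rawBody, secret));
  const received = Buffer.from(header);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const rawExample = '{"message":{"type":"knowledge-base-request","messages":[]}}';
const headerExample = signBody(rawExample, "your-webhook-secret");
console.log(isValidSignature(rawExample, "your-webhook-secret", headerExample));
```

In Express, the raw bytes can be captured with the `verify` option of `express.json()`, which receives the unparsed buffer before the body is converted to an object.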

Advanced Implementation Patterns

Vector Database Integration

For production use, integrate with a proper vector database:

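import express from 'express';
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

// API keys are read from the environment
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY ?? '' });
const openai = new OpenAI(); // uses OPENAI_API_KEY

// Return the content of the most recent user message, or '' if there is none
function getLatestUserMessage(message: { messages?: { role: string; content: string }[] }): string {
  const userMessages = (message?.messages ?? []).filter(msg => msg.role === 'user');
  return userMessages[userMessages.length - 1]?.content || '';
}

app.post('/kb/search', async (req, res) => {
  try {
    const { message } = req.body;
    const latestQuery = getLatestUserMessage(message);

    // Nothing to search for, so skip the embedding call
    if (!latestQuery) {
      return res.json({ documents: [] });
    }

    // Generate embedding for the query
    const embedding = await openai.embeddings.create({
      model: 'text-embedding-ada-002',
      input: latestQuery
    });

    // Search vector database
    const index = pinecone.index('knowledge-base');
    const searchResults = await index.query({
      vector: embedding.data[0].embedding,
      topK: 5,
      includeMetadata: true
    });

    // Format response (assumes each vector stores its text in metadata.content)
    const documents = (searchResults.matches ?? []).map(match => ({
      content: match.metadata?.content ?? '',
      similarity: match.score,
      uuid: match.id
    }));

    res.json({ documents });
  } catch (error) {
    console.error('Vector search error:', error);
    res.status(500).json({ error: 'Search failed' });
  }
});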
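
For intuition about the `similarity` value, vector search commonly ranks results by cosine similarity between the query embedding and each document embedding. Whether your database uses this metric depends on how the index is configured, so treat the following as an illustrative sketch rather than a description of any particular product; `cosineSimilarity` is a name introduced here.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const sameDirection = cosineSimilarity([1, 2, 3], [2, 4, 6]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
console.log(sameDirection, orthogonal);
```

Vectors pointing the same way score close to 1 regardless of magnitude, and orthogonal vectors score 0, which is why the score works as a relevance signal for embeddings.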

Security and Best Practices

Performance Optimization

Response time is critical: your endpoint should respond within milliseconds (ideally under ~50ms) for the best user experience. Although Vapi allows a timeout of up to 10 seconds, slow responses noticeably degrade your assistant’s conversational flow and response quality.

Cache frequently requested documents and implement request timeouts to ensure fast response times. Consider using in-memory caches, CDNs, or pre-computed embeddings for faster retrieval.
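
As an illustration of the in-memory caching idea, the sketch below stores search results per normalized query with a time-to-live. `TtlCache`, the 60-second TTL, and the key normalization are all assumptions for the example; a production deployment may prefer a shared cache with size limits.

```typescript
// Minimal in-memory cache whose entries expire after a fixed time-to-live.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // evict stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Normalizing the query lets trivially different phrasings share an entry.
function cacheKey(query: string): string {
  return query.trim().toLowerCase();
}

const searchCache = new TtlCache<string[]>(60_000);
searchCache.set(cacheKey("  Return Policy "), ["doc-return-policy-1"]);
console.log(searchCache.get(cacheKey("return policy")));
```

In a request handler, check the cache before running the search and store the result afterward, so repeated questions skip the embedding and database round trips.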

Error Handling

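Handle errors gracefully. Return an appropriate HTTP status code for invalid requests, and for transient search failures consider returning an empty document list so the conversation can continue: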

app.post('/kb/search', async (req, res) => {
  try {
    // Your search logic here
  } catch (error) {
    console.error('Search error:', error);

    // Return empty documents rather than failing
    res.json({
      documents: [],
      error: "Search temporarily unavailable"
    });
  }
});

Next Steps

Now that you have a custom knowledge base implementation:

Query Tool Configuration: Learn advanced query tool configurations
Vector Databases: Explore vector database integrations
Assistant Configuration: Optimize your assistant’s use of knowledge bases

Custom Knowledge Bases require a webhook endpoint that’s publicly accessible. For production deployments, ensure your server can handle concurrent requests and has appropriate error handling and monitoring in place.