Custom LLM Tool Calling Integration

What Is a Custom LLM and Why Use It?

A Custom LLM is more than just a text generator—it’s a conversational assistant that can call external functions, trigger processes, and handle special logic, all while chatting with your users. Think of it as your smart helper that not only answers questions but also takes actions.

Why use a Custom LLM?

  • Enhanced Functionality: It mixes natural language responses with actionable functions.
  • Flexibility: You can combine built-in functions, attach external tools via Vapi, or even add custom endpoints.
  • Dynamic Interactions: The assistant can return structured instructions—like transferring a call or running a custom process—when needed.
  • Seamless Integration: Vapi lets you plug these custom endpoints into your assistant quickly and easily.

Setting Up Your Custom LLM for Response Generation

Before adding tool calls, let’s start with the basics: setting up your Custom LLM to simply generate conversation responses. In this mode, your endpoint receives the conversation details, forwards them to the model, and streams the reply back as text.

How It Works

  • Request Reception: Your endpoint (e.g., /chat/completions) gets a payload with the model, messages, temperature, and (optionally) tools.
  • Content Generation: The code builds an OpenAI API request that includes the conversation context.
  • Response Streaming: The generated reply is sent back as Server-Sent Events (SSE).
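
The handler below assumes some surrounding scaffolding: an Express app with JSON parsing, an OpenAI client, and a small logEvent logger. Here is a minimal sketch of that setup; the names match the snippets in this guide, but the details (port, logging format) are illustrative:

import express, { Request, Response } from "express";
import OpenAI from "openai";

// Request/Response types are used by the route handlers below.
const app = express();
app.use(express.json());

// The OpenAI client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Simple timestamped logger used throughout the snippets.
function logEvent(message: string, data?: unknown) {
  console.log(`[${new Date().toISOString()}] ${message}`, data ?? "");
}

app.listen(3000, () => logEvent("Custom LLM server listening on port 3000"));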

Sample Code Snippet

app.post("/chat/completions", async (req: Request, res: Response) => {
  // Log the incoming request.
  logEvent("Request received at /chat/completions", req.body);
  const payload = req.body;

  // Prepare the API request to OpenAI, merging any tools supplied in the
  // payload with our native tool definitions.
  const requestArgs: any = {
    model: payload.model,
    messages: payload.messages,
    temperature: payload.temperature ?? 1.0,
    stream: true,
    tools: [...(payload.tools || []), ...ourTools],
    tool_choice: "auto",
  };

  logEvent("Calling OpenAI API for content generation");
  const openAIResponse = await openai.chat.completions.create(requestArgs);
  logEvent("OpenAI API call successful. Streaming response.");

  // Set up Server-Sent Events (SSE) headers.
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  // Stream the response chunks back.
  for await (const chunk of openAIResponse as unknown as AsyncIterable<any>) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  res.write("data: [DONE]\n\n");
  res.end();
});

Attaching a Custom LLM Without Tools to an Existing Assistant in Vapi

If you just want response generation (without tool calls), update your Vapi model with a PATCH request like this:

curl -X PATCH https://api.vapi.ai/assistant/insert-your-assistant-id-here \
  -H "Authorization: Bearer insert-your-private-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "provider": "custom-llm",
      "model": "gpt-4o",
      "url": "https://custom-llm-url/chat/completions",
      "messages": [
        {
          "role": "system",
          "content": "[TASK] Ask the user if they want to transfer the call; if not, continue the conversation."
        }
      ]
    },
    "transcriber": {
      "provider": "azure",
      "language": "en-CA"
    }
  }'

Adding Tool Calling with Your Custom LLM

Now that you’ve got response generation working, let’s expand your assistant’s abilities. Your Custom LLM can trigger external actions in three different ways.

a. Native LLM Tools

These tools are built right into your LLM integration. For example, a native function like get_payment_link can return a payment URL.

How It Works:

  1. Detection: The LLM’s streaming response includes a tool call for get_payment_link.
  2. Execution: The integration parses the arguments and calls the native function.
  3. Response: The result is packaged into a follow-up API call and streamed back.
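
The streaming loop below relies on two structures defined elsewhere in the integration: ourTools, the native tool schemas merged into every OpenAI request, and tool_functions, a map from tool names to implementations. A minimal sketch for get_payment_link might look like this (the returned link is purely illustrative):

// Native tool schemas advertised to the model on every request.
const ourTools = [
  {
    type: "function",
    function: {
      name: "get_payment_link",
      description: "Get a payment link",
      parameters: { type: "object", properties: {} },
    },
  },
];

// Map of native tool names to their implementations.
const tool_functions: Record<string, (args: any) => Promise<any>> = {
  get_payment_link: async () => {
    // Illustrative only -- generate or look up a real payment link here.
    return { payment_link: "https://example.com/pay/abc123" };
  },
};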

Code Snippet:

// Variables to accumulate tool call information across streamed chunks.
let argumentsStr = "";
let toolCallInfo: { name?: string; id?: string } | null = null;

// Process streaming chunks.
for await (const chunk of openAIResponse as unknown as AsyncIterable<any>) {
  const choice = chunk.choices && chunk.choices[0];
  const delta = choice?.delta || {};
  const toolCalls = delta.tool_calls;

  // Accumulate the tool call's name, id, and argument fragments.
  if (toolCalls && toolCalls.length > 0) {
    for (const toolCall of toolCalls) {
      const func = toolCall.function;
      if (func && func.name) {
        toolCallInfo = { name: func.name, id: toolCall.id };
      }
      if (func && func.arguments) {
        argumentsStr += func.arguments;
      }
    }
  }

  const finishReason = choice?.finish_reason;
  if (finishReason === "tool_calls" && toolCallInfo) {
    let parsedArgs = {};
    try {
      parsedArgs = JSON.parse(argumentsStr);
    } catch (err) {
      console.error("Failed to parse arguments:", err);
    }
    if (tool_functions[toolCallInfo.name!]) {
      const result = await tool_functions[toolCallInfo.name!](parsedArgs);

      // Echo the assistant's tool call and supply the tool result, so the
      // follow-up request contains a complete tool-call exchange.
      const assistantToolCallMessage = {
        role: "assistant",
        tool_calls: [
          {
            id: toolCallInfo.id,
            type: "function",
            function: { name: toolCallInfo.name, arguments: argumentsStr },
          },
        ],
      };
      const toolResultMessage = {
        role: "tool",
        tool_call_id: toolCallInfo.id,
        content: JSON.stringify(result),
      };

      const followUpResponse = await openai.chat.completions.create({
        model: requestArgs.model,
        messages: [...requestArgs.messages, assistantToolCallMessage, toolResultMessage],
        temperature: requestArgs.temperature,
        stream: true,
        tools: requestArgs.tools,
        tool_choice: "auto",
      });

      for await (const followUpChunk of followUpResponse as unknown as AsyncIterable<any>) {
        res.write(`data: ${JSON.stringify(followUpChunk)}\n\n`);
      }
      argumentsStr = "";
      toolCallInfo = null;
      continue;
    }
  }
  res.write(`data: ${JSON.stringify(chunk)}\n\n`);
}

b. Vapi-Attached Tools

These tools come pre-attached via your Vapi configuration. For example, the transferCall tool:

How It Works:

  1. Detection: When a tool call for transferCall arrives and the payload includes a destination, no local function is executed.
  2. Response: The integration immediately streams a function call payload containing that destination back to Vapi, which carries out the transfer.

Code Snippet:

// Inside the streaming loop, once the tool call's function name is known:
if (functionName === "transferCall" && payload.destination) {
  const functionCallPayload = {
    function_call: {
      name: "transferCall",
      arguments: {
        destination: payload.destination,
      },
    },
  };
  logEvent("Special handling for transferCall", { functionCallPayload });
  res.write(`data: ${JSON.stringify(functionCallPayload)}\n\n`);
  // Skip further processing for this chunk.
  continue;
}

c. Custom Tools

Custom tools are unique to your application and are handled by a dedicated endpoint. For example, a custom function named processOrder.

How It Works:

  1. Dedicated Endpoint: Requests for custom tools go to /chat/completions/custom-tool.
  2. Detection: The payload includes a tool call list. If the function name is "processOrder", a hardcoded result is returned.
  3. Response: A JSON response is sent back with the result.

Code Snippet (Custom Endpoint):

app.post("/chat/completions/custom-tool", async (req: Request, res: Response) => {
  logEvent("Received request at /chat/completions/custom-tool", req.body);
  // Expect the payload to have a "message" with a "toolCallList" array.
  const vapiPayload = req.body.message;

  // Look for a recognized tool call.
  for (const toolCall of vapiPayload.toolCallList) {
    if (toolCall.function?.name === "processOrder") {
      const hardcodedResult = "CustomTool processOrder With CustomLLM Always Works";
      logEvent("Returning hardcoded result for 'processOrder'", { toolCallId: toolCall.id });
      return res.json({
        results: [
          {
            toolCallId: toolCall.id,
            result: hardcodedResult,
          },
        ],
      });
    }
  }

  // No recognized tool call: respond anyway so the caller doesn't hang.
  return res.status(400).json({ error: "Unsupported tool call" });
});

Testing Tool Calling with cURL

Once your endpoints are set up, try testing them with these cURL commands.

a. Native Tool Calling (get_payment_link)

curl -X POST https://custom-llm-url/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "I need a payment link."}
    ],
    "temperature": 0.7,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_payment_link",
          "description": "Get a payment link",
          "parameters": {}
        }
      }
    ]
  }'

Expected Response:
Streaming chunks eventually include the result (e.g., a payment link) returned by the native tool function.
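
Each data: line carries a standard OpenAI chat-completion chunk. For illustration only (the id and content here are made up), the tail of the stream might look like:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Here is your payment link: https://example.com/pay/abc123"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]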

b. Vapi-Attached Tool Calling (transferCall)

curl -X POST https://custom-llm-url/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "Please transfer my call."}
    ],
    "temperature": 0.7,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "transferCall",
          "description": "Transfer call to a specified destination",
          "parameters": {}
        }
      }
    ],
    "destination": "555-1234"
  }'

Expected Response:
Immediately returns a function call payload that instructs Vapi to transfer the call to the specified destination.
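
Given the special-case handler shown earlier, that payload is a single SSE line echoing the destination from the request:

data: {"function_call":{"name":"transferCall","arguments":{"destination":"555-1234"}}}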

c. Custom Tool Calling (processOrder)

curl -X POST https://custom-llm-url/chat/completions/custom-tool \
  -H "Content-Type: application/json" \
  -d '{
    "message": {
      "toolCallList": [
        {
          "id": "12345",
          "function": {
            "name": "processOrder",
            "arguments": {
              "param": "value"
            }
          }
        }
      ]
    }
  }'

Expected Response:

{
  "results": [
    {
      "toolCallId": "12345",
      "result": "CustomTool processOrder With CustomLLM Always Works"
    }
  ]
}

Integrating Tools with Vapi

After testing locally, integrate your Custom LLM with Vapi. Choose the configuration that fits your needs.

a. Without Tools (Response Generation Only)

curl -X PATCH https://api.vapi.ai/assistant/insert-your-assistant-id-here \
  -H "Authorization: Bearer insert-your-private-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "provider": "custom-llm",
      "model": "gpt-4o",
      "url": "https://custom-llm-url/chat/completions",
      "messages": [
        {
          "role": "system",
          "content": "[TASK] Ask the user if they want to transfer the call; if not, continue chatting."
        }
      ]
    },
    "transcriber": {
      "provider": "azure",
      "language": "en-CA"
    }
  }'

b. With Tools (Including transferCall and processOrder)

curl -X PATCH https://api.vapi.ai/assistant/insert-your-assistant-id-here \
  -H "Authorization: Bearer insert-your-private-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "provider": "custom-llm",
      "model": "gpt-4o",
      "url": "https://custom-llm-url/chat/completions",
      "messages": [
        {
          "role": "system",
          "content": "[TASK] Ask the user if they want to transfer the call; if they agree, trigger the transferCall tool; if not, continue the conversation. Also, if the user asks about the custom function processOrder, trigger that tool."
        }
      ],
      "tools": [
        {
          "type": "transferCall",
          "destinations": [
            {
              "type": "number",
              "number": "+xxxxxx",
              "numberE164CheckEnabled": false,
              "message": "Transferring Call To Customer Service Department"
            }
          ]
        },
        {
          "type": "function",
          "async": false,
          "function": {
            "name": "processOrder",
            "description": "A custom tool function named processOrder, per the vapi.ai custom tools guide"
          },
          "server": {
            "url": "https://custom-llm-url/chat/completions/custom-tool"
          }
        }
      ]
    },
    "transcriber": {
      "provider": "azure",
      "language": "en-CA"
    }
  }'

Conclusion

A Custom LLM turns a basic conversational assistant into an interactive helper that can:

  • Generate everyday language responses,
  • Call native tools (like fetching a payment link),
  • Use Vapi-attached tools (like transferring a call), and
  • Leverage custom tools (like processing orders).

By building each layer step by step and testing with cURL, you can fine-tune your integration before rolling it out in production.


Complete Code

For your convenience, you can find the complete source code for this Custom LLM integration here:

Custom LLM with Vapi Integration – Complete Code
