
Fine-tuned OpenAI models

Use Another LLM or Your Own Server

Vapi can use any OpenAI-compatible endpoint as an assistant’s LLM. This includes hosted services such as OpenRouter, AnyScale, and Together AI, as well as your own server.

When to Use Custom LLMs
  • To run an open-source LLM, such as Mixtral
  • To update the context during the conversation
  • To customize messages before they’re sent to the LLM

Using an LLM provider

First, POST your provider API key to the /credential endpoint:

```json
{
  "provider": "openrouter",
  "apiKey": "<YOUR OPENROUTER KEY>"
}
```
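As a rough sketch, the credential call can be built with Python’s standard library. The base URL https://api.vapi.ai and the use of your private Vapi API key as a Bearer token are assumptions to confirm in the API Reference, and build_credential_request is a hypothetical helper name. The function only builds the request; nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumption: Vapi's REST API lives at this base URL and accepts your private
# Vapi API key as a Bearer token. Confirm both in the API Reference.
VAPI_BASE_URL = "https://api.vapi.ai"


def build_credential_request(
    vapi_api_key: str, provider: str, provider_api_key: str
) -> urllib.request.Request:
    """Build (without sending) a POST /credential request that stores a provider key.

    Hypothetical helper for illustration, not part of any Vapi SDK.
    """
    body = {"provider": provider, "apiKey": provider_api_key}
    return urllib.request.Request(
        url=f"{VAPI_BASE_URL}/credential",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {vapi_api_key}",
            "Content-Type": "application/json",
        },
    )


# Usage (sends a real request, so it is left commented out):
# req = build_credential_request("<YOUR VAPI API KEY>", "openrouter", "<YOUR OPENROUTER KEY>")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```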

Then create an assistant that uses openrouter as its model provider:

```json
{
  "name": "My Assistant",
  "model": {
    "provider": "openrouter",
    "model": "cognitivecomputations/dolphin-mixtral-8x7b",
    "messages": [
      {
        "role": "system",
        "content": "You are an assistant."
      }
    ],
    "temperature": 0.7
  }
}
```
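The assistant body can likewise be assembled in code. This sketch assumes the same base URL (https://api.vapi.ai) and Bearer-token authentication as the credential call, both to be checked against the API Reference; openrouter_assistant and build_create_assistant_request are hypothetical helpers whose fields simply mirror the JSON above.

```python
import json
import urllib.request

VAPI_BASE_URL = "https://api.vapi.ai"  # assumption: confirm in the API Reference


def openrouter_assistant(
    name: str, model: str, system_prompt: str, temperature: float = 0.7
) -> dict:
    """Assistant body that routes completions through OpenRouter."""
    return {
        "name": name,
        "model": {
            "provider": "openrouter",
            "model": model,
            "messages": [{"role": "system", "content": system_prompt}],
            "temperature": temperature,
        },
    }


def build_create_assistant_request(
    vapi_api_key: str, assistant: dict
) -> urllib.request.Request:
    """Build (without sending) a POST /assistant request. Hypothetical helper."""
    return urllib.request.Request(
        url=f"{VAPI_BASE_URL}/assistant",
        data=json.dumps(assistant).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {vapi_api_key}",
            "Content-Type": "application/json",
        },
    )


# Usage (sends a real request, so it is left commented out):
# body = openrouter_assistant(
#     "My Assistant", "cognitivecomputations/dolphin-mixtral-8x7b", "You are an assistant."
# )
# with urllib.request.urlopen(build_create_assistant_request("<YOUR VAPI API KEY>", body)) as resp:
#     print(json.load(resp))
```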

Using Fine-Tuned OpenAI Models

To use a fine-tuned OpenAI model, follow these steps:

  1. Set the custom LLM URL to https://api.openai.com/v1.
  2. Set the custom LLM key to your OpenAI API key.
  3. Set the model to the ID of your fine-tuned model.
  4. Send a PATCH request to the /assistant endpoint that sets model.metadataSendMode to off.
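Steps 1, 3, and 4 can be sketched as a single PATCH body; step 2 is the custom LLM credential holding your OpenAI key and is not shown. Assumptions: the assistant is updated at /assistant/{id} on https://api.vapi.ai with Bearer authentication, and the fine-tuned model ID used below is a made-up placeholder (real OpenAI fine-tuned model IDs start with ft:). The helper names are hypothetical.

```python
import json
import urllib.request

VAPI_BASE_URL = "https://api.vapi.ai"  # assumption: confirm in the API Reference
OPENAI_BASE_URL = "https://api.openai.com/v1"


def fine_tuned_model_patch(fine_tuned_model_id: str, system_prompt: str) -> dict:
    """PATCH body pointing an assistant at a fine-tuned OpenAI model via custom-llm."""
    return {
        "model": {
            "provider": "custom-llm",
            "url": OPENAI_BASE_URL,  # step 1: custom LLM URL
            "model": fine_tuned_model_id,  # step 3: your fine-tuned model's ID
            "messages": [{"role": "system", "content": system_prompt}],
            "metadataSendMode": "off",  # step 4
        }
    }


def build_patch_assistant_request(
    vapi_api_key: str, assistant_id: str, body: dict
) -> urllib.request.Request:
    """Build (without sending) a PATCH request for an existing assistant.

    The /assistant/{id} path is an assumption; check the API Reference.
    """
    return urllib.request.Request(
        url=f"{VAPI_BASE_URL}/assistant/{assistant_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {vapi_api_key}",
            "Content-Type": "application/json",
        },
    )


# Usage with hypothetical IDs (sends a real request, so it is left commented out):
# body = fine_tuned_model_patch("ft:gpt-4o-mini-2024-07-18:my-org::abc123", "You are an assistant.")
# req = build_patch_assistant_request("<YOUR VAPI API KEY>", "<ASSISTANT ID>", body)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```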

Using your server

To use your own server as the LLM, create an endpoint that is compatible with the OpenAI client, meaning it accepts and returns the same request and response formats as OpenAI’s Chat Completions API. For best results, the endpoint should also support streaming completions.
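Below is a minimal sketch of such an endpoint using only Python’s standard library. It assumes that requests arrive at the configured base URL plus /chat/completions (the path OpenAI clients call) and that streaming follows OpenAI’s convention: data: lines carrying chat.completion.chunk objects, terminated by data: [DONE]. generate_reply is a hypothetical placeholder that echoes the last user message; swap in your own model.

```python
import json
import time
import uuid
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def generate_reply(messages: list) -> str:
    """Hypothetical stand-in for your model: echo the last user message."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return f"You said: {message.get('content', '')}"
    return "Hello!"


class ChatCompletionsHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Assumption: the client POSTs to <base url>/chat/completions, as OpenAI clients do.
        if self.path.rstrip("/") != "/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        reply = generate_reply(request.get("messages", []))
        common = {
            "id": f"chatcmpl-{uuid.uuid4().hex}",
            "created": int(time.time()),
            "model": request.get("model", "my-cool-model"),
        }
        if request.get("stream"):
            self._stream(common, reply)
        else:
            self._respond(common, reply)

    def _respond(self, common: dict, reply: str) -> None:
        # Non-streaming: one complete chat.completion object.
        body = json.dumps({
            **common,
            "object": "chat.completion",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }],
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _stream(self, common: dict, reply: str) -> None:
        # Streaming: server-sent events, one chat.completion.chunk per data: line.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()

        def send(delta: dict, finish_reason) -> None:
            chunk = {**common, "object": "chat.completion.chunk",
                     "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}]}
            self.wfile.write(f"data: {json.dumps(chunk)}\n\n".encode("utf-8"))
            self.wfile.flush()

        # One chunk per word here; a real model would emit tokens as it generates them.
        for i, word in enumerate(reply.split(" ")):
            if i == 0:
                send({"role": "assistant", "content": word}, None)
            else:
                send({"content": " " + word}, None)
        send({}, "stop")
        self.wfile.write(b"data: [DONE]\n\n")
        self.wfile.flush()


def serve(port: int = 8000) -> None:
    """Serve until interrupted."""
    ThreadingHTTPServer(("0.0.0.0", port), ChatCompletionsHandler).serve_forever()
```

Calling serve(8000) starts the server; it then needs to be reachable at a public URL before an assistant can use it. BaseHTTPRequestHandler defaults to HTTP/1.0, so each streamed response simply ends when the connection closes, which avoids chunked transfer encoding in this sketch.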

If your server itself calls an OpenAI-compatible API, you can pipe that API’s responses directly back to Vapi.
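Piping can be as simple as copying bytes from the upstream response to Vapi as they arrive, which passes streamed chunks through unchanged. In this sketch, UPSTREAM_URL and UPSTREAM_API_KEY are hypothetical environment variables, and error handling is deliberately left out.

```python
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical configuration for the OpenAI-compatible API your server calls.
UPSTREAM_URL = os.environ.get("UPSTREAM_URL", "https://api.openai.com/v1/chat/completions")
UPSTREAM_API_KEY = os.environ.get("UPSTREAM_API_KEY", "")


def relay(upstream, downstream, chunk_size: int = 1024) -> None:
    """Copy bytes from upstream to downstream as they arrive, flushing after each chunk."""
    # read1 returns whatever is already buffered instead of waiting for chunk_size
    # bytes, so streamed chunks are not held back; fall back to read otherwise.
    read = getattr(upstream, "read1", upstream.read)
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        downstream.write(chunk)
        downstream.flush()


class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # This is where you could inspect or rewrite the request (e.g. its messages).
        upstream_request = urllib.request.Request(
            UPSTREAM_URL,
            data=body,
            method="POST",
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {UPSTREAM_API_KEY}",
            },
        )
        # urlopen raises HTTPError for 4xx/5xx upstream responses; handle that in real code.
        with urllib.request.urlopen(upstream_request) as upstream:
            self.send_response(upstream.status)
            self.send_header("Content-Type", upstream.headers.get("Content-Type", "application/json"))
            self.end_headers()
            relay(upstream, self.wfile)


def serve(port: int = 8000) -> None:
    ThreadingHTTPServer(("0.0.0.0", port), ProxyHandler).serve_forever()
```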

If your OpenAI-compatible endpoint requires authentication, POST your server’s API key to the /credential endpoint:

```json
{
  "provider": "custom-llm",
  "apiKey": "<YOUR SERVER API KEY>"
}
```
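On the server side, you would then reject requests that don’t carry that key. This sketch assumes the key arrives in an Authorization: Bearer header, the way OpenAI-style clients send it; CUSTOM_LLM_API_KEY is a hypothetical environment variable holding the same value you posted as apiKey.

```python
import hmac
import os

# Hypothetical: holds the same value you POSTed as apiKey in the custom-llm credential.
EXPECTED_API_KEY = os.environ.get("CUSTOM_LLM_API_KEY", "")


def is_authorized(authorization_header, expected_key: str = EXPECTED_API_KEY) -> bool:
    """Return True only for an 'Authorization: Bearer <expected_key>' header value.

    Assumes the key arrives as a Bearer token, as OpenAI-style clients send it.
    """
    if not expected_key or not authorization_header:
        return False  # refuse everything if no key is configured or none was sent
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids revealing, through timing, how many leading characters matched.
    return hmac.compare_digest(token.strip().encode("utf-8"), expected_key.encode("utf-8"))
```

Inside a BaseHTTPRequestHandler, for example, you might call is_authorized(self.headers.get("Authorization")) at the top of do_POST and respond with send_error(401) when it returns False.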

If your server isn’t authenticated, you can skip this step.

Then create an assistant that uses the custom-llm model provider:

```json
{
  "name": "My Assistant",
  "model": {
    "provider": "custom-llm",
    "url": "<YOUR OPENAI COMPATIBLE ENDPOINT BASE URL>",
    "model": "my-cool-model",
    "messages": [
      {
        "role": "system",
        "content": "You are an assistant."
      }
    ],
    "temperature": 0.7
  }
}
```
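A sketch of the same call in Python follows, assuming the base URL https://api.vapi.ai and Bearer-token authentication (verify both in the API Reference). custom_llm_assistant and build_create_assistant_request are hypothetical helpers, and base_url is your endpoint’s base URL, as the placeholder in the JSON above indicates.

```python
import json
import urllib.request

VAPI_BASE_URL = "https://api.vapi.ai"  # assumption: confirm in the API Reference


def custom_llm_assistant(
    name: str, base_url: str, model: str, system_prompt: str, temperature: float = 0.7
) -> dict:
    """Assistant body that sends completions to your own OpenAI-compatible server."""
    return {
        "name": name,
        "model": {
            "provider": "custom-llm",
            "url": base_url,  # base URL of your endpoint
            "model": model,
            "messages": [{"role": "system", "content": system_prompt}],
            "temperature": temperature,
        },
    }


def build_create_assistant_request(
    vapi_api_key: str, assistant: dict
) -> urllib.request.Request:
    """Build (without sending) a POST /assistant request. Hypothetical helper."""
    return urllib.request.Request(
        url=f"{VAPI_BASE_URL}/assistant",
        data=json.dumps(assistant).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {vapi_api_key}",
            "Content-Type": "application/json",
        },
    )


# Usage with a hypothetical server URL (sends a real request, so it is left commented out):
# body = custom_llm_assistant("My Assistant", "https://llm.example.com/v1", "my-cool-model", "You are an assistant.")
# with urllib.request.urlopen(build_create_assistant_request("<YOUR VAPI API KEY>", body)) as resp:
#     print(json.load(resp))
```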