A custom transcriber lets you use your own transcription service with Vapi, instead of a built-in provider. This is useful if you need more control, want to use a specific provider like Deepgram, or have custom processing needs.
This guide shows you how to set up Deepgram as your custom transcriber. The same approach can be adapted for other providers.
You’ll learn how to:
Vapi connects to your custom transcriber endpoint (e.g. /api/custom-transcriber) via WebSocket. It sends an initial JSON message like this:
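A minimal sketch of how a server might tell that initial message apart from the audio that follows. It assumes control messages arrive as JSON text frames and audio arrives as binary frames. The `classifyFrame` helper and the config field names (`encoding`, `sampleRate`, `channels`) are hypothetical; only the `"start"` type comes from this guide.

```javascript
// Classify one incoming WebSocket frame.
// Assumption: JSON text frames carry control messages, binary frames carry audio.
function classifyFrame(data, isBinary) {
  if (isBinary) {
    return { kind: "audio", chunk: data };
  }

  let message;
  try {
    message = JSON.parse(data.toString());
  } catch (err) {
    return { kind: "invalid", reason: "not valid JSON" };
  }

  if (message && message.type === "start") {
    // Field names below are illustrative; pass through whatever was sent.
    return {
      kind: "start",
      config: {
        encoding: message.encoding,
        sampleRate: message.sampleRate,
        channels: message.channels,
      },
    };
  }

  return { kind: "unknown", message };
}
```

With the `ws` package, the `message` event handler receives `(data, isBinary)`, so both arguments can be passed straight to this function.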
Your server forwards the audio to Deepgram (or your chosen transcriber) using its SDK. Deepgram processes the audio and returns transcript events that include a channel_index (e.g. [0, ...] for customer, [1, ...] for assistant). The service buffers the incoming data, processes the transcript events (with debouncing and channel detection), and emits a final transcript.
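The channel detection and buffering described above can be sketched as follows. This is an illustration under stated assumptions, not the guide's implementation: `channelFromIndex` and `TranscriptBuffer` are hypothetical names, and the buffer flushes when the speaker changes rather than on a timer. A time-based debounce would call `flush()` after a quiet period.

```javascript
// Map Deepgram's channel_index to a speaker label.
// First element 0 means customer, 1 means assistant.
function channelFromIndex(channelIndex) {
  if (!Array.isArray(channelIndex) || channelIndex.length === 0) return null;
  if (channelIndex[0] === 0) return "customer";
  if (channelIndex[0] === 1) return "assistant";
  return null;
}

// Hypothetical buffer: collects text for one speaker and emits it
// when the speaker changes or when flush() is called.
class TranscriptBuffer {
  constructor(onFinal) {
    this.onFinal = onFinal;
    this.channel = null;
    this.parts = [];
  }

  push(text, channelIndex) {
    const channel = channelFromIndex(channelIndex);
    const trimmed = (text || "").trim();
    if (!channel || !trimmed) return; // ignore empty or unattributed events
    if (this.channel !== null && channel !== this.channel) this.flush();
    this.channel = channel;
    this.parts.push(trimmed);
  }

  flush() {
    if (this.parts.length === 0) return;
    this.onFinal({ channel: this.channel, text: this.parts.join(" ") });
    this.channel = null;
    this.parts = [];
  }
}
```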
The transcript is sent back to Vapi as a JSON message:
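A minimal sketch of building such a message. The helper name and the field names `type`, `transcription`, and `channel` (including the `"transcriber-response"` value) are assumptions made for illustration; check them against the message format your Vapi integration expects. The optional `transcriptType` field is included only when it is passed.

```javascript
// Build the JSON string sent back over the WebSocket.
// Assumed field names: type, transcription, channel.
function buildTranscriberResponse(text, channel, transcriptType) {
  if (channel !== "customer" && channel !== "assistant") {
    throw new Error(`Unknown channel: ${channel}`);
  }

  const message = {
    type: "transcriber-response",
    transcription: text,
    channel,
  };

  if (transcriptType !== undefined) {
    if (transcriptType !== "final" && transcriptType !== "partial") {
      throw new Error(`Unsupported transcriptType: ${transcriptType}`);
    }
    message.transcriptType = transcriptType;
  }

  return JSON.stringify(message);
}
```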
The optional transcriptType field controls how Vapi handles the transcript:
"final" (default): the transcription is definitive.
"partial": the transcription is provisional and may be superseded by a later message. Each partial replaces the previous one until a "final" arrives.
If omitted, transcriptType defaults to "final" for backward compatibility.
Create a new Node.js project and install the required dependencies:
Create a .env file with the following content:
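The server then needs to read those settings at startup. A minimal sketch, assuming hypothetical variable names `DEEPGRAM_API_KEY` and `PORT` and an arbitrary fallback port of 3001; substitute whatever names your .env file actually defines.

```javascript
// Read settings from an environment-like object (process.env by default).
// DEEPGRAM_API_KEY and PORT are hypothetical names used for illustration.
function loadConfig(env = process.env) {
  const apiKey = env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    throw new Error("DEEPGRAM_API_KEY is not set");
  }

  // 3001 is an arbitrary fallback, not a value from this guide.
  const port = Number.parseInt(env.PORT ?? "3001", 10);
  if (Number.isNaN(port)) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return { apiKey, port };
}
```

Node.js does not read .env files on its own. Either load the file with a package such as `dotenv`, or start the process with `node --env-file=.env` on Node.js 20.6 or later.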
Expected behavior:
Vapi connects via WebSocket to /api/custom-transcriber.
The "start" message initializes the Deepgram session.
Authenticate requests to your endpoint with credentialId. Create Custom Credentials in the dashboard to manage Bearer Token, OAuth 2.0, or HMAC authentication. For backward compatibility, the legacy secret field is still supported and sends the value as an x-vapi-secret HTTP header.
Deepgram’s transcript events include a channel_index array. The service uses the first element to determine whether the transcript is from the customer (0) or the assistant (1). Ensure Deepgram’s response format remains consistent with this logic.
Set transcriptType to "partial" to send progressive transcription updates. Each partial supersedes the previous one until a "final" message arrives. This is useful for STT providers that emit fast, low-latency partials that get refined over time (e.g. ElevenLabs Scribe). If transcriptType is omitted, Vapi treats the message as "final".
Using a custom transcriber with Vapi gives you the flexibility to integrate any transcription service into your call flows. This guide walked you through the setup, usage, and testing of a solution that streams real-time audio, processes transcripts with multi-channel detection, and returns formatted responses back to Vapi. Follow the steps above and use the provided code examples to build your custom transcriber solution.