Custom Transcriber
Introduction
Vapi supports several transcription providers, but sometimes you may need to use your own transcription service. This guide shows you how to integrate Deepgram as your custom transcriber. The solution streams raw stereo PCM audio (16‑bit) from Vapi via WebSocket to your server, which then forwards the audio to Deepgram. Deepgram returns real‑time partial and final transcripts that are processed (including channel detection) and sent back to Vapi.
Why Use a Custom Transcriber?
- Flexibility: Integrate with your preferred transcription service.
- Control: Implement specialized processing that isn’t available with built‑in providers.
- Cost Efficiency: Leverage your existing transcription infrastructure while maintaining full control over the pipeline.
- Customization: Tailor the handling of audio data, transcript formatting, and buffering according to your specific needs.
How It Works
- Connection Initialization: Vapi connects to your custom transcriber endpoint (e.g. /api/custom-transcriber) via WebSocket and sends an initial JSON "start" message.
- Audio Streaming: Vapi then streams binary PCM audio to your server.
- Transcription Processing: Your server forwards the audio to Deepgram (the transcriber chosen for this example) using its SDK. Deepgram processes the audio and returns transcript events that include a channel_index (e.g. [0, ...] for customer, [1, ...] for assistant). The service buffers the incoming data, processes the transcript events (with debouncing and channel detection), and emits a final transcript.
- Response: The final transcript is sent back to Vapi as a JSON message.
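The two JSON messages in this exchange can be sketched as a pair of small helpers. The helper names are hypothetical, and the field names (type, encoding, container, sampleRate, channels for the start message; type "transcriber-response", transcription, channel for the reply) are assumptions about Vapi's custom transcriber protocol that you should confirm against the current Vapi documentation.

```javascript
// Hypothetical helpers; message field names are assumptions to check against Vapi's docs.

// Parse the first text frame Vapi sends after connecting.
// Returns the parsed settings, or null if the frame is not a valid "start" message.
function parseStartMessage(text) {
  let msg;
  try {
    msg = JSON.parse(text);
  } catch {
    return null; // not JSON
  }
  if (msg === null || typeof msg !== "object" || msg.type !== "start") return null;
  return {
    encoding: msg.encoding,     // e.g. "linear16"
    container: msg.container,   // e.g. "raw"
    sampleRate: msg.sampleRate, // e.g. 16000
    channels: msg.channels,     // e.g. 2 (stereo: customer + assistant)
  };
}

// Build the JSON text frame that carries a final transcript back to Vapi.
function buildTranscriberResponse(transcription, channel) {
  if (channel !== "customer" && channel !== "assistant") {
    throw new Error(`Unknown channel: ${channel}`);
  }
  return JSON.stringify({ type: "transcriber-response", transcription, channel });
}
```

Validating the channel before sending keeps a mapping bug in your own code from reaching Vapi as a malformed reply.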
Implementation Steps
1. Project Setup
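Create a new Node.js project and install the dependencies this guide relies on: Express for the HTTP server, a WebSocket library, and the Deepgram SDK.
Then create a .env file to hold your configuration, such as your Deepgram API key.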
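As a rough illustration of how the server might read that configuration, the sketch below validates two environment variables. The variable names DEEPGRAM_API_KEY and PORT, the default port 3001, and the loadConfig helper are all assumptions made for this example, not values confirmed by the guide.

```javascript
// Hypothetical config loader; DEEPGRAM_API_KEY and PORT are assumed variable names.
function loadConfig(env) {
  const apiKey = (env.DEEPGRAM_API_KEY || "").trim();
  if (!apiKey) {
    throw new Error("DEEPGRAM_API_KEY is not set; add it to your .env file");
  }
  // Fall back to an arbitrary default port when PORT is absent.
  const port = Number.parseInt(env.PORT || "3001", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return { apiKey, port };
}
```

If you use the dotenv package, you would typically call require("dotenv").config() at startup and then pass process.env to loadConfig, so that a missing key fails fast rather than at the first Deepgram request.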
2. Code Files
Below are the individual code files you need for the integration.
transcriptionService.js
This service creates a live connection to Deepgram, processes incoming audio, handles transcript events (including channel detection), and emits the final transcript back to the caller.
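The transcript-handling part of such a service could look roughly like the sketch below. It leaves out the Deepgram connection itself and only shows how events might be turned into final transcripts. The event shape (channel_index, is_final, speech_final, channel.alternatives[0].transcript) follows Deepgram's live streaming results as we understand them, the TranscriptAssembler class is hypothetical, and the timer-based debouncing the guide mentions is replaced here by a simpler flush on speech_final.

```javascript
// Minimal sketch: assembles final transcripts from Deepgram-style events.
// Uses speech_final as the flush signal instead of a timer-based debounce.
class TranscriptAssembler {
  constructor() {
    // Finalized segments waiting to be flushed, keyed by channel name.
    this.pending = { customer: [], assistant: [] };
  }

  // Feed one transcript event. Returns { transcription, channel } when an
  // utterance is complete, otherwise null.
  handleEvent(event) {
    const alternatives = (event.channel && event.channel.alternatives) || [];
    const text = alternatives.length > 0 ? (alternatives[0].transcript || "").trim() : "";
    // The first element of channel_index identifies the audio channel.
    const index = Array.isArray(event.channel_index) ? event.channel_index[0] : 0;
    const channel = index === 1 ? "assistant" : "customer";

    // Interim (non-final) results are ignored; only finalized text is kept.
    if (!event.is_final) return null;
    if (text) this.pending[channel].push(text);

    // Flush once Deepgram marks the end of speech on this channel.
    if (event.speech_final && this.pending[channel].length > 0) {
      const transcription = this.pending[channel].join(" ");
      this.pending[channel] = [];
      return { transcription, channel };
    }
    return null;
  }
}
```

Keeping a separate buffer per channel means overlapping speech from the customer and the assistant never gets merged into one transcript.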
server.js
This file creates an Express server, attaches the custom transcriber WebSocket at /api/custom-transcriber, and starts the HTTP server.
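The routing logic behind such a server can be sketched without any network code. The helpers below are hypothetical: one checks that an upgrade request targets the transcriber path, and the other separates JSON control frames from binary audio frames. The sketch assumes that control messages arrive as text frames and audio as binary frames, as described earlier in this guide.

```javascript
// Hypothetical routing helpers for the WebSocket endpoint; no network code here.
const TRANSCRIBER_PATH = "/api/custom-transcriber";

// Decide whether an HTTP upgrade request targets the transcriber endpoint.
function isTranscriberPath(requestUrl) {
  // requestUrl is a path such as "/api/custom-transcriber?x=1";
  // the base URL is only needed so the path can be parsed.
  const { pathname } = new URL(requestUrl, "http://localhost");
  return pathname === TRANSCRIBER_PATH;
}

// Create a per-connection message handler. Text frames carry JSON control
// messages; binary frames carry raw PCM audio.
function createMessageHandler({ onStart, onAudio }) {
  let started = false;
  return function handleMessage(data, isBinary) {
    if (isBinary) {
      if (started) onAudio(data); // drop audio that arrives before "start"
      return;
    }
    let msg;
    try {
      msg = JSON.parse(data.toString());
    } catch {
      return; // ignore malformed control frames
    }
    if (msg && msg.type === "start") {
      started = true;
      onStart(msg);
    }
  };
}
```

With the ws library, the returned function matches the (data, isBinary) arguments of a socket's "message" event, so it could be registered directly as that event's listener.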
Testing Your Integration
Code Examples – How to Test
- Deploy Your Server: Start your server with Node.js (for example, by running server.js).
- Expose Your Server: If you want to test externally, use a tool like ngrok to expose your server via HTTPS/WSS.
- Initiate a Call with Vapi: Send a request to the Vapi API (for example, with curl) that starts a call with the custom transcriber configured, replacing the placeholders with your actual values.
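The call request from the last step can also be assembled in code. The sketch below only builds the request and does not send it. The endpoint https://api.vapi.ai/call, the body fields (phoneNumberId, customer.number, assistant.transcriber with provider "custom-transcriber" and server.url), and the buildCallRequest helper are assumptions for illustration; check them against Vapi's API reference. A real assistant configuration will likely need further fields, such as a model and a voice.

```javascript
// Sketch only: endpoint and body fields are assumptions to verify against Vapi's API reference.
function buildCallRequest({ apiKey, phoneNumberId, customerNumber, transcriberUrl }) {
  if (!transcriberUrl.startsWith("wss://")) {
    throw new Error("Expected a wss:// URL for the custom transcriber");
  }
  return {
    url: "https://api.vapi.ai/call",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        phoneNumberId,
        customer: { number: customerNumber },
        assistant: {
          transcriber: {
            provider: "custom-transcriber",
            server: { url: transcriberUrl },
          },
        },
      }),
    },
  };
}

// Usage (Node 18+ provides a global fetch):
//   const { url, options } = buildCallRequest({ ...yourValues });
//   const response = await fetch(url, options);
```

Separating request construction from sending makes it easy to log or inspect the payload before placing a real call.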
Expected Behavior
- Vapi connects via WebSocket to your custom transcriber at /api/custom-transcriber.
- The "start" message initializes the Deepgram session.
- PCM audio data is forwarded to Deepgram.
- Deepgram returns transcript events, which are processed with channel detection and debouncing.
- The final transcript is sent back to Vapi as a JSON message.
Notes and Limitations
- Streaming Support Requirement: The custom transcriber must support streaming. Vapi sends continuous audio data over the WebSocket, and your server must handle this stream in real time.
- Secret Header: The custom transcriber configuration accepts an optional field called secret. When set, Vapi will send this value with every request as an HTTP header named x-vapi-secret. This can also be configured via a headers field.
- Buffering: The solution buffers PCM audio and performs simple validation (e.g. ensuring stereo PCM data length is a multiple of 4). If the audio data is malformed, it is trimmed to a valid length.
- Channel Detection: Transcript events from Deepgram include a channel_index array. The service uses the first element to determine whether the transcript is from the customer (0) or the assistant (1). Ensure Deepgram's response format remains consistent with this logic.
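Two of the notes above, the secret header and the PCM length check, lend themselves to short helpers. Both functions are hypothetical sketches. The secret check uses a plain string comparison for brevity, and the trimming assumes 16-bit stereo audio, so each frame occupies 4 bytes.

```javascript
// Hypothetical helpers illustrating the secret header and buffering notes.

// Check the x-vapi-secret header against the secret you configured.
// Node lowercases incoming header names, so a lowercase lookup is enough.
function hasValidSecret(headers, expectedSecret) {
  if (!expectedSecret) return true; // no secret configured: nothing to check
  return headers["x-vapi-secret"] === expectedSecret;
}

// 16-bit stereo PCM uses 4 bytes per frame (2 bytes x 2 channels).
const BYTES_PER_STEREO_FRAME = 4;

// Trim a chunk so it contains a whole number of stereo frames.
function trimToWholeFrames(chunk) {
  const validLength = chunk.length - (chunk.length % BYTES_PER_STEREO_FRAME);
  return validLength === chunk.length ? chunk : chunk.subarray(0, validLength);
}
```

Checking the header during the WebSocket upgrade, before any audio is accepted, lets the server reject unauthorized connections without opening a Deepgram session.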
Conclusion
Using a custom transcriber with Vapi gives you the flexibility to integrate any transcription service into your call flows. This guide walked you through the setup, usage, and testing of a solution that streams real-time audio, processes transcripts with multi‑channel detection, and returns formatted responses back to Vapi. Follow the steps above and use the provided code examples to build your custom transcriber solution.