Web calls

Build voice interfaces and backend integrations using Vapi's Web and Server SDKs

Overview

Build powerful voice applications that work across web browsers, mobile apps, and backend systems. This guide covers both client-side voice interfaces and server-side call management using Vapi’s comprehensive SDK ecosystem.

In this quickstart, you’ll learn to:

  • Create real-time voice interfaces for web and mobile
  • Build automated outbound and inbound call systems
  • Handle events and webhooks for call management
  • Implement voice widgets and backend integrations

Developing locally? The Vapi CLI makes it easy to initialize projects and test webhooks:

$# Initialize Vapi in your project
$vapi init
$
$# Forward webhooks to local server
$vapi listen --forward-to localhost:3000/webhook

Learn more about the Vapi CLI →

Choose your integration approach

Client-Side Voice Interfaces

Best for: User-facing applications, voice widgets, mobile apps

  • Browser-based voice assistants and widgets
  • Real-time voice conversations
  • Mobile voice applications (iOS, Android, React Native, Flutter)
  • Direct user interaction with assistants

Server-Side Call Management

Best for: Backend automation, bulk operations, system integrations

  • Automated outbound call campaigns
  • Inbound call routing and management
  • CRM integrations and bulk operations
  • Webhook processing and real-time events

Web voice interfaces

Build browser-based voice assistants and widgets for real-time user interaction.

Installation and setup

Install the Web SDK:

$npm install @vapi-ai/web
import Vapi from '@vapi-ai/web';

const vapi = new Vapi('YOUR_PUBLIC_API_KEY');

// Start voice conversation
vapi.start('YOUR_ASSISTANT_ID');

// Listen for events
vapi.on('call-start', () => console.log('Call started'));
vapi.on('call-end', () => console.log('Call ended'));
vapi.on('message', (message) => {
  if (message.type === 'transcript') {
    console.log(`${message.role}: ${message.transcript}`);
  }
});
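
Beyond starting a call, the SDK exposes in-call controls. A quick sketch; check the @vapi-ai/web reference for the exact methods in your version:

// Mute or unmute the user's microphone mid-call
vapi.setMuted(true);
console.log(vapi.isMuted()); // true

// Have the assistant speak a message; the second argument
// optionally ends the call once the message has been spoken
vapi.say('Thanks for calling. Goodbye!', true);

// End the call programmatically
vapi.stop();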

Live captions and word-level timing

For UIs that need to render live captions or karaoke-style word highlighting as the assistant speaks, subscribe to the opt-in assistant.speechStarted message. Add it to your assistant’s clientMessages:

{
  "clientMessages": ["assistant.speechStarted", "transcript", "speech-update"]
}
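
If you manage assistants via the API, one way to apply this is with the Server SDK's update call introduced later in this guide; this is a sketch, so verify the exact method signature against the SDK reference:

const updated = await vapi.assistants.update("YOUR_ASSISTANT_ID", {
  clientMessages: ["assistant.speechStarted", "transcript", "speech-update"],
});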

Each event carries the full assistant turn text, the turn number, the source ("model", "force-say", or "custom-voice"), and optional timing data whose shape depends on your voice provider:

vapi.on('message', (message) => {
  if (message.type !== 'assistant.speechStarted') return;

  const { text, turn, source, timing } = message;

  if (timing?.type === 'word-alignment') {
    // ElevenLabs: per-word timestamps at playback cadence (~50-200ms apart).
    // timing.words includes spaces; join them into a char cursor and
    // highlight `text` up to that position.
  } else if (timing?.type === 'word-progress') {
    // Minimax with voice.subtitleType: "word". Cursor-based:
    // wordsSpoken / totalWords. See note below — events arrive in
    // segment-sized jumps, not word-by-word ticks.
  } else {
    // Cartesia, Deepgram, Azure, OpenAI, etc.: text-only event tied
    // to audio playback. Display `text` as a caption block.
  }
});

Cadence and granularity vary significantly by voice provider — pick the one that matches your UI requirements:

  • ElevenLabs (word-alignment) is the only provider that emits at true playback cadence with real per-word timestamps. Best for smooth karaoke-style highlighting with no client-side interpolation.
  • Minimax (word-progress) with subtitleType: "word" emits once per synthesis segment, near the end of that segment’s playback. The per-word timing.words[] array carries timestamps for the segment that just finished — useful for retroactive animation or forward extrapolation, but not for driving real-time highlighting during that segment. See the Minimax provider page for details.
  • All other providers emit text-only events (no timing). One event per TTS chunk; you can interpolate a word cursor at a flat rate (~3.5 words/sec) between events for an approximate cursor, as sketched below.
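
For those text-only providers, you can drive the approximate cursor yourself. A minimal sketch assuming a flat speaking rate; startApproxCursor, WORDS_PER_SEC, and the #caption element are illustrative names, not SDK APIs:

// Approximate word cursor for text-only events (no timing data).
const WORDS_PER_SEC = 3.5; // assumed average speaking rate; tune per voice
let cursorTimer: ReturnType<typeof setInterval> | undefined;

function startApproxCursor(text: string, render: (spoken: string) => void) {
  const words = text.split(/\s+/);
  let cursor = 0;
  clearInterval(cursorTimer);
  cursorTimer = setInterval(() => {
    cursor = Math.min(cursor + 1, words.length);
    render(words.slice(0, cursor).join(' ')); // highlight the spoken prefix
    if (cursor === words.length) clearInterval(cursorTimer);
  }, 1000 / WORDS_PER_SEC);
}

vapi.on('message', (message) => {
  if (message.type === 'assistant.speechStarted' && !message.timing) {
    startApproxCursor(message.text, (spoken) => {
      // hypothetical caption element in your page
      document.getElementById('caption')!.textContent = spoken;
    });
  }
});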

force-say events (your firstMessage, say actions) always emit as text-only, even on ElevenLabs and Minimax. On user barge-in, no further events fire for the interrupted turn — pair with the user-interrupted message to know what was actually spoken.
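
On barge-in, a matching handler can freeze the cursor so the caption reflects roughly what was heard. A sketch reusing cursorTimer from above, assuming the user-interrupted client message is enabled for your assistant:

vapi.on('message', (message) => {
  if (message.type === 'user-interrupted') {
    clearInterval(cursorTimer); // whatever is rendered approximates what was spoken
  }
});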

For the full event schema and field reference, see Server events → Assistant Speech Started.

Voice widget implementation

Create a voice widget for your website. The fastest way to get started is to copy this snippet into your page:

<script>
  var vapiInstance = null;
  const assistant = "assistant_id"; // Substitute with your assistant ID
  const apiKey = "your_public_api_key"; // Substitute with your Public key from Vapi Dashboard
  const buttonConfig = {}; // Modify this as required

  (function (d, t) {
    var g = document.createElement(t),
      s = d.getElementsByTagName(t)[0];
    g.src =
      "https://cdn.jsdelivr.net/gh/VapiAI/html-script-tag@latest/dist/assets/index.js";
    g.defer = true;
    g.async = true;
    s.parentNode.insertBefore(g, s);

    g.onload = function () {
      vapiInstance = window.vapiSDK.run({
        apiKey: apiKey, // mandatory
        assistant: assistant, // mandatory
        config: buttonConfig, // optional
      });
    };
  })(document, "script");
</script>

Server-side call management

Automate outbound calls and handle inbound call processing with server-side SDKs.

Installation and setup

Install the TypeScript Server SDK:

$npm install @vapi-ai/server-sdk
import { VapiClient } from "@vapi-ai/server-sdk";

const vapi = new VapiClient({
  token: process.env.VAPI_API_KEY!
});

// Create an outbound call
const call = await vapi.calls.create({
  phoneNumberId: "YOUR_PHONE_NUMBER_ID",
  customer: { number: "+1234567890" },
  assistantId: "YOUR_ASSISTANT_ID"
});

console.log(`Call created: ${call.id}`);

Creating assistants

const assistant = await vapi.assistants.create({
  name: "Sales Assistant",
  firstMessage: "Hi! I'm calling about your interest in our software solutions.",
  model: {
    provider: "openai",
    model: "gpt-4o",
    temperature: 0.7,
    messages: [{
      role: "system",
      content: "You are a friendly sales representative. Keep responses under 30 words."
    }]
  },
  voice: {
    provider: "11labs",
    voiceId: "21m00Tcm4TlvDq8ikWAM"
  }
});

Bulk operations

Run automated call campaigns for sales, surveys, or notifications:

async function runBulkCallCampaign(assistantId: string, phoneNumberId: string) {
  const prospects = [
    { number: "+1234567890", name: "John Smith" },
    { number: "+1234567891", name: "Jane Doe" },
    // ... more prospects
  ];

  const calls = [];
  for (const prospect of prospects) {
    const call = await vapi.calls.create({
      assistantId,
      phoneNumberId,
      customer: prospect,
      metadata: { campaign: "Q1_Sales" }
    });
    calls.push(call);

    // Rate limiting
    await new Promise(resolve => setTimeout(resolve, 2000));
  }

  return calls;
}
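
Kicking off a campaign is then a single call (the IDs are placeholders):

const calls = await runBulkCallCampaign("YOUR_ASSISTANT_ID", "YOUR_PHONE_NUMBER_ID");
console.log(`Queued ${calls.length} calls`);

The two-second pause in the loop is a simple client-side throttle; adjust it to whatever call-creation rate your deployment can tolerate.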

Webhook integration

Handle real-time events for both client and server applications:

import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhook/vapi', async (req, res) => {
  const { message } = req.body;

  switch (message.type) {
    case 'status-update':
      console.log(`Call ${message.call.id}: ${message.call.status}`);
      break;
    case 'transcript':
      console.log(`${message.role}: ${message.transcript}`);
      break;
    case 'function-call':
      return handleFunctionCall(message, res);
  }

  res.status(200).json({ received: true });
});

function handleFunctionCall(message: any, res: express.Response) {
  const { functionCall } = message;

  switch (functionCall.name) {
    case 'lookup_order': {
      const orderData = { orderId: functionCall.parameters.orderId, status: 'shipped' };
      return res.json({ result: orderData });
    }
    default:
      return res.status(400).json({ error: 'Unknown function' });
  }
}

app.listen(3000, () => console.log('Webhook server running on port 3000'));
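
To exercise this server locally, forward Vapi webhooks to it with the CLI command from the start of this guide:

$vapi listen --forward-to localhost:3000/webhook/vapi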

Next steps

Now that you understand both client and server SDK capabilities:

  • Explore use cases: Check out our examples section for complete implementations
  • Add tools: Connect your voice agents to external APIs and databases with custom tools
  • Configure models: Try different speech and language models for better performance
  • Scale with squads: Use Squads for multi-assistant setups and complex processes
