Call queue management with Twilio
Handle high-volume calls with Twilio queues when hitting Vapi concurrency limits
When your application receives more simultaneous calls than your Vapi concurrency limit allows, calls can be rejected. A call queue system using Twilio queues solves this by holding excess calls in a queue and processing them as capacity becomes available.
In this guide, you’ll learn to:
This approach is ideal for call centers, customer support lines, or any application expecting call volumes that exceed your Vapi concurrency limit.
Before implementing call queue management, ensure you have:
You’ll need to know your Vapi account’s concurrency limit. Check your plan details in the Vapi Dashboard under billing settings.
For production deployments, especially in serverless environments, Redis ensures your call counters persist across server restarts and function invocations.
The queue management system operates in three phases:
Incoming calls are automatically placed in a Twilio queue when received
Server monitors active Vapi calls against your concurrency limit
When capacity is available, calls are dequeued and connected to Vapi
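The capacity check behind these three phases comes down to simple arithmetic. The sketch below is illustrative only: `callsToDequeue` is a hypothetical helper name, and it assumes you already know the active call count, the queue length, and your concurrency limit.

```javascript
// Hypothetical helper: how many queued calls can be connected right now.
//   activeCalls    - calls currently running on Vapi
//   queuedCalls    - calls waiting in the Twilio queue
//   maxConcurrency - your Vapi concurrency limit (MAX_CONCURRENCY)
function callsToDequeue(activeCalls, queuedCalls, maxConcurrency) {
  // Clamp at zero in case the counter has drifted above the limit.
  const freeSlots = Math.max(0, maxConcurrency - activeCalls);
  return Math.min(freeSlots, queuedCalls);
}

console.log(callsToDequeue(8, 5, 10)); // 2
console.log(callsToDequeue(10, 5, 10)); // 0
```

Clamping the free-slot count at zero keeps the result sensible even if the active-call counter temporarily exceeds the limit.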
Call Flow:
First, create a Twilio queue using the Twilio CLI to hold incoming calls.
Expected Response:
Save the queue SID (e.g., QUaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa); you’ll need it for queue operations.
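If you would rather create the queue from Node.js than from the CLI, the Twilio Node helper library exposes `client.queues.create()`. The following is a minimal sketch under that assumption; `createQueue` and `isQueueSid` are hypothetical helper names, and `client` is assumed to be an initialized Twilio REST client.

```javascript
// Twilio SIDs are a two-letter prefix plus 32 hex characters; queues use "QU".
function isQueueSid(value) {
  return typeof value === "string" && /^QU[0-9a-fA-F]{32}$/.test(value);
}

// `client` is assumed to be a Twilio REST client,
// e.g. require("twilio")(accountSid, authToken).
async function createQueue(client, friendlyName) {
  const queue = await client.queues.create({ friendlyName });
  if (!isQueueSid(queue.sid)) {
    throw new Error(`Unexpected queue SID: ${queue.sid}`);
  }
  return queue.sid; // store this value for later queue operations
}
```

Validating the SID format before storing it catches copy-paste mistakes, such as saving a call SID where a queue SID is expected.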
Configure your Twilio phone number to send incoming calls to your queue endpoint.
Webhook URL: https://your-server.com/incoming
HTTP method: POST
Configure Redis for persistent call counter storage. Choose the option that best fits your deployment:
Install Redis locally:
Test connection:
Create your Node.js server with the required dependencies and environment variables.
Install Dependencies:
Environment Variables (.env):
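Whatever variables you define, validating them at startup catches misconfiguration early. This is a minimal sketch that assumes only the `MAX_CONCURRENCY` and `REDIS_URL` variables referenced elsewhere in this guide; `loadConfig` is a hypothetical helper, not part of any library.

```javascript
// Reads settings from an env-like object (pass process.env in a real server).
function loadConfig(env) {
  const maxConcurrency = Number.parseInt(env.MAX_CONCURRENCY ?? "", 10);
  if (!Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
    throw new Error("MAX_CONCURRENCY must be a positive integer");
  }
  if (!env.REDIS_URL) {
    throw new Error("REDIS_URL is required");
  }
  return { maxConcurrency, redisUrl: env.REDIS_URL };
}

// Example with literal values instead of process.env:
const config = loadConfig({
  MAX_CONCURRENCY: "10",
  REDIS_URL: "redis://localhost:6379",
});
console.log(config.maxConcurrency); // 10
```

Failing fast here is usually preferable to discovering a missing `REDIS_URL` on the first incoming call.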
Create the main server file with queue handling, concurrency tracking, and Vapi integration.
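One way to approach the concurrency tracking part is an increment-then-check counter in Redis. The sketch below is not the guide's full server: `tryReserveSlot`, `releaseSlot`, and `createFakeRedis` are hypothetical names, and `redis` is assumed to be any client with promise-based `incr`, `decr`, and `set` methods (as in node-redis v4 or ioredis). The in-memory stand-in exists only so the example runs without a Redis server.

```javascript
const ACTIVE_CALLS_KEY = "vapi:queue:active_calls";

// Try to claim one concurrency slot. INCR is atomic in Redis, so two
// concurrent requests cannot both take the last slot.
async function tryReserveSlot(redis, maxConcurrency) {
  const active = await redis.incr(ACTIVE_CALLS_KEY);
  if (active > maxConcurrency) {
    await redis.decr(ACTIVE_CALLS_KEY); // over the limit: give the slot back
    return false;
  }
  return true;
}

// Release a slot when a call ends, resetting to 0 if the counter went negative.
// Note: the reset is a separate command, so it is not atomic with the DECR.
async function releaseSlot(redis) {
  const active = await redis.decr(ACTIVE_CALLS_KEY);
  if (active < 0) {
    await redis.set(ACTIVE_CALLS_KEY, "0");
  }
}

// In-memory stand-in for a Redis client, for local experimentation only.
function createFakeRedis() {
  const store = new Map();
  const bump = (key, delta) => {
    const next = Number(store.get(key) ?? 0) + delta;
    store.set(key, String(next));
    return next;
  };
  return {
    incr: async (key) => bump(key, 1),
    decr: async (key) => bump(key, -1),
    set: async (key, value) => { store.set(key, String(value)); return "OK"; },
    get: async (key) => store.get(key) ?? null,
  };
}

async function demo() {
  const redis = createFakeRedis();
  console.log(await tryReserveSlot(redis, 1)); // true
  console.log(await tryReserveSlot(redis, 1)); // false (limit reached)
  await releaseSlot(redis);
  console.log(await tryReserveSlot(redis, 1)); // true
}
demo();
```

Incrementing first and rolling back on overflow avoids the read-then-write race that a separate `GET` followed by `INCR` would introduce.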
Configure your Vapi assistant to send end-of-call-report webhooks for accurate concurrency tracking.
Assistant Configuration: You need to configure your assistant with proper webhook settings to receive call status updates.
The webhook will be sent to your server URL with the message type end-of-call-report when calls end. This allows you to decrement your active call counter accurately. See the Assistant API reference for all available server message types.
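Because your server URL may receive other message types as well, it helps to filter on the type before touching the counter. The sketch below assumes the type is found at `body.message.type`; `isEndOfCallReport` and `handleCallEnded` are hypothetical helpers, and `onCallEnded` stands in for whatever decrements your counter and triggers queue processing.

```javascript
// Returns true when a webhook body is an end-of-call report.
// Assumption: the message type lives at body.message.type.
function isEndOfCallReport(body) {
  return body?.message?.type === "end-of-call-report";
}

// Framework-agnostic handler: returns the HTTP status to send and whether
// the callback ran. Other message types are acknowledged but ignored.
function handleCallEnded(body, onCallEnded) {
  if (!isEndOfCallReport(body)) {
    return { status: 200, handled: false };
  }
  onCallEnded();
  return { status: 200, handled: true };
}

let ended = 0;
handleCallEnded({ message: { type: "end-of-call-report" } }, () => { ended += 1; });
handleCallEnded({ message: { type: "status-update" } }, () => { ended += 1; });
console.log(ended); // 1
```

Ignoring unrelated messages prevents the active-call counter from being decremented more than once per call.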
Webhook Payload Example:
Your /call-ended endpoint will receive a webhook with this structure:
Deploy your server and test the complete queue management flow.
Start Your Server:
Test Scenarios:
MAX_CONCURRENCY setting
Monitor Queue Status:
The system uses event-driven queue processing that responds immediately to capacity changes, eliminating the need for timers and preventing memory leaks:
setImmediate() to process queue as soon as capacity becomes available
Queue processing happens immediately when calls end or arrive
Redis persistence works across serverless function invocations
No timers means no memory leaks from long-running processes
Counters survive server restarts and deployments
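The timer-free trigger described above can be sketched as a small wrapper around `setImmediate()`. Coalescing repeated triggers into a single pending run is an added assumption here, not something the guide specifies; `createQueueTrigger` is a hypothetical name, and `processQueue` stands in for your own queue-processing function.

```javascript
// Wraps processQueue so that any number of trigger() calls made before the
// next event-loop turn result in a single scheduled run.
// `schedule` defaults to Node's setImmediate and can be swapped out in tests.
function createQueueTrigger(processQueue, schedule = setImmediate) {
  let scheduled = false;
  return function trigger() {
    if (scheduled) return false; // a run is already pending
    scheduled = true;
    schedule(() => {
      scheduled = false;
      // Works for sync or async processQueue; log async failures.
      Promise.resolve(processQueue()).catch((err) => {
        console.error("processQueue failed:", err);
      });
    });
    return true;
  };
}

const trigger = createQueueTrigger(() => console.log("processing queue"));
trigger(); // schedules one run
trigger(); // ignored: a run is already pending
```

Because nothing is scheduled unless a call arrives or ends, the process holds no recurring timers between events.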
Queue processing is automatically triggered when:
setImmediate(() => processQueue()) after adding to queue
setImmediate(() => processQueue()) after decrementing active count
Redis is required for this implementation. Ensure your Redis instance is properly configured and accessible from your deployment environment.
Common causes:
REDIS_URL configuration
Solutions:
redis-cli ping (should return PONG)
REDIS_URL format matches your provider
Health check endpoint shows Redis status:
Common causes:
Solutions:
redis-cli get vapi:queue:active_calls
redis-cli set vapi:queue:active_calls 0
Debug Redis state:
Check these items:
MAX_CONCURRENCY setting is appropriate for your Vapi plan
Debug steps:
/process-queue endpoint manually
/health endpoint for current capacity and Redis status
Serverless-specific considerations:
Solutions:
Potential issues:
Solutions:
Production considerations:
Monitoring recommendations:
Now that you have a production-ready call queue system with Redis persistence and callback-driven processing:
Consider implementing health checks, metrics collection, and alerting around your Redis counters and queue processing latency for production monitoring.
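As a starting point for such a health check, the payload could be assembled from the values your server already tracks. This is only a sketch: `buildHealthReport` and every field name in it are hypothetical, not a format defined by Vapi or Twilio.

```javascript
// Builds a plain object suitable for returning as JSON from a health endpoint.
function buildHealthReport({ activeCalls, maxConcurrency, queueSize, redisConnected }) {
  const availableSlots = Math.max(0, maxConcurrency - activeCalls);
  return {
    status: redisConnected ? "ok" : "degraded",
    redis: redisConnected ? "connected" : "disconnected",
    activeCalls,
    maxConcurrency,
    availableSlots,
    queueSize,
  };
}

console.log(
  buildHealthReport({ activeCalls: 7, maxConcurrency: 10, queueSize: 2, redisConnected: true })
);
```

Exposing `availableSlots` and `queueSize` together makes it easy to alert when calls are waiting even though capacity appears free, which usually points to counter drift.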