OpenAI Realtime
Build voice assistants with OpenAI’s native speech-to-speech models for ultra-low-latency conversations
Overview
OpenAI’s Realtime API enables developers to use a native speech-to-speech model. Unlike other Vapi configurations, which orchestrate a transcriber, a model, and a voice API to simulate speech-to-speech, OpenAI’s Realtime API natively processes audio in and audio out.
In this guide, you’ll learn to:
- Choose the right realtime model for your use case
- Configure voice assistants with realtime capabilities
- Implement best practices for production deployments
- Optimize prompts specifically for realtime models
Available models
The gpt-realtime-2025-08-28 model is production-ready.
OpenAI offers three realtime models, each with different capabilities and cost/performance trade-offs:
- gpt-realtime-2025-08-28: the production-ready model, with the highest voice quality plus structured outputs and complex function calling
- gpt-4o-realtime-preview: a preview model with balanced cost and performance, suited to development and testing
- gpt-4o-mini-realtime-preview: the lowest-cost option for high-volume or latency-critical workloads
Voice options
Realtime models support a specific set of OpenAI voices optimized for speech-to-speech:
Available across all realtime models:
- alloy: Neutral and balanced
- echo: Warm and engaging
- shimmer: Energetic and expressive
Only available with realtime models:
- marin: Professional and clear
- cedar: Natural and conversational
The following voices are NOT supported by realtime models: ash, ballad, coral, fable, onyx, and nova.
Configuration
Basic setup
Configure a realtime assistant with function calling:
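A minimal sketch of creating such an assistant through Vapi’s REST API. The field names follow Vapi’s assistant schema as of this writing; `VAPI_API_KEY` and the `lookupOrder` tool are placeholders, so verify both against the current API reference:

```typescript
// Sketch: create a realtime assistant with one function tool.
// Requires Node 18+ for the built-in fetch.
async function createRealtimeAssistant() {
  const response = await fetch("https://api.vapi.ai/assistant", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "Realtime Support Agent",
      model: {
        provider: "openai",
        model: "gpt-realtime-2025-08-28",
        messages: [
          {
            role: "system",
            content: "You are a concise, friendly support agent.",
          },
        ],
        tools: [
          {
            type: "function",
            function: {
              name: "lookupOrder", // placeholder tool for illustration
              description: "Look up an order by its ID",
              parameters: {
                type: "object",
                properties: { orderId: { type: "string" } },
                required: ["orderId"],
              },
            },
          },
        ],
      },
      // Realtime models only accept the five voices listed above.
      voice: { provider: "openai", voiceId: "alloy" },
    }),
  });
  return response.json();
}

createRealtimeAssistant().then((assistant) => console.log(assistant.id));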
Using realtime-exclusive voices
To use the enhanced voices only available with realtime models:
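A sketch of the relevant configuration. The only change from the basic setup is the voiceId; marin and cedar are rejected when paired with a non-realtime model:

```typescript
// Sketch: selecting a realtime-exclusive voice. "marin" and "cedar"
// are valid only when the model is one of the realtime models above.
const assistantConfig = {
  model: {
    provider: "openai",
    model: "gpt-realtime-2025-08-28",
  },
  voice: {
    provider: "openai",
    voiceId: "marin", // or "cedar"
  },
};
```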
Handling instructions
Unlike traditional OpenAI models, realtime models receive instructions through the session configuration. Vapi automatically converts your system messages to session instructions during WebSocket initialization.
The system message in your model configuration is automatically optimized for realtime processing (see the sketch after this list):
- System messages are converted to session instructions
- Instructions are sent during WebSocket session initialization
- The instructions field supports the same prompting strategies as system messages
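As a rough sketch of that conversion (the actual translation happens inside Vapi’s orchestration layer, so the payload shown is illustrative):

```typescript
// You define a normal system message in the model config...
const model = {
  provider: "openai",
  model: "gpt-realtime-2025-08-28",
  messages: [
    {
      role: "system",
      content:
        "You are a scheduling assistant. Keep replies under two sentences.",
    },
  ],
};

// ...and Vapi forwards it during WebSocket initialization, roughly as:
// { type: "session.update", session: { instructions: "<your system message>" } }
```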
Prompting best practices
Realtime models benefit from different prompting techniques than text-based models. These guidelines are based on OpenAI’s official prompting guide.
General tips
- Iterate relentlessly: Small wording changes can significantly impact behavior
- Use bullet points over paragraphs: Clear, short bullets outperform long text blocks
- Guide with examples: The model closely follows sample phrases you provide
- Be precise: Ambiguity or conflicting instructions degrade performance
- Control language: Pin output to a target language to prevent unwanted switching
- Reduce repetition: Add variety rules to avoid robotic phrasing
- Capitalize for emphasis: Use CAPS for key rules to make them stand out
Prompt structure
Organize your prompts with clear sections for better model comprehension:
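One possible skeleton, adapted from OpenAI’s prompting guide. The section names are conventional rather than required, and the Acme/lookupOrder details are illustrative:

```text
# Role & Objective
You are a support agent for Acme Inc. Resolve billing questions quickly.

# Personality & Tone
Warm, concise, confident. No more than two sentences per turn.

# Instructions
- Ask one question at a time.
- Confirm details back to the caller before acting.
- Speak only in English, even if the caller switches languages.

# Tools
Use lookupOrder before answering any order-status question.

# Escalation
If the caller asks for a human, transfer immediately.
```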
Realtime-specific techniques
Realtime prompts benefit from explicit guidance on speaking speed, personality, and conversation flow. For example, control the model’s speaking pace with explicit instructions:
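An illustrative pacing section (tune the wording against real calls):

```text
# Pacing
- Speak at a brisk, energetic pace.
- Slow down when reading numbers: say phone numbers and confirmation
  codes digit by digit.
- If the caller asks you to slow down, stay slower for the rest of the call.
```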
Migration guide
Transitioning from standard STT/TTS to realtime models:
- Verify voice compatibility: ensure your selected voice is supported (alloy, echo, shimmer, marin, or cedar)
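Beyond voice compatibility, the main configuration change is the model block itself: a realtime assistant drops the transcriber and names a realtime model. A minimal before/after sketch, assuming a Deepgram transcriber in the original pipeline (field values are illustrative):

```typescript
// Before: orchestrated pipeline with separate transcriber, model, and voice.
const pipelineAssistant = {
  transcriber: { provider: "deepgram", model: "nova-2" },
  model: { provider: "openai", model: "gpt-4o" },
  voice: { provider: "openai", voiceId: "alloy" },
};

// After: the realtime model handles audio natively, so no transcriber is
// configured and the voice must be one of the five supported voices.
const realtimeAssistant = {
  model: { provider: "openai", model: "gpt-realtime-2025-08-28" },
  voice: { provider: "openai", voiceId: "marin" },
};
```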
Best practices
Model selection strategy
When to use gpt-realtime-2025-08-28
Best for production workloads requiring:
- Structured outputs for form filling or data collection
- Complex function orchestration
- Highest quality voice interactions
- Responses API integration
When to use gpt-4o-realtime-preview
Best for development and testing:
- Prototyping voice applications
- Balanced cost/performance during development
- Testing conversation flows before production
When to use gpt-4o-mini-realtime-preview
Best for cost-sensitive applications:
- High-volume voice interactions
- Simple Q&A or routing scenarios
- Applications where latency is critical
Performance optimization
- Temperature settings: Use 0.5-0.7 for consistent yet natural responses
- Max tokens: Set appropriate limits (200-300) for conversational responses; see the sketch after this list
- Voice selection: Test different voices to match your brand personality
- Function design: Keep function schemas simple for faster execution
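A sketch of those knobs applied to the model block (the temperature and maxTokens field names follow Vapi’s assistant schema; verify against the current API reference):

```typescript
// Tuning knobs from the list above, applied to the model config.
const tunedModel = {
  provider: "openai",
  model: "gpt-realtime-2025-08-28",
  temperature: 0.6, // 0.5-0.7: consistent but still natural
  maxTokens: 250,   // 200-300 keeps spoken responses conversational
};
```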
Error handling
Handle edge cases gracefully:
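For example, a small guard that catches one common misconfiguration, an unsupported voice, before it reaches the API. This is a sketch: the REALTIME_VOICES set is derived from the list above, and the fallback choice is an assumption:

```typescript
// Voices accepted by realtime models, per the voice options section.
const REALTIME_VOICES = new Set(["alloy", "echo", "shimmer", "marin", "cedar"]);

// Fall back to a supported voice instead of failing the call setup.
function resolveRealtimeVoice(requested: string): string {
  if (!REALTIME_VOICES.has(requested)) {
    console.warn(
      `Voice "${requested}" is not realtime-compatible; falling back to "alloy".`
    );
    return "alloy";
  }
  return requested;
}

// resolveRealtimeVoice("onyx") logs a warning and returns "alloy".
```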
Current limitations
Be aware of these limitations when implementing realtime models:
- Knowledge Bases are not currently supported with the Realtime API
- Endpointing and Interruption models are managed by Vapi’s orchestration layer
- Custom voice cloning is not available for realtime models
- Some OpenAI voices (ash, ballad, coral, fable, onyx, nova) are incompatible
- Transcripts may have slight differences from traditional STT output
Next steps
Now that you understand OpenAI Realtime models:
- Phone Calling Guide: Set up inbound and outbound calling
- Assistant Hooks: Add custom logic to your conversations
- Voice Providers: Explore other voice options