OpenAI Realtime

Build voice assistants with OpenAI’s native speech-to-speech models for ultra-low latency conversations

Overview

OpenAI’s Realtime API gives developers access to a native speech-to-speech model. Unlike other Vapi configurations, which orchestrate a separate transcriber, model, and voice API to simulate speech-to-speech, OpenAI’s Realtime API processes audio in and audio out natively.

In this guide, you’ll learn to:

  • Choose the right realtime model for your use case
  • Configure voice assistants with realtime capabilities
  • Implement best practices for production deployments
  • Optimize prompts specifically for realtime models

Available models

The gpt-realtime-2025-08-28 model is production-ready.

OpenAI offers three realtime models, each with different capabilities and cost/performance trade-offs:

| Model | Status | Best For | Key Features |
| --- | --- | --- | --- |
| gpt-realtime-2025-08-28 | Production | Production workloads | Production ready |
| gpt-4o-realtime-preview-2024-12-17 | Preview | Development & testing | Balanced performance/cost |
| gpt-4o-mini-realtime-preview-2024-12-17 | Preview | Cost-sensitive apps | Lower latency, reduced cost |
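
As an illustration, a cost-sensitive assistant might pin the mini preview model in its Vapi model config (a minimal sketch; substitute whichever model ID from the table fits your use case):

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-4o-mini-realtime-preview-2024-12-17"
  }
}
```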

Voice options

Realtime models support a specific set of OpenAI voices optimized for speech-to-speech:

Standard Voices

Available across all realtime models:

  • alloy - Neutral and balanced
  • echo - Warm and engaging
  • shimmer - Energetic and expressive

Realtime-Exclusive Voices

Only available with realtime models:

  • marin - Professional and clear
  • cedar - Natural and conversational

The following voices are NOT supported by realtime models: ash, ballad, coral, fable, onyx, and nova.

Configuration

Basic setup

Configure a realtime assistant with function calling:

{
  "model": {
    "provider": "openai",
    "model": "gpt-realtime-2025-08-28",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant. Be concise and friendly."
      }
    ],
    "temperature": 0.7,
    "maxTokens": 250,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "getWeather",
          "description": "Get the current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city name"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  },
  "voice": {
    "provider": "openai",
    "voiceId": "alloy"
  }
}

Using realtime-exclusive voices

To use the enhanced voices only available with realtime models:

{
  "voice": {
    "provider": "openai",
    "voiceId": "marin" // or "cedar"
  }
}

Handling instructions

Unlike traditional OpenAI models, realtime models receive instructions through the session configuration. Vapi automatically converts your system messages to session instructions during WebSocket initialization.

The system message in your model configuration is automatically optimized for realtime processing:

  1. System messages are converted to session instructions
  2. Instructions are sent during WebSocket session initialization
  3. The instructions field supports the same prompting strategies as system messages
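
Conceptually, the Realtime API receives these instructions via a session.update event over the WebSocket. The exact payload Vapi sends is an internal detail, but a sketch of the event shape looks like this:

```json
{
  "type": "session.update",
  "session": {
    "instructions": "You are a helpful assistant. Be concise and friendly."
  }
}
```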

Prompting best practices

Realtime models benefit from different prompting techniques than text-based models. These guidelines are based on OpenAI’s official prompting guide.

General tips

  • Iterate relentlessly: Small wording changes can significantly impact behavior
  • Use bullet points over paragraphs: Clear, short bullets outperform long text blocks
  • Guide with examples: The model closely follows sample phrases you provide
  • Be precise: Ambiguity or conflicting instructions degrade performance
  • Control language: Pin output to a target language to prevent unwanted switching
  • Reduce repetition: Add variety rules to avoid robotic phrasing
  • Capitalize for emphasis: Use CAPS for key rules to make them stand out
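
Applied to a Vapi system message, several of these tips (short bullets, CAPS for key rules, a pinned output language, a sample phrase) might combine like this illustrative sketch:

```json
{
  "role": "system",
  "content": "You are a support agent for Acme Corp.\n- ALWAYS respond in English, even if the caller switches languages.\n- Keep answers to 1-2 short sentences.\n- Vary your phrasing; never repeat the same acknowledgement twice in a row.\n- Sample greeting: 'Thanks for calling Acme, how can I help?'"
}
```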

Prompt structure

Organize your prompts with clear sections for better model comprehension:

# Role & Objective
You are a customer service agent for Acme Corp. Your goal is to resolve issues quickly.
# Personality & Tone
- Friendly, professional, and empathetic
- Speak naturally at a moderate pace
- Keep responses to 2-3 sentences
# Instructions
- Greet callers warmly
- Ask clarifying questions before offering solutions
- Always confirm understanding before proceeding
# Tools
Use the available tools to look up account information and process requests.
# Safety
If a caller becomes aggressive or requests something outside your scope,
politely offer to transfer them to a specialist.

Realtime-specific techniques

Control the model’s speaking pace with explicit instructions:

## Pacing
- Deliver responses at a natural, conversational speed
- Do not rush through information
- Pause briefly between key points

Migration guide

Transitioning from standard STT/TTS to realtime models:

1. Update your model configuration

   Change your model to one of the realtime options:

   {
     "model": {
       "provider": "openai",
       "model": "gpt-realtime-2025-08-28" // Changed from gpt-4
     }
   }

2. Verify voice compatibility

   Ensure your selected voice is supported (alloy, echo, shimmer, marin, or cedar).

3. Remove transcriber configuration

   Realtime models handle speech-to-speech natively, so transcriber settings are not needed.

4. Test function calling

   Your existing function configurations work unchanged with realtime models.

5. Optimize your prompts

   Apply realtime-specific prompting techniques for best results.
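
Putting the migration steps together, a migrated assistant config (a sketch; your system prompt and voice will differ) omits the transcriber block entirely:

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-realtime-2025-08-28",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." }
    ]
  },
  "voice": {
    "provider": "openai",
    "voiceId": "marin"
  }
  // Note: no "transcriber" block - the realtime model handles speech natively
}
```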

Best practices

Model selection strategy

The gpt-realtime-2025-08-28 model is best for production workloads requiring:

  • Structured outputs for form filling or data collection
  • Complex function orchestration
  • Highest quality voice interactions
  • Responses API integration

The gpt-4o-realtime-preview-2024-12-17 model is best for development and testing:

  • Prototyping voice applications
  • Balanced cost/performance during development
  • Testing conversation flows before production

The gpt-4o-mini-realtime-preview-2024-12-17 model is best for cost-sensitive applications:

  • High-volume voice interactions
  • Simple Q&A or routing scenarios
  • Applications where latency is critical

Performance optimization

  • Temperature settings: Use 0.5-0.7 for consistent yet natural responses
  • Max tokens: Set appropriate limits (200-300) for conversational responses
  • Voice selection: Test different voices to match your brand personality
  • Function design: Keep function schemas simple for faster execution
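
A model block reflecting these recommendations might look like the following sketch (the temperature and token values are illustrative starting points; tune them for your workload):

```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-realtime-2025-08-28",
    "temperature": 0.6,
    "maxTokens": 250
  }
}
```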

Error handling

Handle edge cases gracefully:

{
  "messages": [{
    "role": "system",
    "content": "If you don't understand the user, politely ask them to repeat. Never make assumptions about unclear requests."
  }]
}

Current limitations

Be aware of these limitations when implementing realtime models:

  • Knowledge Bases are not currently supported with the Realtime API
  • Endpointing and Interruption models are managed by Vapi’s orchestration layer
  • Custom voice cloning is not available for realtime models
  • Some OpenAI voices (ash, ballad, coral, fable, onyx, nova) are incompatible
  • Transcripts may have slight differences from traditional STT output

Next steps

Now that you understand OpenAI Realtime models, apply these configuration, migration, and prompting practices to your own voice assistants.