Pronunciation dictionaries
Overview
Pronunciation dictionaries allow you to customize how your AI assistant pronounces specific words, names, acronyms, or technical terms. This feature is particularly useful for ensuring consistent pronunciation of brand names, proper nouns, or industry-specific terminology that might be mispronounced by default.
Pronunciation dictionaries are supported by the following voice providers:
- ElevenLabs — phoneme rules (IPA and CMU Arpabet) and alias rules
- Cartesia — “sounds-like” aliases and IPA notation (sonic-3 model only)
- Vapi built-in voices — pronunciation dictionaries via a unified locator (v1 voices only)
Pronunciation dictionaries are supported on Vapi v1 voices only. Vapi v2 voices are powered by xAI’s Grok model, which doesn’t support pronunciation dictionaries yet. For v2 voices, use speech tags to control pronunciation and delivery inline in your text instead.
How Pronunciation Dictionaries Work
Create Pronunciation Rules
Define specific words or phrases and how they should be pronounced using either phonetic notation or word substitutions.
Upload Dictionary to Vapi
Create a pronunciation dictionary through Vapi’s API with your custom rules.
Sample Audio Examples
Below are examples demonstrating the difference between pronunciations with and without pronunciation dictionaries:
Corrected pronunciations:
- “Nginx” → “Engine-X” (using alias rule)
- “Kubernetes” → “/ˌkuːbərˈneɪtiːz/” (using phoneme rule)
Without Pronunciation Dictionary: [audio sample]
With Pronunciation Dictionary: [audio sample]
Prerequisites
- A Vapi assistant configured with an ElevenLabs, Cartesia, or Vapi voice
- For ElevenLabs: understanding of phonetic notation (IPA or CMU Arpabet) for phoneme-based rules
- For Cartesia: the sonic-3 voice model (pronunciation dictionaries are only available on sonic-3)
- Access to Vapi’s API for dictionary creation
Types of Pronunciation Rules
ElevenLabs Rules
Phoneme Rules
Phoneme rules specify exact pronunciation using phonetic alphabets. These provide the most precise control over pronunciation.
Supported Alphabets:
- IPA (International Phonetic Alphabet): More universal, uses symbols like /tə'meɪtoʊ/
- CMU Arpabet: ASCII-based format, uses notation like T AH M EY T OW
Model Compatibility: Phoneme rules only work with specific ElevenLabs models:
- eleven_turbo_v2
- eleven_flash_v2
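As a rough illustration of the phoneme rules described above, the sketch below models one rule as a small validated record. The JSON key names (`type`, `stringToReplace`, `phoneme`, `alphabet`) and the alphabet identifiers `"ipa"` and `"cmu"` are assumptions made for this example, not values confirmed by this page; check the API reference for the exact schema.

```python
from dataclasses import dataclass

# Assumption: illustrative identifiers for IPA and CMU Arpabet.
SUPPORTED_ALPHABETS = {"ipa", "cmu"}


@dataclass(frozen=True)
class PhonemeRule:
    """One phoneme rule: a word plus its pronunciation in a phonetic alphabet."""

    string_to_replace: str
    phoneme: str
    alphabet: str = "ipa"

    def __post_init__(self) -> None:
        # Reject obviously invalid rules before they are sent anywhere.
        if self.alphabet not in SUPPORTED_ALPHABETS:
            raise ValueError(f"unsupported alphabet: {self.alphabet!r}")
        if not self.string_to_replace or not self.phoneme:
            raise ValueError("both the word and its phoneme string are required")

    def to_payload(self) -> dict:
        # Hypothetical JSON shape; the key names are assumptions.
        return {
            "type": "phoneme",
            "stringToReplace": self.string_to_replace,
            "phoneme": self.phoneme,
            "alphabet": self.alphabet,
        }


# The two notations from the list above (surrounding slashes omitted here).
tomato_ipa = PhonemeRule("tomato", "tə'meɪtoʊ")
tomato_cmu = PhonemeRule("tomato", "T AH M EY T OW", alphabet="cmu")
```

Validating the alphabet up front surfaces typos locally instead of as a rejected API call.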
Alias Rules
Alias rules replace words with alternative spellings or phrases. These work with all ElevenLabs models and are useful for:
- Converting acronyms to full phrases (e.g., “UN” → “United Nations”)
- Providing phonetic spellings for difficult words
- Standardizing pronunciation across different contexts
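To get a feel for what alias rules do, the following sketch previews alias substitution locally on a piece of text. It is only an approximation written for this example (whole-word, case-sensitive matching); the provider's actual matching logic may differ, and `preview_aliases` is a hypothetical helper, not part of any SDK.

```python
import re


def preview_aliases(text: str, aliases: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches with their alias."""
    if not aliases:
        return text
    # Try longer keys first so a longer phrase wins over a shorter prefix.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: aliases[match.group(0)], text)


aliases = {"UN": "United Nations", "Nginx": "Engine-X"}
print(preview_aliases("The UN runs Nginx.", aliases))
# prints "The United Nations runs Engine-X."
```

Because matching is whole-word and case-sensitive in this preview, "UNICEF" and "un" are left untouched.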
Cartesia Rules
Cartesia pronunciation dictionaries use a text and alias format. Each entry maps a word to its pronunciation. Cartesia supports two alias styles:
- Sounds-like guidance: A plain-English hint for how to say the word (e.g., "VAH-pee")
- IPA notation: Precise phonetic spelling wrapped in double angle brackets, with symbols separated by pipes (e.g., "<<ˈ|v|ɑ|ˈ|p|i>>")
Cartesia pronunciation dictionaries are only available with the sonic-3 (or newer) model. In the dashboard, the pronunciation dictionary option only appears once you select a supported model.
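The text and alias format described above can be sketched as plain dictionaries. The `text` and `alias` keys follow the wording of this section; the helper names `ipa_alias` and `cartesia_item` are hypothetical and exist only for this example.

```python
def ipa_alias(symbols: list[str]) -> str:
    """Join IPA symbols with pipes and wrap them in double angle brackets."""
    return "<<" + "|".join(symbols) + ">>"


def cartesia_item(text: str, alias: str) -> dict:
    """Build one dictionary entry mapping a word to its pronunciation."""
    return {"text": text, "alias": alias}


items = [
    # Sounds-like guidance: a plain-English hint.
    cartesia_item("Vapi", "VAH-pee"),
    # IPA notation: the same word spelled as pipe-separated IPA symbols.
    cartesia_item("Vapi", ipa_alias(["ˈ", "v", "ɑ", "ˈ", "p", "i"])),
]
```

In practice a dictionary would carry one entry per word; both styles are shown for the same word here only to compare them side by side.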
Implementation
ElevenLabs
Create a Pronunciation Dictionary
Use Vapi’s API to create a pronunciation dictionary with your custom rules.
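A minimal sketch of such a request, using only the Python standard library, might look like the following. The request body shape (`name` plus a `rules` list), the rule key names, and the bearer-token header are assumptions for illustration, and the URL is deliberately left as a parameter because the endpoint path is not given on this page; take the real values from the API reference.

```python
import json
import urllib.request


def build_create_request(
    url: str, api_key: str, name: str, rules: list[dict]
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that creates a dictionary."""
    # Assumed body shape: a dictionary name plus a list of rule objects.
    body = json.dumps({"name": name, "rules": rules}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


request = build_create_request(
    url="https://api.example.invalid/pronunciation-dictionary",  # placeholder URL
    api_key="YOUR_VAPI_API_KEY",
    name="tech-terms",
    rules=[
        # Hypothetical rule shape, mirroring the alias example in this guide.
        {"type": "alias", "stringToReplace": "Nginx", "alias": "Engine-X"},
    ],
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.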
The API will respond with:
Cartesia
Create a Pronunciation Dictionary
Use Vapi’s API to create a Cartesia pronunciation dictionary.
The API will respond with a dictionary object containing an id you’ll use in the next step.
Vapi Built-in Voices
Create an ElevenLabs Pronunciation Dictionary
For Vapi v1 built-in voices, create the pronunciation dictionary through the ElevenLabs endpoint shown above. The response includes both a dictionary ID and a versionId; keep both values.
Vapi built-in voices use a unified pronunciationDictionary field, but most v1 built-in voices route through ElevenLabs under the hood. Do not assume a Cartesia dictionary ID works with every Vapi built-in voice. Cartesia dictionary IDs only apply to Vapi voices that are routed to a Cartesia Sonic voice path.
Using Your Own ElevenLabs Account (BYOK)
If you’re using your own ElevenLabs API key (Bring Your Own Key), you can create pronunciation dictionaries directly in your ElevenLabs account and reference them in Vapi:
- Create a pronunciation dictionary in your ElevenLabs account
- Note the pronunciationDictionaryId and versionId from ElevenLabs
- Use these IDs in your Vapi assistant configuration:
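As a hedged sketch, the voice section of an assistant configuration could be assembled like this. Only `pronunciationDictionaryId` and `versionId` come from this guide; the surrounding keys (`provider`, `voiceId`, `model`, `pronunciationDictionaryLocators`) and the provider value are assumptions for illustration, and all IDs are placeholders.

```python
import json


def elevenlabs_voice_config(
    voice_id: str, model: str, dictionary_id: str, version_id: str
) -> dict:
    """Build a voice config that references one pronunciation dictionary."""
    return {
        "provider": "11labs",  # assumed provider identifier
        "voiceId": voice_id,
        "model": model,
        # Assumed field name for the list of dictionary references.
        "pronunciationDictionaryLocators": [
            {
                "pronunciationDictionaryId": dictionary_id,
                "versionId": version_id,
            }
        ],
    }


voice = elevenlabs_voice_config(
    voice_id="YOUR_VOICE_ID",
    model="eleven_turbo_v2",  # a model that supports phoneme rules
    dictionary_id="YOUR_DICTIONARY_ID",
    version_id="YOUR_VERSION_ID",
)
print(json.dumps({"voice": voice}, indent=2))
```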
Managing Pronunciation Dictionaries
ElevenLabs
List Your Dictionaries
Update Dictionary Rules
Cartesia
List Your Dictionaries
Update Dictionary Items
Best Practices
- Case Sensitivity: Pronunciation dictionary searches are case-sensitive. Create separate entries for different capitalizations if needed.
- Order Matters: Rules are applied in the order they appear in the dictionary. The first matching rule is used.
- Testing: Always test pronunciation changes with your specific voice and model combination.
- Phoneme Accuracy: Ensure proper stress marking for multi-syllable words when using phoneme rules.
- Model Compatibility: ElevenLabs phoneme rules only work with eleven_turbo_v2 and eleven_flash_v2. Cartesia pronunciation dictionaries require the sonic-3 model.
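The case-sensitivity and ordering behavior described above can be modeled with a few lines of code. This is a local simulation written for this guide, not the provider's implementation; `first_matching_rule` is a hypothetical helper.

```python
from typing import Optional

# Each rule is a (string_to_replace, pronunciation) pair, in dictionary order.
Rule = tuple[str, str]


def first_matching_rule(word: str, rules: list[Rule]) -> Optional[str]:
    """Return the pronunciation of the first rule that matches exactly."""
    for target, pronunciation in rules:
        if target == word:  # exact comparison, so matching is case-sensitive
            return pronunciation
    return None


rules = [
    ("SQL", "sequel"),
    ("sql", "sequel"),  # separate entry for the lowercase spelling
    ("SQL", "S Q L"),   # never reached: an earlier rule already matches "SQL"
]

print(first_matching_rule("SQL", rules))  # prints "sequel"
print(first_matching_rule("Sql", rules))  # prints "None"
```

The third rule illustrates why duplicates are worth auditing: a later rule for the same string is shadowed by the first one.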
Common Issues
Pronunciation Not Applied
- Verify you’re using a compatible model (ElevenLabs phoneme rules need specific models; Cartesia needs sonic-3)
- Check that the word to replace exactly matches the text in your content (case-sensitive)
- Ensure the pronunciation dictionary is properly referenced in your voice configuration
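One way to track down the exact-match problem from the checklist above is to scan your content for dictionary entries that appear only with different capitalization. The helper below is a hypothetical diagnostic sketch, not part of Vapi's tooling.

```python
import re


def case_mismatches(text: str, entries: list[str]) -> dict[str, list[str]]:
    """Map each entry to spellings in the text that differ only by case."""
    report: dict[str, list[str]] = {}
    for entry in entries:
        pattern = re.compile(r"\b" + re.escape(entry) + r"\b", re.IGNORECASE)
        # Keep only the variants that a case-sensitive lookup would miss.
        variants = sorted({found for found in pattern.findall(text) if found != entry})
        if variants:
            report[entry] = variants
    return report


text = "Deploy NGINX behind kubernetes, then restart Nginx."
print(case_mismatches(text, ["Nginx", "Kubernetes"]))
# prints {'Nginx': ['NGINX'], 'Kubernetes': ['kubernetes']}
```

Each reported variant is a candidate for its own dictionary entry.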
SSML Conflicts
- When pronunciation dictionaries are enabled, SSML parsing is automatically activated
- Ensure any existing SSML tags in your content are properly formatted
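Since malformed tags are a likely source of trouble once SSML parsing is active, a quick pre-flight check can help. The sketch below only tests XML well-formedness by wrapping the content in a `<speak>` root; it does not validate tags against the SSML specification or against what a given voice provider supports.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a <speak> root."""
    try:
        ET.fromstring(f"<speak>{fragment}</speak>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed_ssml('Hello <break time="1s"/> world'))  # prints "True"
print(is_well_formed_ssml('Hello <break time="1s"> world'))   # prints "False"
print(is_well_formed_ssml("Call AT&T support"))               # prints "False"
print(is_well_formed_ssml("Call AT&amp;T support"))           # prints "True"
```

The third case is easy to overlook: a bare ampersand in ordinary text is not well-formed XML and needs to be escaped as `&amp;`.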
Performance Impact
- Large dictionaries may slightly increase processing time
- Consider organizing rules by frequency of use for optimal performance