Pronunciation dictionaries

Control how your AI assistant pronounces specific words and phrases

Overview

Pronunciation dictionaries allow you to customize how your AI assistant pronounces specific words, names, acronyms, or technical terms. This feature is particularly useful for ensuring consistent pronunciation of brand names, proper nouns, or industry-specific terminology that might be mispronounced by default.

Pronunciation dictionaries are supported by the following voice providers:

  • ElevenLabs — phoneme rules (IPA and CMU Arpabet) and alias rules
  • Cartesia — “sounds-like” aliases and IPA notation (sonic-3 model only)
  • Vapi built-in voices — pronunciation dictionaries via a unified locator

How Pronunciation Dictionaries Work

1. Create Pronunciation Rules

Define specific words or phrases and how they should be pronounced using either phonetic notation or word substitutions.

2. Upload Dictionary to Vapi

Create a pronunciation dictionary through Vapi’s API with your custom rules.

3. Configure Your Assistant

Associate the pronunciation dictionary with your assistant’s voice configuration.

4. Automatic Application

When your assistant encounters the specified words during conversation, it will use your custom pronunciations automatically.

Sample Audio Examples

Below are examples demonstrating the difference between pronunciations with and without pronunciation dictionaries:

Corrected pronunciations:

  • “Nginx” → “Engine-X” (using alias rule)
  • “Kubernetes” → “/ˌkuːbərˈneɪtiːz/” (using phoneme rule)

[Audio sample: without pronunciation dictionary]

[Audio sample: with pronunciation dictionary]

Prerequisites

  • A Vapi assistant configured with an ElevenLabs, Cartesia, or Vapi voice
  • For ElevenLabs: understanding of phonetic notation (IPA or CMU Arpabet) for phoneme-based rules
  • For Cartesia: the sonic-3 voice model (pronunciation dictionaries are only available on sonic-3)
  • Access to Vapi’s API for dictionary creation

Types of Pronunciation Rules

ElevenLabs Rules

Phoneme Rules

Phoneme rules specify exact pronunciation using phonetic alphabets. These provide the most precise control over pronunciation.

Supported Alphabets:

  • IPA (International Phonetic Alphabet): More universal, uses symbols like /təˈmeɪtoʊ/
  • CMU Arpabet: ASCII-based format, uses notation like T AH M EY T OW

Model Compatibility: Phoneme rules only work with specific ElevenLabs models:

  • eleven_turbo_v2
  • eleven_flash_v2
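
The compatibility list above can be checked before a dictionary is attached to a voice. The following is a minimal sketch, assuming the two model names listed on this page are the complete set; `unsupported_phoneme_rules` is a hypothetical helper, not part of any Vapi or ElevenLabs SDK.

```python
from __future__ import annotations

# Models this page lists as supporting ElevenLabs phoneme rules (assumed complete).
PHONEME_CAPABLE_MODELS = {"eleven_turbo_v2", "eleven_flash_v2"}


def unsupported_phoneme_rules(model: str, rules: list[dict]) -> list[dict]:
    """Return the phoneme rules that the given model is not listed as supporting.

    Hypothetical helper: it only mirrors the compatibility list above.
    Alias rules are never returned because they work with all ElevenLabs models.
    """
    if model in PHONEME_CAPABLE_MODELS:
        return []
    return [rule for rule in rules if rule.get("type") == "phoneme"]
```

An empty result means every rule in the dictionary should apply to the chosen model. A non-empty result lists the rules to convert to alias rules, or the reason to switch models.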

Alias Rules

Alias rules replace words with alternative spellings or phrases. These work with all ElevenLabs models and are useful for:

  • Converting acronyms to full phrases (e.g., “UN” → “United Nations”)
  • Providing phonetic spellings for difficult words
  • Standardizing pronunciation across different contexts
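
Alias rules are simple enough to generate from a plain mapping. This sketch assumes the rule shape used in the ElevenLabs creation request on this page (`stringToReplace`, `type`, `alias`); the function name `alias_rules` is hypothetical.

```python
from __future__ import annotations


def alias_rules(replacements: dict[str, str]) -> list[dict]:
    """Turn {word: spoken form} pairs into ElevenLabs-style alias rules.

    Hypothetical helper; the field names follow the request body shown on this page.
    """
    return [
        {"stringToReplace": word, "type": "alias", "alias": spoken}
        for word, spoken in replacements.items()
    ]
```

For example, `alias_rules({"UN": "United Nations"})` yields a single rule that can be placed in the `rules` array of a dictionary.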

Cartesia Rules

Cartesia pronunciation dictionaries use a text and alias format. Each entry maps a word to its pronunciation. Cartesia supports two alias styles:

  • Sounds-like guidance: A plain-English hint for how to say the word (e.g., "VAH-pee")
  • IPA notation: Precise phonetic spelling wrapped in double angle brackets, with the symbols separated by pipes (e.g., "<<ˈ|v|ɑ|ˈ|p|i>>")

Cartesia pronunciation dictionaries are only available with the sonic-3 model.
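
The two alias styles can be produced with small helpers. This is a sketch only: the `<<…|…>>` layout is inferred from the examples on this page rather than from a formal grammar, and both function names are hypothetical.

```python
from __future__ import annotations


def cartesia_ipa_alias(symbols: list[str]) -> str:
    """Join IPA symbols into the pipe-separated, double-bracketed form shown above.

    Assumption: the format is exactly "<<" + symbols joined by "|" + ">>",
    as in the examples on this page.
    """
    return "<<" + "|".join(symbols) + ">>"


def cartesia_item(text: str, alias: str) -> dict:
    """Build one dictionary entry; `alias` may be a sounds-like hint or an IPA alias."""
    return {"text": text, "alias": alias}
```

A sounds-like entry would be `cartesia_item("Vapi", "VAH-pee")`, while an IPA entry would be `cartesia_item("GIF", cartesia_ipa_alias(["ˈ", "dʒ", "ɪ", "f"]))`.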

Implementation

ElevenLabs

1. Create a Pronunciation Dictionary

Use Vapi’s API to create a pronunciation dictionary with your custom rules.

    POST https://api.vapi.ai/provider/11labs/pronunciation-dictionary
    Content-Type: application/json
    Authorization: Bearer YOUR_API_KEY

    {
      "name": "My Custom Dictionary",
      "rules": [
        {
          "stringToReplace": "tomato",
          "type": "phoneme",
          "phoneme": "/təˈmeɪtoʊ/",
          "alphabet": "ipa"
        },
        {
          "stringToReplace": "Vapi",
          "type": "phoneme",
          "phoneme": "V AE P IY",
          "alphabet": "cmu-arpabet"
        },
        {
          "stringToReplace": "UN",
          "type": "alias",
          "alias": "United Nations"
        }
      ]
    }

The API will respond with:

    {
      "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
      "versionId": "xJl0ImZzi3cYp61T0UQG",
      "name": "My Custom Dictionary",
      "rules": [...],
      "createdAt": "2024-01-15T10:30:00Z"
    }
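
The request above could be issued from Python with only the standard library. This is a minimal sketch: the endpoint, headers, and body shape come from this page, while `build_create_request` and `create_dictionary` are hypothetical names and no error handling or retries are included.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "https://api.vapi.ai"


def build_create_request(api_key: str, name: str, rules: list[dict]) -> urllib.request.Request:
    """Prepare (but do not send) the POST that creates an ElevenLabs dictionary."""
    body = json.dumps({"name": name, "rules": rules}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/provider/11labs/pronunciation-dictionary",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_dictionary(api_key: str, name: str, rules: list[dict]) -> dict:
    """Send the request and return the parsed JSON response (performs a network call)."""
    with urllib.request.urlopen(build_create_request(api_key, name, rules)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any call is made.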

2. Configure Your Assistant's Voice

Update your assistant configuration to use the pronunciation dictionary.

    {
      "voice": {
        "model": "eleven_turbo_v2",
        "voiceId": "sarah",
        "provider": "11labs",
        "stability": 0.5,
        "similarityBoost": 0.75,
        "pronunciationDictionaryLocators": [
          {
            "pronunciationDictionaryId": "rjshI10OgN6KxqtJBqO4",
            "versionId": "xJl0ImZzi3cYp61T0UQG"
          }
        ]
      }
    }

When a pronunciation dictionary is added, SSML parsing will be automatically enabled for your assistant.
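
Wiring the creation response into the voice configuration is mostly a matter of copying two IDs. The sketch below assumes the response fields shown in the sample response (`pronunciationDictionaryId`, `versionId`); `voice_with_dictionary` is a hypothetical helper and does not call the API.

```python
from __future__ import annotations


def voice_with_dictionary(voice: dict, created: dict) -> dict:
    """Return a copy of an 11labs voice config with the new dictionary attached.

    `created` is assumed to be the JSON returned by the dictionary creation call.
    Existing locators on the voice are kept and the new one is appended.
    """
    locator = {
        "pronunciationDictionaryId": created["pronunciationDictionaryId"],
        "versionId": created["versionId"],
    }
    updated = dict(voice)
    updated["pronunciationDictionaryLocators"] = [
        *voice.get("pronunciationDictionaryLocators", []),
        locator,
    ]
    return updated
```

The function returns a new dictionary instead of mutating its input, so the original voice settings remain available if the update needs to be rolled back.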

3. Test Your Pronunciation

Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.

Cartesia

1. Create a Pronunciation Dictionary

Use Vapi’s API to create a Cartesia pronunciation dictionary.

    POST https://api.vapi.ai/provider/cartesia/pronunciation-dictionary
    Content-Type: application/json
    Authorization: Bearer YOUR_API_KEY

    {
      "name": "My Cartesia Dictionary",
      "items": [
        {
          "text": "Vapi",
          "alias": "VAH-pee"
        },
        {
          "text": "Nginx",
          "alias": "Engine-X"
        },
        {
          "text": "GIF",
          "alias": "<<ˈ|dʒ|ɪ|f>>"
        }
      ]
    }

The API will respond with a dictionary object containing an id you’ll use in the next step.

2. Configure Your Assistant's Voice

Add the pronunciation dictionary ID to your Cartesia voice configuration.

    {
      "voice": {
        "model": "sonic-3",
        "voiceId": "your-cartesia-voice-id",
        "provider": "cartesia",
        "pronunciationDictId": "dict_abc123"
      }
    }
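
Because Cartesia dictionaries only apply on sonic-3, it can help to reject other models when the configuration is built. This is a sketch under that assumption; `cartesia_voice` is a hypothetical helper that only assembles the JSON shown above.

```python
from __future__ import annotations


def cartesia_voice(voice_id: str, dictionary_id: str, model: str = "sonic-3") -> dict:
    """Build a Cartesia voice config that references a pronunciation dictionary.

    Raises ValueError for any model other than sonic-3, mirroring the
    restriction stated on this page.
    """
    if model != "sonic-3":
        raise ValueError(
            f"Cartesia pronunciation dictionaries require sonic-3, got {model!r}"
        )
    return {
        "provider": "cartesia",
        "model": model,
        "voiceId": voice_id,
        "pronunciationDictId": dictionary_id,
    }
```

Failing early here is a design choice: a dictionary attached to an unsupported model would otherwise be hard to tell apart from a rule that simply did not match.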

3. Test Your Pronunciation

Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.

Vapi Built-in Voices

1. Create a Pronunciation Dictionary

Create a pronunciation dictionary using either the ElevenLabs or Cartesia API endpoints shown above. The dictionary ID from either provider can be used with Vapi built-in voices.

2. Configure Your Assistant's Voice

Add the pronunciation dictionary locator to your Vapi voice configuration.

    {
      "voice": {
        "voiceId": "Elliot",
        "provider": "vapi",
        "pronunciationDictionary": [
          {
            "pronunciationDictId": "pdict_abc123"
          }
        ]
      }
    }

The versionId field is optional for Vapi voices. Include it only when the locator references an ElevenLabs-backed dictionary, where it is required.
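
The optional `versionId` can be handled by adding it to the locator only when one is supplied. This sketch follows the field names in the example above; `vapi_voice` is a hypothetical helper, and placing `versionId` alongside `pronunciationDictId` in the same locator object is an assumption.

```python
from __future__ import annotations


def vapi_voice(voice_id: str, dictionary_id: str, version_id: str | None = None) -> dict:
    """Build a Vapi built-in voice config with one pronunciation dictionary locator.

    `version_id` is included only when given (assumed to be needed for
    ElevenLabs-backed dictionaries, per the note above).
    """
    locator = {"pronunciationDictId": dictionary_id}
    if version_id is not None:
        locator["versionId"] = version_id
    return {
        "provider": "vapi",
        "voiceId": voice_id,
        "pronunciationDictionary": [locator],
    }
```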

3. Test Your Pronunciation

Create a test call or use the Vapi playground to verify that your custom pronunciations are working correctly.

Using Your Own ElevenLabs Account (BYOK)

If you’re using your own ElevenLabs API key (Bring Your Own Key), you can create pronunciation dictionaries directly in your ElevenLabs account and reference them in Vapi:

  1. Create a pronunciation dictionary in your ElevenLabs account
  2. Note the pronunciationDictionaryId and versionId from ElevenLabs
  3. Use these IDs in your Vapi assistant configuration:

    {
      "voice": {
        "model": "eleven_turbo_v2_5",
        "voiceId": "your-voice-id",
        "provider": "11labs",
        "pronunciationDictionaryLocators": [
          {
            "pronunciationDictionaryId": "your-elevenlabs-dict-id",
            "versionId": "your-elevenlabs-version-id"
          }
        ]
      }
    }

Managing Pronunciation Dictionaries

ElevenLabs

List Your Dictionaries

    GET https://api.vapi.ai/provider/11labs/pronunciation-dictionary
    Authorization: Bearer YOUR_API_KEY

Update Dictionary Rules

    PATCH https://api.vapi.ai/provider/11labs/pronunciation-dictionary/{dictionaryId}
    Content-Type: application/json
    Authorization: Bearer YOUR_API_KEY

    {
      "rules": [
        {
          "stringToReplace": "tomato",
          "type": "phoneme",
          "phoneme": "/təˈmɑːtoʊ/",
          "alphabet": "ipa"
        }
      ]
    }

Cartesia

List Your Dictionaries

    GET https://api.vapi.ai/provider/cartesia/pronunciation-dictionary
    Authorization: Bearer YOUR_API_KEY

Update Dictionary Items

    PATCH https://api.vapi.ai/provider/cartesia/pronunciation-dictionary/{dictionaryId}
    Content-Type: application/json
    Authorization: Bearer YOUR_API_KEY

    {
      "items": [
        {
          "text": "Vapi",
          "alias": "VAH-pee"
        }
      ]
    }
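
The list and update calls for both providers differ only in the provider segment of the path and in the body key (`rules` or `items`). The sketch below prepares those requests with the standard library; the helper names are hypothetical, and the requests are built but not sent.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "https://api.vapi.ai"
PROVIDERS = {"11labs", "cartesia"}  # provider path segments used on this page


def _dictionary_url(provider: str, dictionary_id: str | None = None) -> str:
    """Build the collection URL, or the single-dictionary URL when an ID is given."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    url = f"{API_BASE}/provider/{provider}/pronunciation-dictionary"
    return f"{url}/{dictionary_id}" if dictionary_id else url


def list_request(api_key: str, provider: str) -> urllib.request.Request:
    """Prepare the GET that lists dictionaries for one provider."""
    return urllib.request.Request(
        _dictionary_url(provider),
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def update_request(api_key: str, provider: str, dictionary_id: str, changes: dict) -> urllib.request.Request:
    """Prepare the PATCH; `changes` is {"rules": [...]} for 11labs or {"items": [...]} for cartesia."""
    return urllib.request.Request(
        _dictionary_url(provider, dictionary_id),
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="PATCH",
    )
```

Either request can be sent with `urllib.request.urlopen(request)` once a real API key and dictionary ID are supplied.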

Best Practices

  • Case Sensitivity: Pronunciation dictionary searches are case-sensitive. Create separate entries for different capitalizations if needed.
  • Order Matters: Rules are applied in the order they appear in the dictionary. The first matching rule is used.
  • Testing: Always test pronunciation changes with your specific voice and model combination.
  • Phoneme Accuracy: Ensure proper stress marking for multi-syllable words when using phoneme rules.
  • Model Compatibility: ElevenLabs phoneme rules only work with eleven_turbo_v2 and eleven_flash_v2. Cartesia pronunciation dictionaries require the sonic-3 model.
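
The case-sensitivity and ordering points above can be illustrated locally. This is only a model of the behavior described on this page, not the provider's matching code; both function names are hypothetical, and the rule shape is the ElevenLabs one.

```python
from __future__ import annotations


def first_matching_rule(rules: list[dict], word: str) -> dict | None:
    """Return the first rule whose stringToReplace equals `word` exactly (case-sensitive)."""
    for rule in rules:
        if rule["stringToReplace"] == word:
            return rule
    return None


def case_variants(rule: dict) -> list[dict]:
    """Copy a rule for the original, lower-case, capitalized, and upper-case spellings.

    Duplicate spellings are dropped while the order is preserved.
    """
    word = rule["stringToReplace"]
    forms = dict.fromkeys([word, word.lower(), word.capitalize(), word.upper()])
    return [{**rule, "stringToReplace": form} for form in forms]
```

With two rules for "UN" in one list, `first_matching_rule` returns the earlier one, and a lookup for "un" returns `None` unless a lower-case variant was added with `case_variants`.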

Common Issues

Pronunciation Not Applied

  • Verify you’re using a compatible model (ElevenLabs phoneme rules need specific models; Cartesia needs sonic-3)
  • Check that the word to replace exactly matches the text in your content (case-sensitive)
  • Ensure the pronunciation dictionary is properly referenced in your voice configuration
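
A quick way to check the exact-match point is to compare rule strings against a sample of what the assistant is expected to say. The sketch below uses a plain case-sensitive substring test, which is an assumption: the provider may match on word boundaries or normalize text differently. `rules_missing_from_text` is a hypothetical helper.

```python
from __future__ import annotations


def rules_missing_from_text(rules: list[dict], text: str) -> list[str]:
    """List stringToReplace values that never appear verbatim in `text`.

    Uses a case-sensitive substring check, so "Vapi" does not match "vapi".
    """
    return [
        rule["stringToReplace"]
        for rule in rules
        if rule["stringToReplace"] not in text
    ]
```

Any word returned here is a candidate for a capitalization mismatch or a spelling that the assistant's text never actually produces.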

SSML Conflicts

  • When pronunciation dictionaries are enabled, SSML parsing is automatically activated
  • Ensure any existing SSML tags in your content are properly formatted

Performance Impact

  • Large dictionaries may slightly increase processing time
  • Consider organizing rules by frequency of use for optimal performance