Email address reading
Overview
Email addresses are one of the trickiest pieces of information for a voice agent to handle. They contain special characters (@, ., -, _), mixed-case text, and domain names that text-to-speech (TTS) engines often mispronounce or blur together when spoken aloud.
This guide covers three layers of the solution:
- Built-in formatting — Vapi automatically transforms email characters for TTS so they sound natural, with zero configuration.
- API configuration — You can fine-tune which formatters run, disable email formatting selectively, or customize the entire formatting pipeline.
- Prompt engineering — You instruct the LLM how to collect, read back, and confirm emails in conversation so users feel confident their address was captured correctly.
How Vapi handles emails automatically
Vapi’s voice formatting plan runs a 14-step pipeline that transforms raw LLM text into speech-friendly text before it reaches the TTS provider. The email formatter is step 8 in this pipeline. It replaces @ with “at” and . with “dot” so the spoken output is intelligible without any prompt changes.
The email formatter is enabled by default. You do not need to configure anything for basic email reading to work.
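To make the transformation concrete, here is a minimal sketch of the kind of substitution described above. It is not Vapi's implementation: the function name `speakEmails` and the regular expression are assumptions chosen for illustration, and the real formatter may cover more cases.

```typescript
// Illustrative approximation only; NOT Vapi's internal email formatter.
// Matches simple addresses such as jane.doe@example.com.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Hypothetical helper: rewrites each address so "@" and "." are spoken as words.
function speakEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, (email) =>
    email.replace(/@/g, " at ").replace(/\./g, " dot ")
  );
}

console.log(speakEmails("Reach me at jane.doe@example.com."));
// prints "Reach me at jane dot doe at example dot com."
```

Only text that looks like an email address is rewritten, so the period that ends the sentence is left alone.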
Where email formatting fits in the pipeline
The formatter runs after acronym and dollar-amount formatting, and before date, time, and phone number formatting. Here is the full 14-step pipeline with the email step highlighted:
For full details on every step, see the voice formatting plan reference.
Configuring email formatting via the API
To change the default behavior, set the formatPlan on your assistant’s voice configuration.
Configuration path
The relevant TypeScript type:
The formattersEnabled array accepts any combination of these values: removeAngleBrackets, markdown, asterisk, newline, colon, acronym, dollarAmount, email, date, time, distance, unit, percentage, phoneNumber, number, stripAsterisk.
The formattersEnabled property was introduced on 2025-02-20. Before that date, you could only toggle all formatting on or off with the enabled flag. If you are using an older API version, use enabled: false to disable all formatting.
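As a rough sketch, the two properties discussed here could be modeled as follows. The key list is copied from this guide; the names `FormatPlanSketch`, `FORMATTER_KEYS`, and `isFormatterKey` are hypothetical, and the real type may include fields not shown.

```typescript
// Formatter keys as listed in this guide.
const FORMATTER_KEYS = [
  "removeAngleBrackets", "markdown", "asterisk", "newline", "colon",
  "acronym", "dollarAmount", "email", "date", "time", "distance",
  "unit", "percentage", "phoneNumber", "number", "stripAsterisk",
] as const;

type FormatterKey = (typeof FORMATTER_KEYS)[number];

// Hypothetical shape (an assumption, not the official type definition).
interface FormatPlanSketch {
  enabled?: boolean;                  // toggles all formatting on or off
  formattersEnabled?: FormatterKey[]; // when set, only these formatters run
}

// Hypothetical guard for validating keys that arrive as plain strings.
function isFormatterKey(value: string): value is FormatterKey {
  return (FORMATTER_KEYS as readonly string[]).indexOf(value) !== -1;
}

const plan: FormatPlanSketch = { enabled: true, formattersEnabled: ["email"] };
console.log(isFormatterKey("email"), plan);
```

A guard like this can catch a misspelled key before a configuration is sent.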
Default behavior (no configuration needed)
By default, all formatters — including email — are enabled. You do not need to set anything for email addresses to be read correctly:
Enable only specific formatters
If you want tight control over which transformations run, pass only the formatter keys you need. This example enables only the email and phoneNumber formatters:
When you set formattersEnabled, only the listed formatters run. All others are disabled. Make sure to include every formatter you need.
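A configuration that enables only these two formatters might look like the sketch below. The nesting follows the voice.chunkPlan.formatPlan path used in this guide; other voice fields (provider, voice ID, and so on) are omitted, and the variable name is illustrative.

```typescript
// Sketch only: check the exact request shape against the API reference.
const emailAndPhoneOnly = {
  voice: {
    chunkPlan: {
      formatPlan: {
        enabled: true,
        // Only the listed formatters run; every other formatter is off.
        formattersEnabled: ["email", "phoneNumber"],
      },
    },
  },
};

console.log(JSON.stringify(emailAndPhoneOnly, null, 2));
```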
Disable email formatting while keeping all others
Omit email from the formattersEnabled array. The TTS provider will then receive the raw @ and . characters, and pronunciation depends entirely on the provider and your prompt:
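One way to build that list without retyping it is to filter the full set of keys. This is a sketch: the key list is copied from this guide, and the variable names and surrounding object shape are assumptions.

```typescript
// All formatter keys as listed in this guide.
const ALL_FORMATTERS: string[] = [
  "removeAngleBrackets", "markdown", "asterisk", "newline", "colon",
  "acronym", "dollarAmount", "email", "date", "time", "distance",
  "unit", "percentage", "phoneNumber", "number", "stripAsterisk",
];

// Keep every formatter except the email one.
const formattersWithoutEmail = ALL_FORMATTERS.filter((key) => key !== "email");

// Sketch of the resulting configuration; other voice fields are omitted.
const noEmailFormatting = {
  voice: {
    chunkPlan: {
      formatPlan: { enabled: true, formattersEnabled: formattersWithoutEmail },
    },
  },
};

console.log(noEmailFormatting.voice.chunkPlan.formatPlan.formattersEnabled.length);
// prints 15
```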
Disable all formatting
To send raw LLM output directly to TTS with no transformations at all:
Disabling all formatting means numbers, currencies, dates, phone numbers, and emails will all be sent raw to the TTS provider. Most providers will produce unnatural or garbled speech for these patterns.
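For completeness, a sketch of the fully disabled case, using the enabled flag described earlier. The object shape mirrors the voice.chunkPlan.formatPlan path; the variable name is illustrative and other voice fields are omitted.

```typescript
// Sketch only: turns off every formatter, so TTS receives raw LLM output.
const rawTtsOutput = {
  voice: {
    chunkPlan: {
      formatPlan: { enabled: false },
    },
  },
};

console.log(JSON.stringify(rawTtsOutput));
```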
Why prompt engineering still matters
Even though TTS formatting handles the character-level pronunciation, the LLM still controls how the conversation flows. Without explicit instructions, the agent might:
- Read the email once at normal speed and move on, leaving the user unsure.
- Fail to spell out ambiguous parts (was it “Jon” or “John”?).
- Mispronounce uncommon domain names.
- Skip a confirmation step entirely.
Good prompt instructions solve these problems at the conversational level.
System prompt: collecting an email
When asking a user for their email, instruct the agent to be patient and explicit about what it needs. The following snippet can be added to your system prompt.
System prompt: reading back and confirming an email
The confirmation step is where most agents fail. They read the email too fast or only once. This snippet teaches the agent to slow down and spell when needed.
Spelling out letter by letter
For ambiguous usernames or unfamiliar domains, letter-by-letter spelling removes all doubt. Add this instruction to your prompt so the agent knows when and how to spell.
Most email providers treat addresses as case-insensitive, so you typically do not need to distinguish uppercase from lowercase. Your prompt can note this to keep the conversation simpler.
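If you want to generate a spelled-out form in your own code, for example to check what a letter-by-letter readback should sound like, a small helper along these lines is one option. The function names and the spoken words chosen for symbols are assumptions, and lowercasing reflects the case-insensitivity note above.

```typescript
// Spoken names for symbols; the wording ("dash" vs "hyphen") is a choice.
const SPOKEN_SYMBOLS: { [ch: string]: string | undefined } = {
  ".": "dot",
  "-": "dash",
  "_": "underscore",
  "+": "plus",
};

// Spells one part of an address character by character.
function spellPart(part: string): string {
  return part
    .toLowerCase()
    .split("")
    .map((ch) => SPOKEN_SYMBOLS[ch] ?? ch)
    .join(", ");
}

// Hypothetical helper: spells the local part and the domain around "at".
function spellEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0 || at === email.length - 1) {
    throw new Error(`Not a usable email address: ${email}`);
  }
  return `${spellPart(email.slice(0, at))}, at, ${spellPart(email.slice(at + 1))}`;
}

console.log(spellEmail("jon_smith@vapi.ai"));
// prints "j, o, n, underscore, s, m, i, t, h, at, v, a, p, i, dot, a, i"
```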
Handling common domains naturally
You can make the agent sound more natural by teaching it to recognize popular email domains and say them as single words rather than spelling them out.
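The same rule can be sketched in code: say a known domain as words, and spell anything else. The domain list and function name here are hypothetical; adjust the list to the domains your callers actually use.

```typescript
// Hypothetical allow-list of domains that are safe to say as whole words.
const COMMON_DOMAINS = [
  "gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com",
];

function speakDomain(domain: string): string {
  const normalized = domain.toLowerCase();
  if (COMMON_DOMAINS.indexOf(normalized) !== -1) {
    // Known domain: say it naturally, e.g. "gmail dot com".
    return normalized.split(".").join(" dot ");
  }
  // Unknown domain: spell each label letter by letter.
  return normalized
    .split(".")
    .map((label) => label.split("").join(", "))
    .join(", dot, ");
}

console.log(speakDomain("Gmail.com"));  // prints "gmail dot com"
console.log(speakDomain("contoso.io")); // prints "c, o, n, t, o, s, o, dot, i, o"
```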
Complete example: appointment booking agent
Below is a full system prompt section you can copy into your assistant configuration. It combines all the techniques above into a single, production-ready block.
Including an example conversation in your system prompt helps the LLM understand the exact pacing and format you expect. This is one of the most effective techniques for consistent behavior.
Using pronunciation dictionaries for domains
If your agents frequently encounter a specific company or domain name that TTS mispronounces, you can use pronunciation dictionaries (available with ElevenLabs voices) to set the correct pronunciation at the TTS level.
For example, if the domain “vapi.ai” is being pronounced as “vappy dot ay-eye”, you could create an alias rule:
This approach is complementary to prompt engineering — pronunciation dictionaries fix TTS-level pronunciation, while prompt instructions control the conversational flow.
Using custom keywords for transcription accuracy
If the speech-to-text (STT) transcriber is mishearing specific email domains or usernames, custom keywords can boost transcription accuracy for those terms.
For example, if users frequently mention their company email domain “contoso.com” and the transcriber misinterprets it, you can add “contoso” as a custom keyword to improve recognition.
Best practices
Always confirm the full email address
Never assume an email is correct after hearing it once. Always read the complete email back and wait for confirmation before proceeding. This single step catches many email capture errors before they reach your records.
Use a two-pass approach for difficult emails
First, try reading the email back naturally (words and common domains). If the user says it is wrong, switch to letter-by-letter spelling for the entire address. This keeps simple emails fast while still handling complex ones reliably.
Do not autocorrect or assume
Instruct the agent to never modify any part of the email address. Common mistakes include changing “jon” to “john” or assuming “.com” when the user said “.co”. Treat the email as an exact string.
Handle interruptions gracefully
Users sometimes interrupt mid-readback with a correction. Instruct the agent to accept the correction, incorporate it, and then restart the full readback from the beginning so both parties are aligned.
Keep voice formatting enabled
Vapi’s built-in formatEmails transformer handles the TTS-level conversion of “@” and “.” automatically. Disabling the voice formatting plan will cause the TTS to receive raw characters, which may produce garbled output. Keep voice.chunkPlan.formatPlan.enabled set to true (the default).
Common issues
TTS reads the email as a URL or gibberish
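This usually happens when voice formatting is disabled, or when email is missing from a custom formattersEnabled list. Verify that voice.chunkPlan.formatPlan.enabled is set to true (the default) and that any formattersEnabled array you set includes email. See the voice formatting plan for details.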
Agent skips the confirmation step
Add an explicit instruction like “Do not proceed until the user confirms the email” to your system prompt. Reinforcing this with an example conversation in the prompt helps the LLM follow the flow consistently.
Agent modifies or autocorrects the email
LLMs sometimes try to be helpful by fixing perceived typos. Add a clear rule: “Never modify, autocorrect, or guess any part of the email address. Use exactly what the user provides.”
User says a letter but transcriber hears a different one
Letters like “b” and “d”, or “m” and “n”, sound similar over phone audio. If this happens frequently, instruct the agent to ask the user to use the NATO phonetic alphabet (“b as in bravo”) or use custom keywords to improve transcription accuracy for commonly confused terms.
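To illustrate the phonetic format, the sketch below expands letters into the "b as in bravo" style using the NATO alphabet. The function name is hypothetical; digits and symbols are passed through unchanged.

```typescript
// NATO phonetic alphabet ("alfa" and "juliett" are the official spellings).
const NATO_WORDS: { [letter: string]: string | undefined } = {
  a: "alfa", b: "bravo", c: "charlie", d: "delta", e: "echo",
  f: "foxtrot", g: "golf", h: "hotel", i: "india", j: "juliett",
  k: "kilo", l: "lima", m: "mike", n: "november", o: "oscar",
  p: "papa", q: "quebec", r: "romeo", s: "sierra", t: "tango",
  u: "uniform", v: "victor", w: "whiskey", x: "x-ray", y: "yankee",
  z: "zulu",
};

// Hypothetical helper: "bd" becomes "b as in bravo, d as in delta".
function phoneticSpell(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => {
      const word = NATO_WORDS[ch];
      return word ? `${ch} as in ${word}` : ch;
    })
    .join(", ");
}

console.log(phoneticSpell("bd9"));
// prints "b as in bravo, d as in delta, 9"
```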
Next steps
Now that your agent handles email addresses reliably, explore these related guides:
- Voice formatting plan — Full reference for all 14 formatting steps and customization options.
- Prompting guide — General techniques for writing effective voice AI prompts.
- Pronunciation dictionaries — Fine-tune TTS pronunciation for specific words and names.
- Custom keywords — Improve transcription accuracy for specific terms.
- Speech configuration — Configure endpointing, silence detection, and other speech settings.