Voice formatting plan
Format LLM output for natural-sounding speech
Overview
Voice formatting automatically transforms raw text from your large language model (LLM) into a form that sounds natural when spoken by a text-to-speech (TTS) provider. This process, called Voice Input Formatted, is enabled by default for all assistants.
Formatting helps with things like:
- Expanding numbers and currency (e.g., $42.50 → “forty two dollars and fifty cents”)
- Expanding abbreviations (e.g., ST → “STREET”)
- Spacing out phone numbers (e.g., 123-456-7890 → “1 2 3 4 5 6 7 8 9 0”)
You can turn off formatting if you want the TTS to read the raw LLM output.
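The transformations above behave like an ordered pipeline: each step rewrites the text and hands the result to the next. The Python sketch below illustrates that idea for the three examples only. It is not the formatter's actual code. The function names (format_for_voice, expand_currency, expand_abbreviations, space_phone_numbers), the step order, the one-entry abbreviation table, and the enabled parameter are all hypothetical, and the currency step only handles amounts under $1,000 with exactly two decimal places.

```python
import re
from typing import Callable

# Hypothetical helpers for illustration; not the real formatter's functions.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

ABBREVIATIONS = {"ST": "STREET"}  # hypothetical one-entry lookup table


def words_under_1000(n: int) -> str:
    """Spell out 0 <= n < 1000 without hyphens or 'and'."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + words_under_1000(rest)


def expand_currency(text: str) -> str:
    """'$42.50' -> 'forty two dollars and fifty cents' (no singular forms)."""
    def repl(m: re.Match) -> str:
        dollars, cents = int(m.group(1)), int(m.group(2))
        return (f"{words_under_1000(dollars)} dollars and "
                f"{words_under_1000(cents)} cents")
    return re.sub(r"\$(\d{1,3})\.(\d{2})\b", repl, text)


def expand_abbreviations(text: str) -> str:
    """'ST' -> 'STREET', matching whole words only."""
    for short, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    return text


def space_phone_numbers(text: str) -> str:
    """'123-456-7890' -> '1 2 3 4 5 6 7 8 9 0'."""
    def repl(m: re.Match) -> str:
        return " ".join(ch for ch in m.group(0) if ch.isdigit())
    return re.sub(r"\b\d{3}-\d{3}-\d{4}\b", repl, text)


# Each step receives the previous step's output, so the order matters.
PIPELINE: list[Callable[[str], str]] = [
    expand_currency,
    expand_abbreviations,
    space_phone_numbers,
]


def format_for_voice(text: str, enabled: bool = True) -> str:
    if not enabled:
        return text  # formatting off: the TTS receives the raw LLM output
    for step in PIPELINE:
        text = step(text)
    return text
```

For example, format_for_voice("Pay $42.50 at 5 Main ST") returns "Pay forty two dollars and fifty cents at 5 Main STREET", while passing enabled=False returns the input unchanged.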
How voice input formatting works
When enabled, the formatter runs a series of transformations on your text, each handled by a specific function. Here’s the order and what each function does:
Customizing the formatting plan
You can control some aspects of formatting:
Enabled
Formatting is on by default. To disable, set:
Number-to-digits cutoff
Controls when numbers are read as digits instead of words.
- Default: 2025 (the current year)
- Example: With a cutoff of 2025, numbers above 2025 are read as digits.
- To spell out larger numbers, set the cutoff higher (e.g., 300000).
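A minimal Python sketch of how a cutoff like this could work, assuming integers at or below the cutoff are spelled out and larger ones are left as digits (whether the boundary itself is inclusive is an assumption). The to_words and apply_number_cutoff helpers are hypothetical, handle plain non-negative integers only, and hard-code 2025 as the default instead of reading the current year.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(1_000_000_000, "billion"), (1_000_000, "million"), (1_000, "thousand")]


def to_words(n: int) -> str:
    """Spell out a non-negative integer in simplified English (no 'and')."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head if rest == 0 else head + " " + to_words(rest)
    # n >= 1000: split off the largest scale that fits, then recurse
    scale, name = next((s, w) for s, w in SCALES if n >= s)
    head, rest = divmod(n, scale)
    words = to_words(head) + " " + name
    return words if rest == 0 else words + " " + to_words(rest)


def apply_number_cutoff(text: str, cutoff: int = 2025) -> str:
    """Spell out integers <= cutoff; leave larger integers as digits."""
    def repl(m: re.Match) -> str:
        n = int(m.group(0))
        return to_words(n) if n <= cutoff else m.group(0)
    return re.sub(r"\b\d+\b", repl, text)
```

With the default, apply_number_cutoff("In 2025 we sold 300000 units") gives "In two thousand twenty five we sold 300000 units"; raising the cutoff to 300000 spells out both numbers.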
Replacements
Add exact or regex-based substitutions to customize output.
- Example 1: Replace “hello” with “hi”:
- Example 2: Replace words matching a pattern:
Currently, only replacements and the number-to-digits cutoff are customizable. Other options are not exposed.
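The two kinds of substitution can be sketched in Python as an ordered list of rules, each applied to the output of the previous one. The Replacement class, its kind/pattern/value fields, the rule order, and the regex in the second rule are hypothetical illustrations, not the assistant configuration schema.

```python
import re
from dataclasses import dataclass
from typing import Literal


@dataclass
class Replacement:
    kind: Literal["exact", "regex"]  # hypothetical field names
    pattern: str
    value: str


def apply_replacements(text: str, rules: list[Replacement]) -> str:
    """Apply rules in order; each rule sees the previous rule's output."""
    for rule in rules:
        if rule.kind == "exact":
            # Plain substring replacement, everywhere it occurs.
            text = text.replace(rule.pattern, rule.value)
        else:
            text = re.sub(rule.pattern, rule.value, text)
    return text


rules = [
    Replacement("exact", "hello", "hi"),                # Example 1
    Replacement("regex", r"\b(?:hey|howdy)\b", "hi"),   # hypothetical pattern
]
```

Here an exact rule is a plain substring replacement, so it would also rewrite “hello” inside “othello”; a regex rule with \b word boundaries avoids that. How the real feature treats partial matches is not stated above.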
Turning formatting off
To disable all formatting and use raw LLM output, set either of these to false:
Summary
- Voice input formatting improves clarity and naturalness for TTS.
- Each transformation step targets a specific pattern for better speech output.
- You can customize or disable formatting as needed.