Voice Formatting Plan
What is Voice Input Formatted?
When interacting with voice assistants, you might notice terms like Voice Input Formatted
in call logs or system outputs. This article explains what this means, how it works, and why it’s important for delivering clear and natural voice interactions.
Voice Input Formatted is a function that takes raw text from a language model (LLM) and cleans it up so text-to-speech (TTS) provider can read it more naturally. It’s on by default in your assistant’s voice provider settings, because it helps turn things like:
$42.50
→forty two dollars and fifty cents
ST
→STREET
,- or phone numbers → spaced digits (“1 2 3 4 5 6 7 8 9 0”).
If you prefer the raw, unchanged text, you can turn off these transformations, which we’ll show you later.
Log Example
1. Step-by-Step Transformations
When Voice Input Formatted
runs, it calls a bunch of helper functions in a row. Each one focuses on a different kind of text pattern. The entire process happens in this order:
- removeAngleBracketContent
- removeMarkdownSymbols
- removePhrasesInAsterisks
- replaceNewLinesWithPeriods
- replaceColonsWithPeriods
- formatAcronyms
- formatDollarAmounts
- formatEmails
- formatDates
- formatTimes
- formatDistances, formatUnits, formatPercentages, formatPhoneNumbers
- formatNumbers
- Applying Replacements
We’ll walk you through them using a shorter example than before.
1.1 Our Simpler Example Input
1.2 removeAngleBracketContent
- What it does: Removes
<anything>
unless it’s<break>
,<spell>
, or double angle brackets<< >>
. - Example effect:
<tag>
gets removed.
Result so far:
1.3 removeMarkdownSymbols
- What it does: Removes
_
, ```, or~
. Some versions also remove double asterisks, but that might happen in a later step (next function).
In this example, there’s **Wanted**
, which might remain if we strictly only remove _
, backticks, and tildes. If the code does remove **
as well, it’ll vanish here or in the next step. Let’s assume it doesn’t remove them in this step.
Result: _No real change if the code only targets _
, ```, and ~
._
1.4 removePhrasesInAsterisks
- What it does: Looks for
some text*
or*some text**
and cuts it out.
In our text, we have **Wanted**
and *hi*
. Both get removed if the function is broad enough to remove single and double-asterisk blocks.
Result:
1.5 replaceNewLinesWithPeriods
- What it does: Turns line breaks into
.
or.
and merges repeated periods.
Let’s say the above text has line breaks. After this step, it’s more of a single line (or fewer lines), each newline replaced by a period.
Result (roughly):
1.6 replaceColonsWithPeriods
- What it does:
:
→.
Our text has price: $42.50
. That becomes price. $42.50
.
Result:
1.7 formatAcronyms
- What it does:
- If something is in a known “to-lower” list (like
NASA
,.NET
), it becomes lowercase (nasa
,.net
). - If it’s all-caps but not recognized, it might get spaced letters. If it has vowels, it’s left alone.
- If something is in a known “to-lower” list (like
In the example:
NASA
→nasa
.NET
→.net
1.8 formatDollarAmounts
- What it does:
$42.50
→ “forty two dollars and fifty cents.”
1.9 formatEmails
- What it does: Replaces
@
with “ at ” and.
with “ dot ” in emails. [email protected]
→JOHN dot DOE at example dot COM
1.10 formatDates
- What it does:
YYYY MM DD
→ e.g. “Wednesday, May 10, 2023” (if valid). 2023 05 10
become “Wednesday, May 10, 2023” (day name depends on how the code calculates it).
1.11 formatTimes
- What it does:
14:00
→14
(since minutes are “00,” it remove them). - If it was
14:30
, it might become14 30
.
1.12 formatDistances, formatUnits, formatPercentages, formatPhoneNumbers
- Distances:
5km
→ “5 kilometers.” - Units: e.g.
43 lb
→ “forty three pounds.” - Percentages:
50%
→ “50 percent.” - PhoneNumbers:
123-456-7890
→1 2 3 4 5 6 7 8 9 0
.
1.13 formatNumbers
- What it does:
- Skips year-like numbers if they’re below current year(2025).
- For large numbers above a cutoff (e.g. 1000 or 5000), it reads as digits.
- Negative numbers:
9
→ “minus nine.” - Decimals:
2.5
→ “two point five.”
In our case, 9999
might be big enough to become spelled out (nine thousand nine hundred ninety nine) or digits spaced out, depending on the cutoff.
2023
used with 05 10
might get turned into a date, so it’s handled by the date logic, not the plain number logic.
1.14 Applying Replacements (street-suffix expansions)
- Runs last. If you have user-defined replacements like
\bST\b
→STREET
,\bRD\b
→ROAD
, it changes them after all the other steps. - So
320 ST 21 RD
→320 STREET 21 ROAD
.
End Result: A single line of text with all the helpful expansions and transformations done.
2. Formatting Plan: Customization Options
The Formatting Plan governs how Voice Input Formatted works. Here are the main settings you can customize:
2.1 Enabled
Determines whether the formatting is applied.
- Default:
true
- To disable: Set
voice.chunkPlan.formatPlan.enabled = false
.
2.2 Number-to-Digits Cutoff
This decides when numbers are read as digits instead of words.
- Default:
2025
(current year). - The code generally doesn’t convert numbers below the current year (like
2025
) into spelled-out words, so it stays as digits if it’s obviously a year. - If a number is bigger than the cutoff (
numberToDigitsCutoff
), it reads digits out loud. - Negative numbers become “minus,” decimals get “point,” etc.
- Example: With a cutoff of
2025
, numbers like12345
will remain digits. - To ensure larger numbers are spelled out, set the cutoff higher, like
300000
. For example:30003
→ “thirty thousand and three” (with a cutoff of300000
).
2.3 Replacements
Allows exact or regex-based substitutions in text.
- Example 1: Replace
hello
withhi
:{ type: 'exact', key: 'hello', value: 'hi' }
. - Example 2: Replace words matching a pattern:
{ type: 'regex', regex: '\\\\b[a-zA-Z]{5}\\\\b', value: 'hi' }
.
Note
Currently, only replacements and number-to-digits cutoff are exposed for customization. Other options, such as toggling acronym replacement, are not exposed to be toggled.
3. How to Turn It Off
By default, the entire pipeline is on because it helps TTS read better. To turn it off, set:
Any of those flags being false
means we skip calling Voice Input Formatted
.
4. Conclusion
Voice Input Formatted
orchestrates a chain of mini-functions that together fix punctuation, expand abbreviations, and make text more readable out loud.- You can keep it on for better TTS results or off if you need the raw LLM output.
- The final transformations, especially the user-supplied replacements (like street expansions), happen last, so keep that in mind it rely on other expansions earlier.