Voice Formatting Plan

What is Voice Input Formatted?

When interacting with voice assistants, you might notice terms like Voice Input Formatted in call logs or system outputs. This article explains what this means, how it works, and why it’s important for delivering clear and natural voice interactions.

Voice Input Formatted is a function that takes raw text from a language model (LLM) and cleans it up so text-to-speech (TTS) provider can read it more naturally. It’s on by default in your assistant’s voice provider settings, because it helps turn things like:

  • $42.50forty two dollars and fifty cents
  • STSTREET,
  • or phone numbers → spaced digits (“1 2 3 4 5 6 7 8 9 0”).

If you prefer the raw, unchanged text, you can turn off these transformations, which we’ll show you later.

Log Example

Screenshot 2025-01-21 at 10.23.19.png

1. Step-by-Step Transformations

When Voice Input Formatted runs, it calls a bunch of helper functions in a row. Each one focuses on a different kind of text pattern. The entire process happens in this order:

  1. removeAngleBracketContent
  2. removeMarkdownSymbols
  3. removePhrasesInAsterisks
  4. replaceNewLinesWithPeriods
  5. replaceColonsWithPeriods
  6. formatAcronyms
  7. formatDollarAmounts
  8. formatEmails
  9. formatDates
  10. formatTimes
  11. formatDistances, formatUnits, formatPercentages, formatPhoneNumbers
  12. formatNumbers
  13. Applying Replacements

We’ll walk you through them using a shorter example than before.

1.1 Our Simpler Example Input

Hello <tag> world
**Wanted** to say *hi*
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is [email protected]

1.2 removeAngleBracketContent

  • What it does: Removes <anything> unless it’s <break>, <spell>, or double angle brackets << >>.
  • Example effect: <tag> gets removed.

Result so far:

Hello world
**Wanted** to say *hi*
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is [email protected]

1.3 removeMarkdownSymbols

  • What it does: Removes _, ```, or ~. Some versions also remove double asterisks, but that might happen in a later step (next function).

In this example, there’s **Wanted**, which might remain if we strictly only remove _, backticks, and tildes. If the code does remove ** as well, it’ll vanish here or in the next step. Let’s assume it doesn’t remove them in this step.

Result: _No real change if the code only targets _ , ```, and ~._

Hello world
**Wanted** to say *hi*
...

1.4 removePhrasesInAsterisks

  • What it does: Looks for some text* or *some text** and cuts it out.

In our text, we have **Wanted** and *hi*. Both get removed if the function is broad enough to remove single and double-asterisk blocks.

Result:

Hello world
to say
We have NASA and .NET here,
call me at 123-456-7890,
price: $42.50
and the date is 2023 05 10
and time is 14:00
Distance is 5km
We might see 9999
the address is 320 ST 21 RD
my email is [email protected]

1.5 replaceNewLinesWithPeriods

  • What it does: Turns line breaks into . or . and merges repeated periods.

Let’s say the above text has line breaks. After this step, it’s more of a single line (or fewer lines), each newline replaced by a period.

Result (roughly):

Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price: $42.50 and the date is 2023 05 10 and time is 14:00 Distance is 5km We might see 9999 the address is 320 ST 21 RD my email is [email protected]

1.6 replaceColonsWithPeriods

  • What it does: :.

Our text has price: $42.50. That becomes price. $42.50.

Result:

Hello world . to say . We have NASA and .NET here, call me at 123-456-7890, price. $42.50 ...

1.7 formatAcronyms

  • What it does:
    • If something is in a known “to-lower” list (like NASA, .NET), it becomes lowercase (nasa, .net).
    • If it’s all-caps but not recognized, it might get spaced letters. If it has vowels, it’s left alone.

In the example:

  • NASAnasa
  • .NET.net

1.8 formatDollarAmounts

  • What it does: $42.50 → “forty two dollars and fifty cents.”

1.9 formatEmails

  • What it does: Replaces @ with “ at ” and . with “ dot ” in emails.
  • [email protected]JOHN dot DOE at example dot COM

1.10 formatDates

  • What it does: YYYY MM DD → e.g. “Wednesday, May 10, 2023” (if valid).
  • 2023 05 10 become “Wednesday, May 10, 2023” (day name depends on how the code calculates it).

1.11 formatTimes

  • What it does: 14:0014 (since minutes are “00,” it remove them).
  • If it was 14:30, it might become 14 30.

1.12 formatDistances, formatUnits, formatPercentages, formatPhoneNumbers

  • Distances: 5km → “5 kilometers.”
  • Units: e.g. 43 lb → “forty three pounds.”
  • Percentages: 50% → “50 percent.”
  • PhoneNumbers: 123-456-78901 2 3 4 5 6 7 8 9 0.

1.13 formatNumbers

  • What it does:
    • Skips year-like numbers if they’re below current year(2025).
    • For large numbers above a cutoff (e.g. 1000 or 5000), it reads as digits.
    • Negative numbers: 9 → “minus nine.”
    • Decimals: 2.5 → “two point five.”

In our case, 9999 might be big enough to become spelled out (nine thousand nine hundred ninety nine) or digits spaced out, depending on the cutoff.

2023 used with 05 10 might get turned into a date, so it’s handled by the date logic, not the plain number logic.

1.14 Applying Replacements (street-suffix expansions)

  • Runs last. If you have user-defined replacements like \bST\bSTREET, \bRD\bROAD, it changes them after all the other steps.
  • So 320 ST 21 RD320 STREET 21 ROAD.

End Result: A single line of text with all the helpful expansions and transformations done.

2. Formatting Plan: Customization Options

The Formatting Plan governs how Voice Input Formatted works. Here are the main settings you can customize:

2.1 Enabled

Determines whether the formatting is applied.

  • Default: true
  • To disable: Set voice.chunkPlan.formatPlan.enabled = false.

2.2 Number-to-Digits Cutoff

This decides when numbers are read as digits instead of words.

  • Default: 2025 (current year).
  • The code generally doesn’t convert numbers below the current year (like 2025) into spelled-out words, so it stays as digits if it’s obviously a year.
  • If a number is bigger than the cutoff (numberToDigitsCutoff), it reads digits out loud.
  • Negative numbers become “minus,” decimals get “point,” etc.
  • Example: With a cutoff of 2025, numbers like 12345 will remain digits.
  • To ensure larger numbers are spelled out, set the cutoff higher, like 300000. For example:
    • 30003 → “thirty thousand and three” (with a cutoff of 300000).

2.3 Replacements

Allows exact or regex-based substitutions in text.

  • Example 1: Replace hello with hi:{ type: 'exact', key: 'hello', value: 'hi' }.
  • Example 2: Replace words matching a pattern:{ type: 'regex', regex: '\\\\b[a-zA-Z]{5}\\\\b', value: 'hi' }.

Note

Currently, only replacements and number-to-digits cutoff are exposed for customization. Other options, such as toggling acronym replacement, are not exposed to be toggled.

3. How to Turn It Off

By default, the entire pipeline is on because it helps TTS read better. To turn it off, set:

voice.chunkPlan.enabled = false;
// or
voice.chunkPlan.formatPlan.enabled = false;

Any of those flags being false means we skip calling Voice Input Formatted.

4. Conclusion

  • Voice Input Formatted orchestrates a chain of mini-functions that together fix punctuation, expand abbreviations, and make text more readable out loud.
  • You can keep it on for better TTS results or off if you need the raw LLM output.
  • The final transformations, especially the user-supplied replacements (like street expansions), happen last, so keep that in mind it rely on other expansions earlier.
Built with