September 11, 2025

Voice Enhancements & Minimax Improvements

Minimax Voice Language Support: Improve multilingual conversations with MinimaxVoice.languageBoost, which supports 40+ languages, including:
- Chinese and Chinese,Yue for Mandarin and Cantonese
- English, Spanish, French, German, Japanese, Korean
- Regional variants and additional languages such as Arabic, Hindi, and Thai
- auto mode for automatic language detection
Text Normalization: Improve how numbers and formatted text are read with MinimaxVoice.textNormalizationEnabled. When enabled, numbers, dates, and formatted text are pronounced correctly, producing more natural-sounding speech.
Enhanced Voice Caching: Voice responses are now cached by default (MinimaxVoice.cachingEnabled defaults to true), which reduces latency for repeated phrases.
Fallback Voice Configuration: Maintain conversation continuity with FallbackMinimaxVoice, which supports the same language boost and text normalization options as the primary voice configuration.
Speaker Labeling: Track multiple speakers in conversations with BotMessage.speakerLabel, providing stable speaker identification (e.g., “Speaker 1”) for better conversation analysis and diarization.
Voice Region Support: Choose between Minimax’s worldwide (default) and china regions to improve latency and comply with local regulations.

Language boost settings help the text-to-speech model better understand context and pronunciation for specific languages, resulting in more natural and accurate voice synthesis.
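
The settings listed above could be combined in one voice configuration roughly as follows. This is a minimal sketch, not the official SDK types: languageBoost, textNormalizationEnabled, cachingEnabled, the worldwide and china region values, and the MinimaxVoice and FallbackMinimaxVoice names come from this entry, while the provider, voiceId, region, and fallbackVoices field names, the withDefaults helper, and the voice IDs are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration only. Field names marked "assumed"
// do not come from this changelog entry.
type MinimaxRegion = "worldwide" | "china";

interface MinimaxVoiceSettings {
  languageBoost?: string;             // e.g. "Chinese,Yue", "English", or "auto"
  textNormalizationEnabled?: boolean; // normalize numbers, dates, formatted text
  region?: MinimaxRegion;             // assumed field name; "worldwide" is the default
}

interface FallbackMinimaxVoice extends MinimaxVoiceSettings {
  provider: "minimax";                // assumed discriminator
  voiceId: string;                    // assumed field; placeholder value below
}

interface MinimaxVoice extends MinimaxVoiceSettings {
  provider: "minimax";                // assumed discriminator
  voiceId: string;                    // assumed field; placeholder value below
  cachingEnabled?: boolean;           // defaults to true per this entry
  fallbackVoices?: FallbackMinimaxVoice[]; // assumed nesting for fallbacks
}

// Fill in the defaults described in this entry when the caller omits them.
function withDefaults(voice: MinimaxVoice): MinimaxVoice {
  return { cachingEnabled: true, region: "worldwide", ...voice };
}

const voice = withDefaults({
  provider: "minimax",
  voiceId: "example-voice-id",
  languageBoost: "Chinese,Yue",       // boost Cantonese
  textNormalizationEnabled: true,
  fallbackVoices: [
    {
      provider: "minimax",
      voiceId: "example-fallback-id",
      languageBoost: "auto",          // let the fallback detect the language
      textNormalizationEnabled: true,
    },
  ],
});
```

Because the explicit fields are spread last, a caller can still opt out of caching by passing cachingEnabled: false or select the china region explicitly.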

Voice Quality Features

Multilingual Support

Support for 40+ languages with automatic detection and language-specific optimizations for natural pronunciation.

Smart Text Processing

Intelligent normalization of numbers, dates, and formatted text for natural-sounding speech synthesis.

Performance Optimization

Voice caching reduces latency for common phrases, while regional settings optimize for local performance.

Conversation Tracking

Speaker labeling and diarization support for multi-participant conversation analysis and management.
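
As one way to use speaker labels for conversation analysis, the sketch below groups transcript messages by BotMessage.speakerLabel. Only the speakerLabel field and the "Speaker 1" label format come from this entry. The role and message fields, the groupBySpeaker helper, the "Unknown" bucket, and the sample transcript are assumptions made for illustration.

```typescript
// Hypothetical message shape: only speakerLabel is taken from this entry.
interface BotMessage {
  role: "bot";            // assumed
  message: string;        // assumed field name for the spoken text
  speakerLabel?: string;  // e.g. "Speaker 1"
}

// Collect each speaker's lines in order, keyed by the speaker label.
// Messages without a label fall into an "Unknown" bucket (an assumption).
function groupBySpeaker(messages: BotMessage[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const m of messages) {
    const label = m.speakerLabel ?? "Unknown";
    const lines = groups.get(label) ?? [];
    lines.push(m.message);
    groups.set(label, lines);
  }
  return groups;
}

// Sample data invented for this example.
const transcript: BotMessage[] = [
  { role: "bot", message: "Hello, how can I help?", speakerLabel: "Speaker 1" },
  { role: "bot", message: "I can take this one.", speakerLabel: "Speaker 2" },
  { role: "bot", message: "Thanks, go ahead.", speakerLabel: "Speaker 1" },
  { role: "bot", message: "One moment please." },
];

const bySpeaker = groupBySpeaker(transcript);
```

Since the labels are described as stable, the same key should identify the same speaker across the whole conversation, which is what makes a simple grouping like this workable.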