
Voice Enhancements & Minimax Improvements

  1. Minimax Voice Language Support: Improve multilingual conversations with MinimaxVoice.languageBoost, which supports 40+ languages, including:

    • Chinese and Chinese,Yue for Mandarin and Cantonese
    • English, Spanish, French, German, Japanese, Korean
    • Other languages and regional variants, such as Arabic, Hindi, and Thai
    • auto mode for automatic language detection

  2. Text Normalization: Improve how numbers and formatted text are read aloud with MinimaxVoice.textNormalizationEnabled. When enabled, numbers, dates, and formatted text are pronounced in their natural spoken form.

  3. Enhanced Voice Caching: Voice responses are now cached by default (MinimaxVoice.cachingEnabled defaults to true), which reduces latency for repeated phrases.

  4. Fallback Voice Configuration: Keep conversations running with FallbackMinimaxVoice, which offers the same language boost and text normalization options as the primary voice configuration.

  5. Speaker Labeling: Track multiple speakers in conversations with BotMessage.speakerLabel, providing stable speaker identification (e.g., “Speaker 1”) for better conversation analysis and diarization.

  6. Voice Region Support: Choose between Minimax’s worldwide (default) and china regions to reduce latency and comply with local regulations.

Language boost settings help the text-to-speech model better understand context and pronunciation for specific languages, resulting in more natural and accurate voice synthesis.
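
The settings above can be sketched as a small configuration builder. This is a minimal illustration rather than the official schema: languageBoost, textNormalizationEnabled, and cachingEnabled are the property names from this entry, while the provider, voiceId, and region keys, the helper names, and the example voice IDs are assumptions made for the example.

```python
from typing import Any

# The two regional settings named in this changelog entry.
REGIONS = {"worldwide", "china"}


def build_minimax_voice(
    voice_id: str,
    language_boost: str = "auto",
    text_normalization: bool = True,
    caching: bool = True,
    region: str = "worldwide",
) -> dict[str, Any]:
    """Build a MinimaxVoice-style configuration dictionary (hypothetical helper)."""
    if region not in REGIONS:
        raise ValueError(f"region must be one of {sorted(REGIONS)}, got {region!r}")
    return {
        "provider": "minimax",        # assumed key and value
        "voiceId": voice_id,          # assumed key
        "languageBoost": language_boost,
        "textNormalizationEnabled": text_normalization,
        "cachingEnabled": caching,    # true by default, per this entry
        "region": region,             # assumed key for the regional setting
    }


def build_fallback_voice(primary: dict[str, Any], voice_id: str) -> dict[str, Any]:
    """Build a fallback voice that reuses the primary's language settings (hypothetical helper)."""
    return {
        "provider": "minimax",        # assumed key and value
        "voiceId": voice_id,          # assumed key
        "languageBoost": primary["languageBoost"],
        "textNormalizationEnabled": primary["textNormalizationEnabled"],
    }


# Cantonese voice served from the china region, with a fallback voice.
primary = build_minimax_voice("example-voice-id", language_boost="Chinese,Yue", region="china")
fallback = build_fallback_voice(primary, "example-fallback-voice-id")
```

Copying the language settings from the primary voice keeps pronunciation consistent if the fallback takes over mid-conversation.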

Voice Quality Features

Multilingual Support

Support for 40+ languages with automatic detection and language-specific optimizations for natural pronunciation.

Smart Text Processing

Intelligent normalization of numbers, dates, and formatted text for natural-sounding speech synthesis.

Performance Optimization

Voice caching reduces latency for common phrases, while regional settings optimize for local performance.
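
To illustrate why caching helps with repeated phrases, here is a generic sketch of phrase-level caching. It is not Minimax's or the platform's actual implementation: the PhraseCache class, the cache key layout, and the stand-in fake_synthesize function are all assumptions for illustration.

```python
import hashlib
from typing import Callable


class PhraseCache:
    """Cache synthesized audio per (voice settings, text) pair (illustrative only)."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        self._store: dict[str, bytes] = {}
        self.misses = 0  # number of times the synthesizer was actually called

    def get(self, voice_key: str, text: str) -> bytes:
        # Include the voice settings in the key so different voices never share audio.
        key = hashlib.sha256(f"{voice_key}\x00{text}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._synthesize(text)
        return self._store[key]


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real text-to-speech call; returns placeholder bytes."""
    return text.encode("utf-8")


cache = PhraseCache(fake_synthesize)
cache.get("minimax:English", "How can I help you today?")
cache.get("minimax:English", "How can I help you today?")  # served from the cache
```

Only the first request for a phrase pays the synthesis cost, which is why greetings and other common phrases benefit most.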

Conversation Tracking

Speaker labeling and diarization support for multi-participant conversation analysis and management.
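
As a rough sketch of how speaker labels could feed conversation analysis, the snippet below groups utterances by speakerLabel. The message dictionaries, the message key, and the "unknown" bucket are assumptions for illustration; only the speakerLabel property comes from this entry.

```python
from collections import defaultdict


def group_by_speaker(messages: list[dict]) -> dict[str, list[str]]:
    """Collect utterance text per speaker label, preserving message order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for msg in messages:
        # Messages without a label go into an "unknown" bucket (assumed behavior).
        label = msg.get("speakerLabel") or "unknown"
        grouped[label].append(msg.get("message", ""))
    return dict(grouped)


transcript = [
    {"speakerLabel": "Speaker 1", "message": "Hi, I'd like to book a table."},
    {"speakerLabel": "Speaker 2", "message": "Make it for four people."},
    {"speakerLabel": "Speaker 1", "message": "Yes, four at 7 pm."},
]
by_speaker = group_by_speaker(transcript)
```

Because the labels are stable across a conversation, a simple grouping like this is enough to compute per-speaker statistics such as turn counts.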