Enhanced Transcription Features & Speech Processing
- Gladia Transcription Enhancements: Improve transcription accuracy and performance with new GladiaTranscriber features:
  - region: Choose between us-west and eu-west for optimal latency and data residency compliance
  - receivePartialTranscripts: Enable low-latency streaming transcription for real-time conversation flow
  - Enhanced language detection with support for both single and multiple language modes
- Advanced Deepgram Controls: Fine-tune speech recognition with enhanced DeepgramTranscriber settings:
  - eotThreshold: End-of-turn detection threshold for precise conversation boundaries (e.g., 0.7)
  - eotTimeoutMs: Maximum wait time for end-of-turn detection in milliseconds (e.g., 5000ms)
  - eagerEotThreshold: Early end-of-turn detection for responsive conversations (e.g., 0.3)
- AssemblyAI Keyterms Enhancement: Boost recognition accuracy for critical terms with AssemblyAITranscriber.keytermsPrompt:
  - Support for up to 100 keyterms, each up to 50 characters
  - Improved recognition for specific words and phrases
  - Additional cost: $0.04/hour when enabled
- Speechmatics Custom Vocabulary: Enhance recognition accuracy with SpeechmaticsCustomVocabularyItem:
  - content: The word or phrase to add (e.g., “Speechmatics”)
  - soundsLike: Alternative phonetic representations (e.g., [“speech mattix”]) for better pronunciation handling
- Word-Level Confidence: Access detailed transcription confidence data with CustomLLMModel.wordLevelConfidenceEnabled, providing word-by-word accuracy metrics for quality assessment and debugging.
- Enhanced Message Metadata: Store transcription confidence and other metadata in UserMessage.metadata, enabling detailed analysis of transcription quality and user speech patterns.
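As a rough illustration of how word-level confidence data might be used for quality assessment, the sketch below averages per-word confidence and lists the words that fall under a cutoff. The WordConfidence shape, the 0 to 1 confidence range, and both helper functions are assumptions made for this example; these notes do not specify how the data is laid out in UserMessage.metadata.

```typescript
// Hypothetical shape for one word's confidence entry. The real metadata
// layout is not described in these notes.
interface WordConfidence {
  word: string;
  confidence: number; // assumed to lie between 0 and 1
}

// Mean confidence across all words, or null when there are no words.
function averageConfidence(words: WordConfidence[]): number | null {
  if (words.length === 0) return null;
  const total = words.reduce((sum, w) => sum + w.confidence, 0);
  return total / words.length;
}

// Words whose confidence is strictly below the cutoff.
function lowConfidenceWords(words: WordConfidence[], cutoff: number): string[] {
  return words.filter((w) => w.confidence < cutoff).map((w) => w.word);
}

// Made-up sample data, for illustration only.
const sampleWords: WordConfidence[] = [
  { word: "schedule", confidence: 0.98 },
  { word: "Speechmatics", confidence: 0.41 },
  { word: "tomorrow", confidence: 0.93 },
];

const flaggedWords = lowConfidenceWords(sampleWords, 0.5);
```

Words flagged this way are natural candidates for keyterms or custom vocabulary entries.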
AssemblyAITranscriber.wordFinalizationMaxWaitTime is now deprecated. Use the new smart endpointing plans for better speech timing control. The deprecated property will be removed in a future release.
Transcription Improvements
Choose optimal transcription regions with Gladia’s us-west and eu-west options for reduced latency and compliance.
Enable partial transcripts for immediate response processing, reducing perceived latency in conversations.
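The region and partial-transcript options could be combined in a transcriber configuration along these lines. The region values and the receivePartialTranscripts name come from these notes; the provider field, the overall object shape, and the gladiaConfigFor helper are assumptions for illustration, not a confirmed API.

```typescript
// Region values listed in the release notes.
type GladiaRegion = "us-west" | "eu-west";

// Assumed configuration shape; only region and receivePartialTranscripts
// are named in the notes.
interface GladiaTranscriberConfig {
  provider: "gladia"; // assumed discriminator field
  region: GladiaRegion;
  receivePartialTranscripts: boolean;
}

// Hypothetical helper: choose a region from a data-residency requirement
// and turn on partial transcripts for lower perceived latency.
function gladiaConfigFor(requiresEuResidency: boolean): GladiaTranscriberConfig {
  return {
    provider: "gladia",
    region: requiresEuResidency ? "eu-west" : "us-west",
    receivePartialTranscripts: true,
  };
}

const euConfig = gladiaConfigFor(true);
```

Typing the region as a union lets the compiler reject any value other than the two documented regions.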
Fine-tune end-of-turn detection with configurable thresholds and timeouts for natural conversation flow.
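A minimal sketch of sanity-checking end-of-turn settings before use, built on the example values from these notes (0.7, 5000, 0.3). The field names are taken from the notes; the validation rules (thresholds between 0 and 1, the eager threshold no higher than the main threshold, a positive timeout) are assumptions about sensible values, not documented constraints.

```typescript
// Field names as given in the release notes.
interface DeepgramEotSettings {
  eotThreshold: number;
  eotTimeoutMs: number;
  eagerEotThreshold: number;
}

// Returns a list of human-readable problems; an empty list means the
// settings pass these assumed checks.
function validateEotSettings(s: DeepgramEotSettings): string[] {
  const problems: string[] = [];
  const inUnitRange = (x: number): boolean => x >= 0 && x <= 1;

  // Assumption: thresholds behave like probabilities.
  if (!inUnitRange(s.eotThreshold)) {
    problems.push("eotThreshold should be between 0 and 1");
  }
  if (!inUnitRange(s.eagerEotThreshold)) {
    problems.push("eagerEotThreshold should be between 0 and 1");
  }
  // Assumption: the early trigger should fire no later than the final one.
  if (s.eagerEotThreshold > s.eotThreshold) {
    problems.push("eagerEotThreshold should not exceed eotThreshold");
  }
  if (s.eotTimeoutMs <= 0) {
    problems.push("eotTimeoutMs should be positive");
  }
  return problems;
}

// Example values quoted in the release notes.
const exampleEot: DeepgramEotSettings = {
  eotThreshold: 0.7,
  eotTimeoutMs: 5000,
  eagerEotThreshold: 0.3,
};

const eotProblems = validateEotSettings(exampleEot);
```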
Improve accuracy for domain-specific terms, company names, and technical jargon with enhanced vocabulary support.
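The vocabulary limits stated in these notes (up to 100 keyterms, each up to 50 characters) can be checked before sending a configuration. The sketch below does that and also shows a custom vocabulary entry using the content and soundsLike fields. The validateKeyterms helper is hypothetical, and treating soundsLike as optional is an assumption.

```typescript
// Limits stated in the release notes for AssemblyAI keyterms.
const MAX_KEYTERMS = 100;
const MAX_KEYTERM_LENGTH = 50;

// Hypothetical helper: returns a list of problems, empty when the
// keyterms fit within the stated limits.
function validateKeyterms(keyterms: string[]): string[] {
  const problems: string[] = [];
  if (keyterms.length > MAX_KEYTERMS) {
    problems.push("too many keyterms: " + keyterms.length + " > " + MAX_KEYTERMS);
  }
  for (const term of keyterms) {
    // Note: length counts UTF-16 code units, which can differ from
    // user-perceived characters for some scripts and emoji.
    if (term.length > MAX_KEYTERM_LENGTH) {
      problems.push("keyterm too long: " + term);
    }
  }
  return problems;
}

// Field names from the release notes; optionality of soundsLike is assumed.
interface SpeechmaticsCustomVocabularyItem {
  content: string;
  soundsLike?: string[];
}

// The example entry given in the release notes.
const customVocabulary: SpeechmaticsCustomVocabularyItem[] = [
  { content: "Speechmatics", soundsLike: ["speech mattix"] },
];

const keytermProblems = validateKeyterms(["Speechmatics", "end-of-turn"]);
```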