Overview
- The preview models replace Google's TTS systems released in May, establishing the new baseline for Gemini speech synthesis.
- Flash targets low-latency interactions, while Pro prioritizes higher-fidelity output suited to podcasts, audiobooks, and narration.
- Enhanced instruction following delivers richer tone control with stricter adherence to role and style prompts for more authentic performances.
- Context-aware pacing introduces natural timing shifts and more faithful execution of explicit speed directives.
- Dialogue handling now preserves distinct voices across speakers and across 24 supported languages, with early production use reported by Wondercraft and Toonsutra.