Overview
- Voxtral is offered in three variants: Voxtral Small with 24 billion parameters for production-scale deployments, Voxtral Mini with 3 billion parameters for edge and local use, and Mini Transcribe optimized for cost-effective transcription.
- The models can transcribe up to 30 minutes of audio and, thanks to an LLM backbone, can understand up to 40 minutes of audio for summarization, question answering and speech-triggered function calls.
- Benchmarks from Mistral show Voxtral outperforms OpenAI’s Whisper large-v3, GPT-4o-mini-transcribe, Google’s Gemini 2.5 Flash and ElevenLabs Scribe across multilingual transcription and comprehension tasks.
- API pricing starts at $0.001 per minute and rises to $0.004 per minute, undercutting comparable services by more than half while maintaining lower word error rates.
- All Voxtral models are available under an Apache 2.0 license on Hugging Face and can be tested for free through Mistral’s Le Chat chatbot.
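
To make the pricing and length figures above concrete, here is a minimal Python sketch that estimates a cost range for a recording and checks it against the stated limits. It is not Mistral's billing logic. It assumes cost scales linearly with duration and no rounding is applied, and it treats the two quoted prices as plain lower and upper bounds, since the overview does not say which model each price applies to. The function names `cost_range` and `fits_in_one_request` are hypothetical.

```python
from decimal import Decimal

# Per-minute prices quoted in the overview (USD). Which model or tier each
# price applies to is not stated there, so they are used only as bounds.
LOW_RATE = Decimal("0.001")
HIGH_RATE = Decimal("0.004")

# Audio-length limits quoted in the overview, in minutes.
MAX_TRANSCRIBE_MIN = 30
MAX_UNDERSTAND_MIN = 40


def cost_range(minutes):
    """Return (low, high) estimated cost in USD for `minutes` of audio.

    Assumes linear per-minute billing with no rounding (an assumption,
    not a documented billing rule).
    """
    m = Decimal(str(minutes))
    if m < 0:
        raise ValueError("minutes must be non-negative")
    return (m * LOW_RATE, m * HIGH_RATE)


def fits_in_one_request(minutes, task="transcribe"):
    """Check a duration against the quoted limit for the given task.

    `task` is "transcribe" (30-minute limit) or anything else, which is
    treated as an understanding task (40-minute limit).
    """
    limit = MAX_TRANSCRIBE_MIN if task == "transcribe" else MAX_UNDERSTAND_MIN
    return minutes <= limit


if __name__ == "__main__":
    low, high = cost_range(30)
    print(f"30 minutes: ${low} to ${high}")
    print("35 min transcribable in one request:", fits_in_one_request(35))
```

Under these assumptions, a 30-minute recording would cost between $0.03 and $0.12, and a 35-minute recording would exceed the transcription limit while still fitting within the 40-minute understanding limit.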