Mistral launches Voxtral, its first open-source AI audio model

Ken Metral

15 Jul 2025

Credit: Mistral AI

As artificial intelligence becomes more integrated into daily life, speech is emerging as a preferred interface between humans and machines. Recognizing this shift, French AI startup Mistral has entered the audio AI arena with Voxtral, its first family of speech understanding models, aiming to challenge the dominance of closed, corporate-controlled systems.

Challenging the Status Quo with Open Models

Unveiled on Tuesday, Voxtral is designed for business users and marks Mistral's move into the audio domain. The company claims Voxtral is the first open-weight speech model ready for production deployment, addressing a long-standing industry dilemma: choosing between open but underperforming models and powerful but proprietary tools that cost more and limit developer control.

Mistral pitches Voxtral as a cost-effective alternative, claiming it costs less than half as much as comparable solutions without sacrificing performance.

Two Models, Broad Use Cases

Voxtral is being released in two primary variants:

Voxtral Small: With 24 billion parameters, this version is tailored for enterprise-scale deployments. Mistral says it competes with high-performing models such as ElevenLabs Scribe, GPT-4o-mini, and Gemini 2.5 Flash.
Voxtral Mini: A 3-billion-parameter model optimized for local and edge deployments. For developers who need fast, inexpensive transcription, Mistral also offers Voxtral Mini Transcribe, a transcription-only version available through its API. Mistral says it outperforms OpenAI's Whisper at less than half the price.

Features: More Than Just Transcription

Voxtral does more than convert speech to text. Drawing on Mistral Small 3.1, the company's large language model, the models are capable of:

Transcribing up to 30 minutes of audio
Understanding and analyzing up to 40 minutes of content
Answering questions about audio, generating summaries, and executing voice commands directly from speech
Operating in multiple languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian

These capabilities make Voxtral suitable not only for traditional transcription but also for interactive voice assistants, meeting-analysis tools, and multilingual customer-support systems.

Accessible, Affordable, and Open

Voxtral is distributed under an open-weight license, underscoring Mistral's ongoing commitment to open-source AI. Developers can try the models in Le Chat, Mistral's chatbot, download the model weights from Hugging Face, or integrate them through Mistral's API. API pricing starts at $0.001 per minute of audio.

This launch follows the release of Magistral, Mistral’s first family of step-by-step reasoning models, and comes amid reports that the startup is seeking up to $1 billion in funding, further signaling its ambition to lead in the global AI race.

A New Era for Speech AI

With Voxtral, Mistral is betting big on the future of speech-based interaction. By lowering costs, opening access, and aiming to match the capabilities of proprietary systems, the company is positioning itself as a key player in the next wave of conversational AI, one where businesses no longer have to choose between quality and freedom.

For developers and enterprises alike, this could mark a turning point in how speech technology is deployed at scale.