7 Sources
[1]
Mistral releases Voxtral, its first open source AI audio model | TechCrunch
As AI systems become more capable, speech is fast becoming the default way we communicate with machines. French AI startup Mistral has jumped into the audio race with its first open model, aiming to challenge the dominance of walled-off corporate systems with open-weight alternatives. On Tuesday, Mistral announced the release of Voxtral, its first family of audio models aimed at businesses. The company is pitching Voxtral as the first open model that's capable of deploying "truly usable speech intelligence in production." In other words, no longer will developers have to choose between a cheap, open system that fumbles transcriptions and doesn't really understand what's being said, and one that functions well, but is closed, leaving developers with a higher bill and less control over deployment. For businesses, that means Voxtral offers an affordable alternative that the company claims is "less than half the price" of comparable solutions. Mistral says Voxtral can transcribe up to 30 minutes of audio. Due to its LLM backbone, Mistral Small 3.1, it can understand up to 40 minutes, allowing users to ask questions about the audio content, generate summaries, or turn voice commands into real-time actions like calling APIs or running functions. Voxtral is also multilingual, with the ability to transcribe and understand languages including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian. The company is offering up two variants of its "speech understanding models". The first, Voxtral Small, has 24B parameters for production-scale deployments, and is competitive with ElevenLabs Scribe, GPT-4o-mini, and Gemini 2.5 Flash. The second, Voxtral Mini, has 3 billion parameters for local and edge deployments. There's also an ultra-cheap, stripped-down, fast API version of the 3B model called Voxtral Mini Transcribe that is optimized for transcription-only use cases and promises to outperform OpenAI Whisper for less than half the price. 
Users can try Voxtral for free by downloading the API on Hugging Face or testing the models in Mistral's chatbot Le Chat. Integrating the API into applications starts at $0.001 per minute, according to the company. The launch comes a month after Mistral announced Magistral, its first family of reasoning models that work through problems step-by-step for improved reliability. Mistral, one of the top AI firms in Europe, is well-known for its advocacy pushing open source AI models. Earlier this month, TechCrunch reported that the company is in talks to raise up to $1 billion in equity from investors like Abu Dhabi's MGX fund.
[2]
Mistral launches Voxtral speech recognition model
Mistral has released an open automatic speech recognition (ASR) software bundle called Voxtral in a bid to undercut rivals on price and quality. The biz claims that using ASR in production has required a trade-off - using open-source models with high error rates and limited semantic understanding or using closed proprietary models for better accuracy at a higher cost. "Voxtral bridges this gap," the Paris-based AI biz claimed in a blog post. "It offers state-of-the-art accuracy and native semantic understanding in the open, at less than half the price of comparable APIs." Comparable APIs include OpenAI's Whisper model, which provides transcription at a price of $0.006 per minute, and its gpt-4o-mini-transcribe model, priced at $0.003 per minute. The Voxtral API starts at $0.001 per minute and goes up to about $0.004 with an allegedly better word error rate than gpt-4o-mini-transcribe. "Voxtral comprehensively outperforms Whisper large-v3, the current leading open-source Speech Transcription model," Mistral claims, alongside various supporting benchmark result graphs. "It beats GPT-4o mini Transcribe and Gemini 2.5 Flash across all tasks, and achieves state-of-the-art results on English short-form and Mozilla Common Voice, surpassing ElevenLabs Scribe and demonstrating its strong multilingual capabilities." Researchers last year found [PDF] that about 1 percent of OpenAI Whisper transcriptions contained hallucinated passages. Mistral has provided no data on hallucination rates that we're aware of. Voxtral supports input (context) of up to 32,000 tokens, which corresponds to about 30 minutes of audio transcription or 40 minutes for understanding. It can respond to questions about the audio or generate summaries. It can automatically detect widely used languages, such as English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, among others. And it incorporates function-calling via voice to trigger code workflows via voice. 
While Voxtral models can be downloaded and used in applications at no cost, Mistral is hoping businesses will pay to use its ASR technology for their applications. The AI shop is offering to help companies set up Voxtral for production-scale inference in private infrastructure and to help tune models for industry-specific applications. Mistral also says it's looking for potential partners who can provide additional functionality like speaker identification or emotion detection in model deployments. Earlier this month, Mistral joined with dozens of other European companies to urge European lawmakers to pause the EU AI Act because they see the rules limiting the competitive potential of businesses on the continent.
[3]
Mistral's Voxtral goes beyond transcription with summarization, speech-triggered functions
Mistral released an open-sourced voice model today that could rival paid voice AI, such as those from ElevenLabs and Hume AI, which the company said bridges the gap between proprietary speech recognition models and the more open, yet error-prone versions. Voxtral, which Mistral will release under an Apache 2.0 license, is available in a 24B parameter version and a 3B variant. The larger model is intended for applications at scale, while the smaller version would work for local and edge use cases. "Voice was humanity's first interface -- long before writing or typing, it let us share ideas, coordinate work, and build relationships. As digital systems become more capable, voice is returning as our most natural form of human-computer interaction," Mistral said in a blog post. "Yet today's systems remain limited -- unreliable, proprietary, and too brittle for real-world use. Closing this gap demands tools with exceptional transcription, deep understanding, multilingual fluency, and open, flexible deployment." Voxtral is available on Mistral's API and a transcription-only endpoint on its website. The models are also accessible through Le Chat, Mistral's chat platform. Mistral said that speech AI "meant choosing between two trade-offs," pointing out that some open-source automated speech recognition models often had limited semantic understanding. Still, closed models with strong language understanding come at a high cost.
Bridging the gap
The company said Voxtral "offers state-of-the-art accuracy and native semantic understanding in the open, at less than half the price of comparable APIs." Voxtral, at a 32K token context, can listen to and transcribe up to 30 minutes of audio or 40 minutes of audio understanding.
It offers summarization, meaning the model can answer questions based on the audio content and generate summaries without switching to a separate mode. Users can trigger functions and API calls based on spoken instructions. The model is based on Mistral's Mistral Small 3.1. It supports multiple languages and can automatically detect languages such as English, Spanish, French, Portuguese, Hindi, German, Italian, and Dutch. Mistral added enterprise features to Voxtral, including private deployment, so that organizations can integrate the model into their own ecosystems. These features also include domain-specific fine-tuning and advanced context and priority access to engineering resources for customers who need help integrating Voxtral into their workflows.
Performance
Speech recognition AI is now available on many platforms today. Users can speak to ChatGPT, and the platform will process spoken instructions similarly to written prompts. Fast food chains like White Castle have deployed SoundHound to their drive-thru services, and ElevenLabs has steadily been improving its multimodal platform. The open-source space also offers powerful options. Nari Labs, a startup, released the open-source speech model Dia in April. However, some of these services can be quite expensive. Transcription services like Otter and Read.ai can now embed themselves into Zoom meetings, recording, summarizing and even alerting users to actionable items. Many online video meeting platforms offer not just transcription, but also speech AI and agentic AI, with Google Meetings providing the option to take notes for users using Gemini. As a regular user of voice transcription services, I can say firsthand that speech recognition AI is not perfect, but it is improving. Mistral stated that Voxtral outperformed existing voice models, including OpenAI's Whisper, Gemini 2.5 Flash and Scribe from ElevenLabs.
Voxtral presented fewer word errors compared to Whisper, which is currently considered the best automatic speech recognition model available. In terms of audio understanding, Voxtral Small is "competitive with GPT-4o-mini and Gemini 2.5 Flash across all tasks, achieving state-of-the-art performance in Speech Translation." Since announcing Voxtral, social media users said they have been waiting for an open-source speech model that can match the performance of Whisper. Mistral said Voxtral will be available through its API at $0.001 per minute.
[4]
Mistral Unveils Voxtral, Its Open-Source Bet to Rival OpenAI and ElevenLabs | AIM
The French AI startup is taking the open-source route to tackle competitors in the speech understanding AI models space. French AI startup Mistral has released Voxtral, a new family of open-source speech understanding models designed to deliver production-ready transcription and semantic audio analysis at a fraction of the cost of proprietary alternatives, such as OpenAI Whisper and ElevenLabs Scribe. The Voxtral models come in two variants: a 24B version for large-scale deployments and a 3B 'Mini' version for local or edge use. Both are available under the Apache 2.0 licence and can be downloaded via Hugging Face or accessed through Mistral's API. A dedicated low-cost transcription endpoint is also available, priced at $0.001 per minute. Designed to handle long-form audio with up to 32,000 tokens of context, Voxtral supports direct question answering and summarisation without chaining multiple models. It supports multiple languages and lets developers trigger actions directly from spoken prompts. Benchmark results shared by Mistral show Voxtral outperforming Whisper Large V3, GPT-4o Mini Transcribe, and Gemini 2.5 Flash across a range of transcription and multilingual tasks, including FLEURS and Mozilla Common Voice. The company claims state-of-the-art results in English and European languages, along with strong audio understanding and translation performance. Voxtral also retains the text processing capabilities of its Mistral Small 3.1 backbone, enabling seamless transitions between voice and language tasks. For enterprises, Mistral offers options for on-premises deployment, domain-specific fine-tuning, and extended features such as speaker ID, emotion detection, and diarization. The models can be tested via Le Chat's voice mode or integrated via API. A webinar with Inworld AI on August 6 aims to demonstrate end-to-end voice agent applications. 
Mistral revealed that it is actively hiring to expand its audio team as it pushes toward building "near-human-like voice interfaces". The company has also recently launched Magistral, its first reasoning-focused language model, in two versions -- open-source Small and enterprise-grade Medium. Tuned for multi-step logic across domains like finance and healthcare, it supports multiple languages and delivers high benchmark scores.
[5]
Mistral Voxtral: Open-source AI audio arrives
Mistral introduced Voxtral, its initial open-source AI audio model family, on Tuesday, aiming to provide businesses with a production-ready speech intelligence solution. This release challenges existing corporate systems by offering an open-weight alternative for audio processing. The company positions Voxtral as an open model that facilitates usable speech intelligence in production environments. Voxtral intends to address the dilemma faced by developers, who often choose between inexpensive open systems with transcription inaccuracies and functional but closed systems that entail higher costs and less deployment control. For businesses, Voxtral is presented as an affordable option, with Mistral stating it is "less than half the price" of comparable solutions available in the market. Voxtral can transcribe up to 30 minutes of audio. Leveraging its LLM backbone, Mistral Small 3.1, the model can comprehend up to 40 minutes of audio content. This capability allows users to query audio content, generate summaries, and trigger real-time actions such as API calls or function executions through voice commands. Voxtral also supports multiple languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian, for both transcription and comprehension. Mistral offers two variants of its speech understanding models. Voxtral Small, designed for production-scale deployments, features 24 billion parameters. It is positioned as competitive with models such as ElevenLabs Scribe, GPT-4o-mini, and Gemini 2.5 Flash. The second variant, Voxtral Mini, contains 3 billion parameters and is optimized for local and edge deployments. Additionally, an optimized API version of the 3-billion-parameter model, named Voxtral Mini Transcribe, focuses on transcription-only use cases. This variant is promoted as outperforming OpenAI Whisper at less than half the cost. 
Users can access Voxtral for free by downloading its API on Hugging Face or by testing the models within Mistral's chatbot, Le Chat. Integrating the API into applications starts at a rate of $0.001 per minute. This launch follows Mistral's announcement last month of Magistral, its first family of reasoning models designed for improved reliability through step-by-step problem-solving. Mistral, a prominent European AI firm known for advocating open-source AI models, is reportedly in discussions to raise up to $1 billion in equity from investors, including Abu Dhabi's MGX fund, as reported by TechCrunch earlier this month.
[6]
Mistral Releases Its First Open-Source Speech Generation Models
Mistral's new speech generation model can detect multiple languages
Mistral released its first speech understanding models on Tuesday. Dubbed Voxtral, it is an open-source audio generation artificial intelligence (AI) model that not only turns text into speech but can also understand text to generate speech as a response natively. These models are available in two sizes of 24 billion parameters and three billion parameters. The Paris-based AI firm highlighted that not only is Voxtral available to download for free, but the company is also making it available at an affordable rate via application programming interface (API). In a newsroom post, Mistral calls voice "humanity's first interface," highlighting it as a foundational pillar of communication. As AI models become more capable, the French AI company said it was important to bring human-computer interactions to this natural interface. However, there are some gaps in this effort. Mistral claimed today's voice-focused AI models can be grouped in two categories: open-source models that have a high word error rate and limited semantic understanding; and closed proprietary models that are very expensive and not accessible to all. Voxtral, an open-source model with native semantic understanding, is aimed at closing this gap, the company added. There are three models in total -- Voxtral Small with 24B parameters, Voxtral Mini with 3B parameters, and Voxtral Mini Transcribe with 3B parameters. All of these models are available to the open community with the Apache 2.0 license that allows both academic and commercial usage. Notably, Voxtral Small is the company's premium model aimed at production-scale applications, while the Voxtral Mini is designed for local and edge deployments. The Voxtral Mini Transcribe is focused on transcription-related tasks and is said to outperform OpenAI Whisper.
Voxtral models have a context window of 32,000 tokens, which translates to up to 30 minutes of transcription or 40 minutes of voice understanding. They can also answer questions about audio content and generate summaries natively. Additionally, Voxtral is capable of detecting multiple languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian, and more. Built on top of Mistral Small 3.1, Voxtral models also offer function calling via voice, so users can command the AI system without having to type anything. Mistral claims that the Voxtral Small model outperforms GPT-4o mini Transcribe and Gemini 2.5 Flash across tasks, and surpasses ElevenLabs Scribe in multilingual capabilities. The Voxtral models can be downloaded from the company's Hugging Face listing, accessed via API at a starting price of $0.001 (roughly Re. 1) per minute, or can be tried out via Mistral's Le Chat platform.
[7]
What is Voxtral: Mistral's open Source AI Audio Model, Key Features Explained
Voxtral is free, fast, Apache 2.0 licensed, and outperforms Whisper and GPT-4o mini in benchmarks. In July 2025, Mistral AI unveiled Voxtral, a powerful new entry in the world of AI audio models. Unlike most competitors, Voxtral is fully open source and designed for deeper audio understanding, interaction, and automation. Positioned as a direct alternative to OpenAI's Whisper and other proprietary tools, Voxtral blends performance with developer freedom. Here's a breakdown of what Voxtral is, what makes it different, and why it matters. Voxtral is an open-source family of AI models built to handle speech recognition, transcription, audio comprehension, and even voice-triggered automation. It's part of Mistral AI's broader push to make top-tier generative AI tools more accessible, transparent, and cost-effective. Released under the Apache 2.0 license, Voxtral can be used freely in both commercial and private deployments. There are two versions of the model: Voxtral Small, with 24 billion parameters, and Voxtral Mini, with 3 billion parameters. Voxtral isn't just another automatic speech recognition (ASR) tool; it goes beyond transcription. It's capable of understanding audio content, summarizing what was said, answering questions about the conversation, and extracting key data points, all directly from voice input. For example, if you ask, "What was the refund request mentioned in the call?", Voxtral can give you an exact answer with timestamps, without needing a separate language model. Unlike traditional models that struggle with longer clips, Voxtral can process up to 30-40 minutes of audio per pass, thanks to its 32,000-token context window. This makes it ideal for meetings, interviews, lectures, and podcasts. Voxtral can automatically detect and transcribe speech in multiple languages, including English, Spanish, French, Hindi, German, Portuguese, Dutch, and Italian.
There's no need to manually set the language; it just works. One of Voxtral's standout features is its ability to interpret voice commands and trigger backend actions. For instance, if a user says, "Check the order and send a confirmation email," Voxtral can process the request and pass it to your API, eliminating the need for custom grammar or intent layers. Users can interact with audio files like they would with a chatbot. You can ask questions about a recorded call, generate a summary, or extract decisions made in a meeting, all without manual tagging or transcripts. Voxtral is cheaper and faster than many closed models. It outperforms OpenAI Whisper large-v3 in several transcription and comprehension benchmarks, while costing just $0.001-$0.004 per audio minute, depending on deployment. That's nearly half the price of Whisper or other commercial tools. Beyond audio, Voxtral can handle textual tasks like code completion, summarization, and reasoning, making it a versatile tool for developers who need a unified model across modalities. Voxtral's arrival is a major moment for the AI developer community. It offers capabilities typically locked behind proprietary APIs but in a completely open format. Enterprises get privacy and control, startups get flexibility and cost savings, and developers get to build without constraints. Whether you're building a voice assistant, summarizing customer calls, or deploying multilingual transcription on mobile, Voxtral brings powerful AI audio tools into the open, without cutting corners on performance.
French AI startup Mistral releases Voxtral, an open-source speech recognition model family, aiming to provide affordable and accurate audio processing solutions for businesses while competing with established proprietary systems.
French AI startup Mistral has released Voxtral, its first family of open-source AI audio models [1]. The launch marks Mistral's entry into the competitive speech recognition market, challenging established proprietary systems with an open-weight alternative designed for business applications.
Voxtral aims to address a critical dilemma faced by developers in the field of speech recognition. Traditionally, they have had to choose between inexpensive open systems with limited accuracy and understanding, or well-functioning but closed systems that come with higher costs and less deployment flexibility [2]. Mistral positions Voxtral as a solution that offers "state-of-the-art accuracy and native semantic understanding in the open, at less than half the price of comparable APIs" [3].
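Voxtral is available in two main variants: Voxtral Small, a 24-billion-parameter model for production-scale deployments, and Voxtral Mini, a 3-billion-parameter model for local and edge deployments [1].
Additionally, Mistral offers Voxtral Mini Transcribe, a streamlined version of the 3B model specifically designed for transcription tasks [1].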
The model can handle up to 32,000 tokens of context, allowing it to process approximately 30 minutes of audio for transcription or 40 minutes for comprehension [4]. Voxtral's capabilities extend beyond mere transcription, enabling users to ask questions about audio content, generate summaries, and even trigger real-time actions like API calls or function executions through voice commands [1].
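To make the per-pass limits concrete, here is a minimal sketch of how a longer recording might be divided into segments that each fit within one pass. The 30- and 40-minute figures come from the reporting above; the helper name `plan_segments` and the fixed-length splitting strategy are assumptions made for illustration, not something Mistral has described.

```python
import math

# Per-pass limits as reported for Voxtral's 32,000-token context.
TRANSCRIBE_LIMIT_MIN = 30
UNDERSTAND_LIMIT_MIN = 40


def plan_segments(duration_min: float, limit_min: float) -> list[tuple[float, float]]:
    """Split a recording into equal (start, end) windows no longer than limit_min.

    Hypothetical helper: fixed-length splitting ignores pauses and sentence
    boundaries, which a real pipeline would probably want to respect.
    """
    if duration_min <= 0:
        return []
    count = math.ceil(duration_min / limit_min)  # fewest windows that fit
    length = duration_min / count                # spread the audio evenly
    return [(i * length, (i + 1) * length) for i in range(count)]


# A 90-minute meeting needs three transcription passes.
print(plan_segments(90, TRANSCRIBE_LIMIT_MIN))
```

Equal-length windows are used here only because they are simple; splitting at silences would avoid cutting words in half.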
Voxtral boasts strong multilingual capabilities, supporting languages such as English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian for both transcription and comprehension [1]. Mistral claims that Voxtral outperforms existing models like OpenAI's Whisper, Gemini 2.5 Flash, and ElevenLabs' Scribe across various benchmarks, including FLEURS and Mozilla Common Voice [4].
Mistral has made Voxtral accessible through multiple channels. Users can download the model weights from Hugging Face or test the models in Mistral's chatbot, Le Chat, free of charge. For those integrating Voxtral into their applications through the API, pricing starts at $0.001 per minute [5]. This pricing positions Voxtral as a cost-effective alternative to existing solutions in the market.
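As a rough worked example of what the per-minute pricing means at volume, the sketch below compares the cost of transcribing the same amount of audio at each reported price. The prices are the ones quoted by The Register in the source list; the function name `monthly_cost` and the 10,000-minute workload are illustrative assumptions, and the Voxtral figure is the starting price only (the same report says it can rise to about $0.004).

```python
from decimal import Decimal

# Per-minute API prices in USD, as reported by The Register.
PRICE_PER_MIN = {
    "voxtral (starting price)": Decimal("0.001"),
    "gpt-4o-mini-transcribe": Decimal("0.003"),
    "whisper": Decimal("0.006"),
}


def monthly_cost(minutes: int) -> dict[str, Decimal]:
    """Return the USD cost of transcribing `minutes` of audio at each price."""
    return {name: price * minutes for name, price in PRICE_PER_MIN.items()}


# Hypothetical workload: 10,000 minutes of audio per month.
for name, cost in monthly_cost(10_000).items():
    print(f"{name}: ${cost:.2f}")
```

For that workload the totals come to $10, $30, and $60 respectively. `Decimal` is used instead of floats so that small per-minute prices multiply out exactly.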
The release of Voxtral represents a significant development in the open-source AI community. By offering a high-performance, open-source alternative to proprietary speech recognition systems, Mistral is challenging the status quo and potentially democratizing access to advanced audio processing capabilities [3].
Mistral has indicated that it is actively expanding its audio team, with the goal of developing "near-human-like voice interfaces" [4]. This launch follows the recent introduction of Magistral, Mistral's reasoning-focused language model, demonstrating the company's commitment to innovation across various AI domains [5].
As Mistral continues to make waves in the AI industry, reports suggest that the company is in talks to raise up to $1 billion in equity from investors, including Abu Dhabi's MGX fund [1]. This potential influx of capital could further accelerate Mistral's growth and development in the competitive AI landscape.
Summarized by Navi