3 Sources
[1]
Paris-based AI voice startup Gradium nabs $70M seed | TechCrunch
Gradium, a startup spun out of French AI lab Kyutai (backed by French telecom billionaire Xavier Niel), launched out of stealth on Tuesday with a $70 million seed round from a who's who of investors. The round was led by FirstMark Capital and Eurazeo, with participation from Niel, DST Global Partners, billionaire Eric Schmidt and other investors. Gradium has developed audio language AI models designed to deliver voice at scale with ultra-low latency -- essentially, AI voices that respond almost instantly. It was founded just a few months ago, in September 2025, by Kyutai founding member Neil Zeghidour, who cut his teeth working with voice models as a researcher at Google DeepMind. The startup's goal, it says, is to make voice models speedier and more accurate for developers. And, as a European startup, it launched with multilingual support out of the gate: English, French, German, Spanish, and Portuguese, with additional languages coming. Of course, Gradium is entering a race with plenty of competition. For starters, the frontier LLM companies like OpenAI, Anthropic, Meta (Llama), and Mistral all have voice, speech recognition, and multimodal models. Then there are well-funded startups like ElevenLabs, and hundreds of voice/speech models on Hugging Face. Right now, there's no shortage of options for a developer needing AI voice capabilities. That said, the need for what Gradium hopes to offer -- ultra-realistic voice expression and accuracy -- will only grow over time, as AI moves from typed chats to AI agents and expands into use cases from entertainment to work.
[2]
Audio language model startup Gradium raises $70M to create more realistic voice AI systems
Audio artificial intelligence startup Gradium is launching today after closing on an impressive $70 million seed funding round, just three months after it was founded. The startup is backed by investors that include FirstMark Capital and Eurazeo, which led the funding round, as well as DST Global Partners, Korelya Capital and Amplify Partners, and high-profile angels such as former Google LLC Chief Executive Eric Schmidt. Gradium's mission is to commercialize audio language models, which are specialized AI systems designed to process, understand and generate natural language using audio-text data. Natural language is leveraged as a "supervision signal," allowing ALMs to perform tasks such as audio classification and speech synthesis more effectively than general-purpose large language models. The startup says ALMs are the "audio-native counterpart" to LLMs and are meant to support more natural and expressive voice interactions with dramatically lower latency, making conversations with AI feel more realistic. The concept was first developed by Gradium's founders during their time at Kyutai, a nonprofit AI research lab. ALMs are trained on datasets that pair audio with descriptive text, enabling them to learn the complex relationships between sound and language. The natural language supervision technique replaces traditional labeling, using natural speech as a guiding signal to teach them how to understand and say specific words. Co-founder and Chief Executive Neil Zeghidour explained that his company wants to help ALMs unlock the true potential of "voice AI," which is still reliant on what he says are subpar systems. "Existing systems are brittle, costly and unable to deliver truly natural interactions," he said. "Our goal is to make voice the primary interface between humans and machines." According to Zeghidour, ALMs can outperform LLMs in any kind of voice AI task, including areas such as speech recognition, where spoken language is transformed into written text, as well as audio generation, such as creating original speech, and audio classification, which refers to identifying and categorizing different audio signals. Ultimately, Gradium wants to transform the capabilities of AI assistants and agents, making conversational interactions between them and humans feel more natural and realistic. "To achieve this, we're eliminating the longstanding tradeoff between quality and scalability: combining ultra-realistic expressivity, accurate transcription and ultra-low-latency interactions at a price point that finally makes high-quality voice ubiquitous," Zeghidour said. Zeghidour has assembled a talented team to make good on this promise, made up of researchers and engineers who previously worked at Google's DeepMind, Meta Platforms Inc.'s FAIR research team and Jane Street Capital LLC. He said the company possesses one of the industry's highest concentrations of generative audio expertise assembled so far, and it has already developed a number of production-ready systems that are being used by early adopters and generating revenue. Those early adopters include companies in gaming, customer care, language learning, healthcare and AI agents. Gradium is launching its platform and enabling access to its first models today, with support for English, French, German, Spanish and Portuguese.
It says it offers flexible plans catering to the smallest developer teams all the way up to the largest enterprises. The startup said it will continue its ongoing collaboration with Kyutai, ensuring it has access to the latest frontier research in generative audio so it can remain at the forefront of the latest innovations in ALMs.
[3]
Gradium Gets $70 Million to Turn Voice Into AI's Universal Interface | PYMNTS.com
But Gradium, a new foundational voice-AI startup co-founded by former DeepMind and Meta researchers, has never operated by conventional logic. In fact, its founders have spent the past several years building algorithms that now underpin much of voice technology. "There are a lot of businesses around voice AI now, but developing very strong models for transcription, synthesis, the technological layer of AI, is very difficult," Neil Zeghidour, founder at Gradium, said during a discussion hosted by PYMNTS CEO Karen Webster. "Only a few people in the world know how to do it properly. In our case, we have invented most of the technological steps and algorithms that are powering current technology." And now, they are entering the market as a challenger. "We have to create our space in this very fast," Zeghidour said, to "make everyone realize how serious we are about being a challenger." Still, the company faces a tall mountain to climb when it comes to commercialization. Consumer sentiment about voice assistants has long been ambivalent. Voice is the most intuitive interface of all, yet it all too often feels frustratingly brittle. "Voice [assistants] have been around for a long time, and I think we're all frustrated with voice because it's impossible to have a conversation. ... It's very keyword-driven," Webster said. "One of our main theses is that the potential of voice AI is mostly unrealized today. And one reason is because the interaction is too brutal," Zeghidour agreed. He described a familiar litany of shortcomings: systems that interrupt users mid-sentence, models that misjudge when someone has finished speaking, synthetic voices that respond with wildly inappropriate emotional tones. Even tasks as simple as making an appointment break down under the weight of latency and inaccuracy. But with major funding and a polished commercial architecture, Gradium believes that solving these problems is not an incremental improvement in user experience but a reengineering of the science behind voice AI models that focuses on four pillars: accuracy, latency, conversational flow and expressive synthesis. While Gradium's $70 million seed round is impressive, it still pales in comparison to the billions that tech giants like Amazon, Apple and others have poured into their own, often underwhelming, voice AI systems. Zeghidour didn't speculate about the internal models of those companies, but he highlighted the stagnation. "Look at something like transcription -- it's been around for 30 years and it's still not there." This is not an indictment of manpower; it's an indictment of architecture, says Zeghidour. Voice assistants are built not only on voice models but on the intelligence of the underlying large language models. "We are at a point where we can accelerate the progress significantly," Zeghidour said, stressing that Gradium's own fundamental breakthrough is algorithmic: a more efficient and powerful audio-language modeling approach, solving for the "voice" layer, which he emphasized the company "invented and is the best at." "We got our first revenue in six weeks," Zeghidour added. "The models were still training and already judged by beta testers as superior to competitors." One of the most important questions facing the industry is how voice and visual context will converge. Webster framed it: People may speak their instructions, but they often need to see something to verify or approve it. 
To meet this end-user need and deliver a scalable voice experience, Gradium is taking a pragmatic approach: a cascaded system, in which real-time transcription and synthesis wrap around any text-based or vision-language model. "You can take a VLM [vision-language model] that understands images," Zeghidour said, "and we just add our real-time transcription and real-time synthesis. Now you can have a conversation about images." This approach uses the text model as the "central processing unit," with voice as the input and output layers. It is not the most philosophically elegant solution, and Gradium has already demonstrated systems that bypass text entirely, doing speech-to-speech or vision-to-speech modeling. But for customers building commercial systems today, the cascaded method can offer maximal compatibility and speed. The goal, Zeghidour emphasized, is simple: "A voice layer that turns any text or vision model into a commercial AI." Gradium's go-to-market strategy is unapologetically B2B, targeting customers that include companies building customer-support agents, medical-appointment systems, coaching platforms, e-learning tools, and any enterprise workflow that depends on conversation. "We sell API access for transcription and synthesis to people building voice agents," Zeghidour said. Just as importantly, Gradium believes it can break a long-standing market dichotomy: the choice between high-quality but expensive voice systems and affordable but low-quality ones. The company intends to collapse that tradeoff entirely: quality on par with best-in-class systems, priced like commodity infrastructure. Webster framed the stakes succinctly. "Voice is the ubiquitous interface ... everyone can use it." And because it's the most human interface, expectations are higher. "People will use their voice to express themselves, to buy, to ask for help ... and it's profoundly important to get it right." The company sees 2026 as the horizon for tackling the deeper technological limitations plaguing voice AI. But the message of its launch is unmistakable: Voice may be about to become the interface layer for the entire AI economy, and Gradium wants to be the infrastructure powering it.
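The cascaded setup Zeghidour describes is, structurally, a pipeline: streaming transcription feeds any text or vision-language model, and the model's reply is streamed back out through synthesis. The sketch below is a hypothetical illustration of that shape; the interface and class names are assumptions chosen for clarity, not Gradium's actual API.

```python
# Hypothetical sketch of a cascaded voice agent: real-time transcription and
# synthesis wrapped around any text (or vision-language) model, which acts as
# the "central processing unit". All names here are illustrative, not Gradium's.
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass
class AudioChunk:
    pcm: bytes           # raw audio frames streamed from the caller
    final: bool = False  # True once the endpointer decides the speaker has finished


class SpeechToText(Protocol):
    def transcribe(self, chunks: Iterable[AudioChunk]) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> Iterable[AudioChunk]: ...


class TextModel(Protocol):
    def respond(self, prompt: str, image: Optional[bytes] = None) -> str: ...


class CascadedVoiceAgent:
    """Voice in, voice out; the text/vision model in the middle does the reasoning."""

    def __init__(self, stt: SpeechToText, llm: TextModel, tts: TextToSpeech):
        self.stt, self.llm, self.tts = stt, llm, tts

    def turn(self, audio: Iterable[AudioChunk], image: Optional[bytes] = None):
        transcript = self.stt.transcribe(audio)      # real-time transcription
        reply = self.llm.respond(transcript, image)  # any text or vision-language model
        return self.tts.synthesize(reply)            # real-time synthesis back to the user
```

The practical appeal of this layout is that the reasoning layer is swappable: any text or vision-language model can be dropped into the middle while the voice layer stays fixed, which is what gives the cascade its compatibility and speed for commercial builds today.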
Paris-based AI voice startup Gradium launches with $70 million seed funding to develop audio language models that deliver ultra-realistic voice interactions with dramatically reduced latency. The company aims to make voice the primary interface between humans and machines.
Gradium, a Paris-based AI voice startup, has emerged from stealth mode with one of the most substantial seed funding rounds in recent memory. The company secured $70 million in seed funding led by FirstMark Capital and Eurazeo, with participation from prominent investors including French telecom billionaire Xavier Niel, DST Global Partners, and former Google CEO Eric Schmidt [1]. Remarkably, this funding was raised just three months after the company's founding in September 2025.
The startup has developed what it calls Audio Language Models (ALMs), specialized AI systems designed to process, understand, and generate natural language using audio-text data. According to CEO Neil Zeghidour, ALMs represent the "audio-native counterpart" to large language models and are engineered to support more natural and expressive voice interactions with dramatically lower latency [2]. The technology was initially developed during the founders' time at Kyutai, a nonprofit AI research lab backed by Xavier Niel.
Unlike traditional voice AI systems that rely on cascaded architectures, Gradium's ALMs are trained on datasets that pair audio with descriptive text, enabling them to learn complex relationships between sound and language. This approach uses natural language as a "supervision signal," allowing the models to perform tasks such as audio classification and speech synthesis more effectively than general-purpose language models [2].
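Neither source spells out Gradium's training recipe, but using natural language as a supervision signal over paired audio-text data is a well-established pattern in the research literature. The toy PyTorch sketch below shows one generic version of it, a CLIP/CLAP-style contrastive objective that pulls each audio clip toward its own caption; it is an assumption-laden illustration of the idea, not Gradium's method.

```python
# Illustrative only: a contrastive objective over paired audio and text,
# one generic way to use natural language as a "supervision signal".
# Encoders and feature shapes are stand-ins, not Gradium's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioTextAligner(nn.Module):
    def __init__(self, audio_dim=128, text_dim=96, shared_dim=64):
        super().__init__()
        # Stand-in projections: a real system would use learned speech and
        # text encoders in place of these linear layers.
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, audio_feats, text_feats):
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        # Similarity of every audio clip against every caption in the batch.
        logits = self.logit_scale.exp() * a @ t.t()
        targets = torch.arange(a.size(0))
        # Symmetric InfoNCE: each clip should match its own caption and vice versa.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))


# Toy batch of 8 (audio, caption) pairs, with random features as placeholders.
model = AudioTextAligner()
loss = model(torch.randn(8, 128), torch.randn(8, 96))
loss.backward()
print(f"contrastive loss: {loss.item():.3f}")
```

In a real system the random features would come from learned speech and text encoders, and the same paired data can also drive generative objectives such as transcription or speech synthesis conditioned on text.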
Zeghidour has identified significant shortcomings in existing voice AI systems, describing them as "brittle, costly and unable to deliver truly natural interactions" [2]. Current voice assistants suffer from issues including interrupting users mid-sentence, misjudging when someone has finished speaking, and responding with inappropriate emotional tones [3].

The company's solution focuses on four key pillars: accuracy, latency, conversational flow, and expressive synthesis. Gradium's approach aims to eliminate the traditional tradeoff between quality and scalability by combining ultra-realistic expressivity, accurate transcription, and ultra-low-latency interactions at an accessible price point [2].
Gradium enters a highly competitive market dominated by frontier LLM companies like OpenAI, Anthropic, Meta, and Mistral, all of which offer voice and multimodal capabilities. The startup also faces competition from well-funded voice AI companies like ElevenLabs and hundreds of voice models available on platforms like Hugging Face [1].

Despite this competition, Gradium has demonstrated rapid commercial traction, generating revenue within six weeks of launch while its models were still in training [3]. The company's go-to-market strategy is unapologetically B2B, targeting customers building voice agents for customer support, medical appointments, coaching platforms, e-learning tools, and enterprise workflows.

The startup boasts what it claims is one of the industry's highest concentrations of generative audio expertise, with a team of researchers and engineers drawn from Google DeepMind, Meta's FAIR research team, and Jane Street Capital [2]. Zeghidour himself previously worked with voice models as a researcher at Google DeepMind before becoming a founding member of Kyutai.

Gradium has launched its platform with multilingual support for English, French, German, Spanish, and Portuguese, with additional languages planned. The company offers flexible pricing plans designed to serve everyone from small developer teams to large enterprises, and maintains ongoing collaboration with Kyutai to ensure access to cutting-edge generative audio research [2].