2 Sources
[1]
DeepL, known for text translation, now wants to translate your voice | TechCrunch
DeepL, a translation company best known for its text tools, released a voice-to-voice translation suite today that covers use cases like meetings, mobile and web conversations, and group conversations for frontline workers through custom apps. The company is also releasing an API that lets outside developers and businesses build on top of DeepL's tech for customized use cases, such as call centers. "After spending so many years in text translation, voice was a natural step for us," DeepL CEO Jarek Kutylowski told TechCrunch in an interview. "We have come a long way when it comes to text translation and document translation. But we thought there wasn't a great product for real-time voice translation." Kutylowski said that the challenges in creating a real-time translation product center on striking a balance between reducing latency -- the delay between someone speaking and the translated audio playing back -- and maintaining accurate results. DeepL is releasing add-ons for platforms like Zoom and Microsoft Teams, where listeners can either hear real-time translation while others are speaking in native languages or follow real-time translated text on screen. This program is currently under early access, and the company is inviting organizations to join a waitlist. The company also has a product for mobile and web-based conversations that can take place in person or remotely. DeepL also lets users participate in a group conversation in settings like training sessions or workshops, allowing participants to join through a QR code. DeepL said that its voice-to-voice tech can also learn and adapt to custom vocabulary, such as industry-specific terms and company and personal names. Kutylowski said that AI is reimagining what customer service will look like in the coming years. He noted that a translation layer helps companies provide support in languages where qualified staff are scarce and expensive to hire.
The company said that it controls the entire voice-to-voice stack. However, the current system converts the speech to text, applies translation, then converts that back to speech. DeepL believes that since it has worked on text translation for years, it has an edge in translation quality. Going forward, the company wants to develop an end-to-end voice translation model that skips the text step entirely. DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which last year raised $65 million from Quadrille Capital and Teleperformance, uses AI to modify a speaker's accent in real time -- a tool aimed primarily at call center agents. Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies Amazon Web Services, helping them dub and localize video content at scale. Palabra, backed by Reddit co-founder Alexis Ohanian's firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker's original voice, putting it in more direct competition with what DeepL is now building.
[2]
DeepL launches real-time voice-to-voice translation in 40+ languages
The Cologne-based translation company best known for its text tools has unveiled a full voice product suite covering meetings, conversations, group settings, and an API for enterprise integration. A live demo in Seoul showed one-to-two sentence delays, and DeepL's CPO acknowledged word order differences between languages remain a fundamental challenge. DeepL, the Cologne-based language AI company that built its reputation on high-quality text translation, has launched DeepL Voice-to-Voice: a real-time spoken translation suite designed for live business communication. The product covers four distinct use cases, virtual meetings, mobile and web conversations, group settings for frontline workers, and enterprise applications through an API, and supports more than 40 languages including all 24 official EU languages and additions such as Vietnamese, Thai, Arabic, Norwegian, Hebrew, Bengali, and Tagalog. The suite's four components are at different stages of availability. Voice for Conversations, which enables real-time translation across mobile and web without requiring app installation, is now generally available. Voice for Meetings, which integrates with Microsoft Teams and Zoom so participants can speak in their native language while others hear simultaneous translation in theirs, is opening an early access programme in June. The Voice-to-Voice API, which lets businesses embed DeepL's translation engine into their own customer-facing applications such as call centres, is in ongoing early access. A customisation feature, Spoken Terms, which allows the system to learn industry-specific vocabulary, company names, and personal names, is scheduled to become generally available on 7 May. Jarek Kutylowski, DeepL's founder and CEO, described the launch as reaching "another frontier in translation." "DeepL Voice-to-Voice allows everyone to speak naturally in their own language without the friction or cost of interpreters," he said. 
DeepL has positioned the product as an enterprise tool rather than a consumer one: the company said its voice technology never uses customer data to train its models, and does not permanently store transcription or translation data after a call ends, a security framing that distinguishes it from consumer AI voice products and is aimed at regulated industries. The current system works through a three-step pipeline: speech is converted to text, the text is translated using DeepL's established translation engine, and the output is then converted back to speech. DeepL's competitive argument rests on the quality of the middle step: the company says its text translation models outperform alternatives, and that advantage propagates through to the voice output. In blind evaluations commissioned by DeepL and conducted independently by Slator, a language industry research firm, 96% of professional linguists preferred DeepL Voice over the native translation solutions in Google Meet, Microsoft Teams, and Zoom, citing superior fluency and contextual accuracy. DeepL Voice scored 96.4 out of 100 for Zoom and 96.3 for Microsoft Teams. However, a live demonstration by Chief Product Officer Gonzalo Gaiolas at the company's DeepL Connect Seoul event, held on 15 April, exposed the system's current limitation: a visible delay of one to two sentences between the speaker finishing and the translation being delivered. Gaiolas acknowledged the lag directly. "Different languages have different word orders and sentence structures, which causes delays in real-time interpretation," he said, according to Seoul Economic Daily. The company plans to reduce latency through continued model development. On the voice quality side, the current system translates using a fixed synthetic voice; DeepL said it plans to release a voice-preservation feature, which maintains the speaker's original voice characteristics in the translated output, by the end of 2026. 
DeepL is entering a market with multiple well-funded competitors. Sanas, which uses AI to modify speakers' accents in real time for call centre applications, raised $65 million in a round led by Quadrille Capital. Dubai-based Camb.AI focuses on speech synthesis and translation for media dubbing. Palabra, backed by Reddit co-founder Alexis Ohanian's Seven Seven Six, is developing a real-time speech translation engine focused on preserving speaker voice characteristics. Google, Microsoft, and Zoom all offer their own meeting translation features, the platforms DeepL is simultaneously challenging and integrating with. DeepL's strategic bet is that translation quality, its longest-established differentiator, can outweigh the structural advantages incumbents hold in platform distribution.
DeepL has released a voice-to-voice translation suite covering meetings, mobile conversations, and group settings across 40+ languages. The Cologne-based company is integrating with Zoom and Microsoft Teams while offering an enterprise API for call centers. But live demos reveal latency challenges as the company competes against well-funded rivals like Sanas and Palabra.
DeepL, the Cologne-based language AI company renowned for its text translation tools, has launched DeepL Voice-to-Voice, a real-time translation suite designed to handle live business communications across more than 40 languages [1][2]. The product suite addresses four distinct use cases: virtual meetings, mobile and web conversations, group settings for frontline workers, and enterprise applications through an API. Supported languages include all 24 official EU languages plus Vietnamese, Thai, Arabic, Norwegian, Hebrew, Bengali, and Tagalog [2]. Jarek Kutylowski, DeepL's founder and CEO, described the launch as reaching "another frontier in translation," emphasizing that the technology allows everyone to speak naturally in their own language without the friction or cost of interpreters [2].
Source: The Next Web
DeepL is releasing add-ons for platforms like Zoom and Microsoft Teams, where listeners can either hear real-time translation while others speak in native languages or follow real-time translated text on screen [1]. Voice for Meetings, which enables participants to speak in their native language while others hear simultaneous translation, is opening an early access programme in June [2]. The company is inviting organizations to join a waitlist for this program. Voice for Conversations, which enables AI-powered spoken translation across mobile and web without requiring app installation, is now generally available [2]. The speech translation engine also supports group conversations in settings like training sessions or workshops, allowing participants to join through a QR code [1].
Source: TechCrunch
The Voice-to-Voice API, which lets businesses embed DeepL's translation engine into their own customer-facing applications such as call centers, is in ongoing early access [2]. Kutylowski noted that AI is reimagining what customer service will look like in the coming years, explaining that a translation layer helps companies provide support in languages where qualified staff are scarce and expensive to hire [1]. A customization feature called Spoken Terms, which allows the system to learn industry-specific vocabulary, company names, and personal names, is scheduled to become generally available on 7 May [2]. DeepL has positioned the product as an enterprise tool, emphasizing that its voice technology never uses customer data to train its models and does not permanently store transcription or translation data after a call ends, a data security framing aimed at regulated industries [2].
Kutylowski acknowledged that the challenges in creating a real-time translation product center on striking a balance between reducing latency (the delay between someone speaking and the translated audio playing back) and maintaining accurate results [1]. A live demonstration by Chief Product Officer Gonzalo Gaiolas at DeepL Connect Seoul on 15 April exposed the system's current limitation: a visible delay of one to two sentences between the speaker finishing and the translation being delivered [2]. Gaiolas acknowledged the lag directly, stating that "different languages have different word orders and sentence structures, which causes delays in real-time interpretation," according to Seoul Economic Daily [2]. The current system works through a three-step pipeline: speech is converted to text, the text is translated using DeepL's established translation engine, and the output is then converted back to speech [2]. Going forward, DeepL wants to develop an end-to-end voice translation model that skips the text step entirely [1].
DeepL faces competition from several well-funded startups working in adjacent corners of the space. Sanas, which last year raised $65 million from Quadrille Capital and Teleperformance, uses AI to modify a speaker's accent in real time, a tool aimed primarily at call center agents [1]. Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies, helping them dub and localize video content at scale [1]. Palabra, backed by Reddit co-founder Alexis Ohanian's firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker's original voice, putting it in more direct competition with what DeepL is now building [1]. Google, Microsoft, and Zoom, the platforms DeepL is simultaneously challenging and integrating with, all offer their own meeting translation features. In blind evaluations commissioned by DeepL and conducted independently by Slator, a language industry research firm, 96% of professional linguists preferred DeepL Voice over the native translation solutions in Google Meet, Microsoft Teams, and Zoom, citing superior fluency and contextual accuracy [2]. The current system translates using a fixed synthetic voice, but DeepL plans to release a voice-preservation feature that maintains the speaker's original voice characteristics in the translated output by the end of 2026.
Summarized by Navi