Curated by THEOUTPOST
On Thu, 16 Jan, 12:05 AM UTC
4 Sources
[1]
Meta's 'Babel Fish': Mark Zuckerberg announces a universal voice transla
The SEAMLESSM4T system, reported in Nature, aims to revolutionize global communication by imitating tone and voice, bridging linguistic barriers. Meta, the parent company of Facebook, Instagram, and WhatsApp, led by Mark Zuckerberg, has developed an artificial intelligence model called SEAMLESSM4T, which incorporates several innovations and surpasses existing models. The system translates across multiple languages between text and audio in every combination: text to text, text to voice, voice to text, and voice to voice. The SEAMLESSM4T model, developed by Meta's AI division, FAIR, is an evolution of its previous model presented in August 2023 and aims to achieve a "Babel Fish" by helping to translate speech between any two languages, reported El Faro de Vigo. This advancement brings the concept of instantaneous universal translation closer to reality. SEAMLESSM4T facilitates voice-to-voice translation, recognizing 101 languages and translating into 36 languages; voice-to-text translation from 101 to 96 languages; text-to-voice translation from 96 to 36 languages; text-to-text translation among 96 languages; and automatic speech recognition for 96 languages. Performing these translations directly, without intermediate steps, accelerates the translation process. SEAMLESSM4T scores 8% to 23% higher than state-of-the-art translation systems on the Bilingual Evaluation Understudy (BLEU) standard, as reported by Tech Xplore. Furthermore, according to 20 Minutos, SEAMLESSM4T is 50% more resistant to background noise and speaker variations in voice-to-text conversion tasks than previous state-of-the-art systems, and it filters background noise 42% to 66% better. This robustness enhances its performance in real-world scenarios where such challenges are common.
According to the journal Nature, the SEAMLESSM4T system promises to revolutionize global communications by imitating the tone and voice of the interlocutors and represents a step forward in improving communication beyond linguistic barriers, as reported by Tech Xplore. Readers of science fiction might be familiar with the Babel Fish from Douglas Adams' The Hitchhiker's Guide to the Galaxy, a small fish that could be inserted into an ear and simultaneously translate from one spoken language to another, as noted by HuffPost Spain. Meta has made resources related to SEAMLESSM4T publicly available for non-commercial use to assist further research on inclusive speech translation technologies, according to El Periódico. "All contributions to this work are publicly available for non-commercial use in order to promote further research on inclusive speech translation technologies," the company stated, as reported by 20 Minutos. In an article published in Nature, Tanel Alumäe from the Language Technology Laboratory at Tallinn University of Technology (TalTech) in Estonia highlights that the model is capable of translating directly into 36 languages. Alumäe describes this capability as "impressive because it can -- for example -- translate spoken English to spoken German without having to transcribe it first into English to translate it afterward." Alumäe points out that although the SEAMLESSM4T model translates around a hundred languages, the number of spoken languages in the world is about 7,000. He notes that the tool still has difficulties in many situations that humans handle with relative ease, such as conversations in noisy places or between people with strong accents, according to Diario de Sevilla. He predicts that "the authors' methods to leverage real-world data will open a promising path towards speech technology that rivals science fiction," as reported by El Periódico. 
Allison Koenecke from the Department of Computer Science at Cornell University in New York warns that although speech technologies may be more efficient and cost-effective than humans, "it is imperative to understand the ways in which these technologies fail disproportionately for some demographic groups," especially in sensitive contexts like medicine or the legal field. Koenecke emphasizes that it is essential for future researchers in speech technologies to improve performance disparities, as noted by El Periódico. She states that users should be well-informed about the possible benefits and harms associated with these models. SEAMLESSM4T includes languages with limited data available for training AI models, improving the shortcomings of other models regarding "languages with fewer speakers or less available digital data," as reported by PSN Noticias. Meta has prioritized the elimination of toxic results that may incite hate, violence, or abuse in translations, and to achieve this, it has implemented a specific tool called Etox, according to PSN Noticias. The article was written with the assistance of a news analysis system.
[2]
Meta's New Translation AI Is Nearly a Babel Fish
Universal translators in science fiction, such as the Babel fish in The Hitchhiker's Guide to the Galaxy, have long offered the dream of instantaneous translation from one spoken language to another. Now, in what may be a key step toward making this fantasy a reality, scientists at Facebook's parent company Meta have developed an AI system that can instantly translate speech and text, including direct speech-to-speech translations, for up to 101 languages. "Science fiction provides a clear goal that our group can focus on," says Marta Costa-jussà, a research scientist at Meta's Fundamental AI Research team in Menlo Park, California. The scientists described their work on 15 January in the journal Nature. As the world grows more interconnected, people have more access to multilingual content than ever. However, most automated translation systems are designed to only input and output text. Until now, the speech-to-speech machine translation systems that did exist covered significantly fewer languages than text-to-text systems. Moreover, previous speech-to-speech systems were often skewed toward translating a given language into English, rather than English to another language. Now Meta has developed an AI system called SeamlessM4T that can translate speech and text in up to 101 languages. Specifically, it can support speech-to-speech translation for 101 to 36 languages, speech-to-text translation for 101 to 96 languages, text-to-speech translation for 96 to 36 languages, text-to-text translation for 96 languages, and automatic speech recognition for 96 languages. (Whether it can or cannot translate between languages depends on the availability of quality speech data, Costa-jussà says.) To develop SeamlessM4T, the researchers trained a brain-mimicking neural network AI system on 4 million hours of multilingual audio and tens of billions of sentences from publicly available repositories of web data.
They also had it analyze roughly 443,000 hours of audio with matching text -- for instance, Internet video clips with subtitles -- to further improve the system. When it came to speech-to-speech translation, the research team found SeamlessM4T's translations were up to 23 percent more accurate than previous state-of-the-art systems. With speech-to-text tasks, it was 8 percent more accurate than prior systems. Furthermore, SeamlessM4T was roughly 50 percent more resilient against background noise and variations in how speakers talked when it came to speech-to-text tasks. Moreover, it could translate utterances mixing two or more languages. To reduce the chances that SeamlessM4T might add profanity and other toxic language to its translations, the researchers employed two strategies to remove toxicity during its training and operation. When they compared SeamlessM4T models to the state of the art, they found these approaches reduced toxicity in translations by up to 20 percent. "The SeamlessM4T work takes care to audit potential harms in its translations," says Allison Koenecke, an assistant professor of information technology at Cornell University in Ithaca, N.Y., who did not participate in this research. "This is especially important as machine-based speech translation is increasingly being used in a range of high-stakes applications, from medical appointments to workplace hiring." The Meta scientists also examined whether SeamlessM4T unfairly favored one gender when translating gender-neutral phrases into gendered languages. However, they found they could not significantly improve gender-bias performance, and say they need to develop specific techniques to counteract this bias. Along with SeamlessM4T, Meta released several supporting AI systems for analyzing speech and text, notes Tanel Alumäe, an associate professor of speech processing at Tallinn University of Technology in Estonia, who did not work on SeamlessM4T. 
Alumäe and his colleagues have successfully used one of these tools "for stuff like emotion recognition from speech and detecting early cognitive decline -- for example, Alzheimer's -- from speech," he notes. Currently Meta is using SeamlessM4T to help automatically dub videos on Instagram and Facebook. It also helps enable real-time translation of Spanish, French, or Italian to English through speakers on special Ray-Ban glasses, Costa-jussà says. To spur future research into speech translation technologies, Meta is making code, tools, libraries, and other resources associated with SeamlessM4T publicly available for noncommercial use. Much work remains if researchers want to make SeamlessM4T a universal translator on Earth. Although it's exciting that the current technology supports about 100 languages, "the number of languages spoken in the world is much larger -- around 6,500, by some estimates," Alumäe says.
[3]
Meta takes us a step closer to Star Trek's universal translator
Back in 2023, AI researchers at Meta interviewed 34 native Spanish and Mandarin speakers who lived in the US but didn't speak English. The goal was to find out what people who constantly rely on translation in their day-to-day activities expect from an AI translation tool. What those participants wanted was basically a Star Trek universal translator or the Babel Fish from the Hitchhiker's Guide to the Galaxy: an AI that could not only translate speech to speech in real time across multiple languages, but also preserve their voice, tone, mannerisms, and emotions. So, Meta assembled a team of over 50 people and got busy building it. What this team came up with was a next-gen translation system called Seamless. The first building block of this system is described in Wednesday's issue of Nature; it can translate speech among 36 different languages. AI translation systems today are mostly focused on text, because huge amounts of text are available in a wide range of languages thanks to digitization and the Internet. Institutions like the United Nations or European Parliament routinely translate all their proceedings into the languages of all their member states, which means there are enormous databases comprising aligned documents prepared by professional human translators. You just needed to feed those huge, aligned text corpora into neural nets (or hidden Markov models before neural nets became all the rage) and you ended up with a reasonably good machine translation system. But there were two problems with that. The first issue was those databases comprised formal documents, which made the AI translators default to the same boring legalese in the target language even if you tried to translate comedy. The second problem was speech -- none of this included audio data. The problem of language formality was mostly solved by including less formal sources like books, Wikipedia, and similar material in AI training databases.
The scarcity of aligned audio data, however, remained. Both issues were at least theoretically manageable in high-resource languages like English or Spanish, but they got dramatically worse in low-resource languages like Icelandic or Zulu.
[4]
Meta's new AI model can translate speech from more than 100 languages
The key is a process called parallel data mining, which finds instances when the sound in a video or audio matches a subtitle in another language from crawled web data. The model learned to associate those sounds in one language with the matching pieces of text in another. This opened up a whole new trove of examples of translations for their model. "Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition," says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. "The mere number of languages they are supporting is a tremendous achievement." Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker of the University of Ottawa's School of Translation & Interpretation, who didn't work on Seamless. "Languages are a reflection of cultures, and cultures have their own ways of knowing things," she says. When it comes to applications like medicine or law, machine translations need to be thoroughly checked by a human, she says. If not, misunderstandings can result. For example, when Google Translate was used to translate public health information about the covid-19 vaccine from the Virginia Department of Health in January 2021, it translated "not mandatory" in English into "not necessary" in Spanish, changing the whole meaning of the message. AI models have many more examples to train on in some languages than others. This means current speech-to-speech models may be able to translate a language like Greek into English, where there may be many examples, but cannot translate from Swahili to Greek.
The team behind Seamless aimed to solve this problem by pre-training the model on millions of hours of spoken audio in different languages. This pre-training allowed it to recognize general patterns in language, making it easier to process less widely spoken languages because it already had some baseline for what spoken language is supposed to sound like. The system is open-source, which the researchers hope will encourage others to build upon its current capabilities. But some are skeptical of how useful it may be compared with available alternatives. "Google's translation model is not as open-source as Seamless, but it's way more responsive and fast, and it doesn't cost anything as an academic," says Jaiswal. The most exciting thing about Meta's system is that it points to the possibility of instant interpretation across languages in the not-too-distant future -- like the Babel fish in Douglas Adams' cult novel The Hitchhiker's Guide to the Galaxy. SeamlessM4T is faster than existing models but still not instant. That said, Meta claims to have a newer version of Seamless that's as fast as human interpreters. "While having this kind of delayed translation is okay and useful, I think simultaneous translation will be even more useful," says Kenny Zhu, director of the Arlington Computational Linguistics Lab at the University of Texas at Arlington, who is not affiliated with the new research.
Meta unveils SEAMLESSM4T, an advanced AI model capable of translating speech and text across multiple languages, bringing us closer to the concept of a universal translator.
Meta, the parent company of Facebook, Instagram, and WhatsApp, has unveiled SEAMLESSM4T, an artificial intelligence model that represents a significant advancement in language translation technology. Developed by Meta's AI division, FAIR, this system aims to revolutionize global communication by bridging linguistic barriers 1.
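SEAMLESSM4T boasts impressive capabilities 1 2:
- Speech-to-speech translation from 101 languages into 36 languages
- Speech-to-text translation from 101 languages into 96 languages
- Text-to-speech translation from 96 languages into 36 languages
- Text-to-text translation among 96 languages
- Automatic speech recognition for 96 languages
The system demonstrates superior performance compared to existing models 2:
- Speech-to-speech translations up to 23 percent more accurate than previous state-of-the-art systems
- Speech-to-text translations 8 percent more accurate than prior systems
- Roughly 50 percent more resilience against background noise and speaker variations in speech-to-text tasks
- Up to 20 percent less toxicity in translations
To create SEAMLESSM4T, researchers trained a neural network on 2:
- 4 million hours of multilingual audio
- Tens of billions of sentences from publicly available repositories of web data
- Roughly 443,000 hours of audio with matching text, such as Internet video clips with subtitles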
The team employed a process called parallel data mining, which associates sounds in one language with matching text in another, significantly expanding the training dataset 4.
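As a rough illustration of the matching idea behind parallel data mining, the sketch below pairs audio clips with subtitle lines by similarity. It is a minimal sketch under stated assumptions, not Meta's method: it assumes each clip and each subtitle has already been mapped to a vector in a shared space, and the names `mine_pairs`, `audio`, `subs`, the toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_pairs(audio_vecs, text_vecs, threshold=0.8):
    """Return (audio_id, text_id, score) for mutual best matches above threshold.

    Assumption: audio and text vectors live in the same embedding space.
    """
    pairs = []
    for a_id, a_vec in audio_vecs.items():
        # Best subtitle candidate for this audio clip.
        t_id, score = max(
            ((t, cosine(a_vec, v)) for t, v in text_vecs.items()),
            key=lambda p: p[1],
        )
        # Keep the pair only if this clip is also the subtitle's best match.
        back_id = max(audio_vecs, key=lambda a: cosine(audio_vecs[a], text_vecs[t_id]))
        if back_id == a_id and score >= threshold:
            pairs.append((a_id, t_id, round(score, 3)))
    return pairs


# Hypothetical toy embeddings: two clips with clear matches, one without.
audio = {"clip_1": [0.9, 0.1, 0.0], "clip_2": [0.0, 0.2, 0.9], "clip_3": [0.5, 0.5, 0.5]}
subs = {"sub_a": [1.0, 0.0, 0.0], "sub_b": [0.0, 0.0, 1.0]}
pairs = mine_pairs(audio, subs)
```

Here `clip_1` and `clip_2` each pair with one subtitle, while `clip_3` is dropped because it is not the closest clip to any subtitle. Requiring a mutual best match plus a score threshold is one simple way to keep noisy web data from producing false pairs.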
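Meta is already implementing SEAMLESSM4T in practical applications 2:
- Automatic dubbing of videos on Instagram and Facebook
- Real-time translation of Spanish, French, or Italian to English through speakers on special Ray-Ban glasses
Despite its advancements, SEAMLESSM4T faces several challenges:
- It covers about 100 languages, while an estimated 6,500 to 7,000 are spoken worldwide 1 2
- It still struggles with conversations in noisy places or between people with strong accents 1
- The researchers could not significantly improve gender-bias performance when translating gender-neutral phrases into gendered languages 2
- Speech technologies can fail disproportionately for some demographic groups, which matters in sensitive contexts like medicine or law 1
- Translation is faster than with existing models but still not instant 4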
Meta has made SEAMLESSM4T's resources publicly available for non-commercial use, encouraging further research in inclusive speech translation technologies 1. The researchers hope this openness will encourage others to build upon the system's current capabilities 4.
As the technology progresses, it brings us closer to the concept of a universal translator, reminiscent of science fiction devices like the Babel Fish from "The Hitchhiker's Guide to the Galaxy" 3.
Reference
[1]
[2] IEEE Spectrum: Meta's New Translation AI Is Nearly a Babel Fish
[3]
[4]
© 2025 TheOutpost.AI All rights reserved