Curated by THEOUTPOST
On Tue, 13 May, 4:02 PM UTC
4 Sources
[1]
Can you hear that? Sonos just made it easier to actually listen to what people say in movies
The rewind button on my TV remote is worn to the point of exhaustion. It feels like every time I want to watch a movie at home, the soundtrack is so loud it could wake the neighbors a block away, but the dialogue is like a whisper. I pause, rewind 10 seconds, double the volume and scrunch my face in concentration as I try to work out what the characters actually said to move the plot along. It's a frustrating, time-wasting dance, and I always wonder how much more difficult it must be for people with hearing loss. But I think I might have found the one AI feature that is actually useful in real life.

Sonos (yes, that Sonos, the company that decided to annoy every single one of its customers with a badly thought-out app update last year) has just announced a new AI-powered Speech Enhancement feature. According to a Sonos Newsroom post, the company has used machine learning to separate dialogue from other audio in real time. That means no more temporary volume adjustments as you try to navigate a blasting soundtrack and mumbling dialogue. It will roll out as a free update to the Sonos Arc Ultra soundbar from May 13, 2025. Whether it comes to other Sonos devices in the future remains to be seen, but let's hope this software update is less dramatically catastrophic than the app debacle from last year.

The feature, developed in collaboration with the Royal National Institute for Deaf People (RNID), will be available as a tiered setting in the Sonos app, letting you toggle between low, medium, high and max enhancement levels. Low through high are designed for people like me who just can't pick out the dialogue in otherwise noisy films, while the max setting specifically caters to people with hearing loss: in this mode, the software enhances the speech and also adjusts 'non-speech' elements to elevate the dialogue.

I've owned Sonos speakers for about a decade, but I lost a lot of trust in the company after the terrible mess it made of the app last year. Even all this time later, I still have to wait up to 30 seconds just to change a track. But it's rare that tech companies actually prioritize accessibility, and making sure that everyone can access and enjoy the things they like should be a core part of any product. If Sonos keeps spending time on this, and can encourage other brands making some of the best soundbars to do the same, then maybe I'll be able to forgive its past mistakes.
[2]
Exclusive: Sonos launches its first AI-powered sound mode to enhance speech, developed with a major hearing-loss charity - and it sounds great
Sonos has launched a new version of its Speech Enhancement tools for the Sonos Arc Ultra, which we rate as one of the best soundbars available. You'll still find these tools on the Now Playing screen in the Sonos app, but instead of just a couple of options, you'll now have four new modes (Low, Medium, High and Max), all powered by the company's first use of an AI sound-processing tool. They should be available today (May 12th) to all users.

These modes were developed in a year-long partnership with the Royal National Institute for Deaf People (RNID), the UK's leading charity for people with hearing loss. I spoke to Sonos and the RNID to get the inside story on the feature's development - but read on here for more of the details.

The update launches today on Sonos Arc Ultra soundbars, but won't be available on any other Sonos soundbars because it requires a higher level of processing power, which the chip inside the Arc Ultra can provide but the older soundbars can't. The AI element is used to analyze the sound passing through the soundbar in real time and separate out the 'speech' elements so they can be made more prominent in the mix without affecting the rest of the sound too much.

I've heard it in action during a demo at Sonos' UK product development facility, and it's very impressive. If you've used speech enhancement tools before, you're probably familiar with hearing the dynamic range of the sound, and especially the bass, suddenly get massively reduced in exchange for the speech elements getting pushed forward. That's not the case with Sonos' new mode - powerful bass, the overall soundscape, and the more immersive Dolby Atmos elements are all maintained far better.

That's for two reasons: one is that the speech is being enhanced separately from the other parts, and the other is that it's a dynamic system that only activates when it detects that speech is likely to be drowned out by background noise. It won't activate if dialogue is happening against a quiet background, or if there's no dialogue in the scene. And it's a system that works by degrees - it applies more processing in the busiest scenes, and less when the audio is not as chaotic.

On the two lowest modes, dialogue is picked out more clearly with no major harm to the rest of the soundtrack, based on my demo. On the High mode, the background was still maintained really well, but the speech started to sound a little more processed, and on Max I could hear the background getting its wings clipped a little, and some more artificiality to the speech - but the speech was extremely well picked out, and this mode is only really designed for the hard of hearing.

I mentioned that the mode was developed with the RNID, which involved Sonos consulting with sound research experts at the charity, but also getting people with different types and levels of hearing loss to test the modes at different stages of development and provide feedback. I spoke at length to the Sonos audio and AI architects who developed the new modes, as well as the RNID, and the key takeaway is that the collaboration led to Sonos putting more emphasis on retaining the immersive sound effects, and adding four levels of enhancement instead of the originally planned three. Despite the RNID's involvement, the new mode isn't designed to be solely for the hard of hearing.
It's still just called Speech Enhancement, and it's not hidden away like an accessibility tool - sound is improved for everyone, and 'everyone' now better includes people with mild to moderate hearing loss. The Low and Medium modes can also simply serve those of us who need a bit of extra clarity in busy scenes.

This isn't the first use of AI-powered speech separation I've seen - I've experienced it on Samsung TVs, and in a fun showcase from Philips TVs, where it was used to disable the commentary during sports but preserve the crowd sounds. But it's interesting that this is the first use of AI sound processing from Sonos, and the four-year development process, including a year of refinement with the RNID, shows that Sonos has taken a thoughtful approach to how it's best used - one that isn't always apparent in other AI sound processing applications. Here's my piece interviewing Sonos' AI and audio developers with researchers from the RNID.

It's just a shame that it's exclusive to the Sonos Arc Ultra for now - though I'm sure that new versions of the Sonos Ray and Sonos Beam Gen 2 will be along before too long with the same upgraded chip to support the feature.
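To make the 'dynamic, by degrees' behaviour described above more concrete, here is a minimal sketch of how a system like this could decide how hard to work on each frame. It is not Sonos' implementation: the speech and background level estimates, the thresholds and the maximum boost are all hypothetical, and it assumes the speech has already been separated out by the AI model.

```python
import numpy as np

def enhancement_gain(speech_rms: float, background_rms: float,
                     max_boost_db: float = 6.0) -> float:
    """Return a speech boost (in dB) for one audio frame.

    The boost is zero when dialogue is absent or already comfortably
    above the background, and ramps towards max_boost_db as the
    background starts to drown the speech out -- a crude stand-in for
    the 'by degrees' behaviour described in the article.
    """
    eps = 1e-12
    if speech_rms < 1e-4:           # no meaningful dialogue in this frame
        return 0.0
    # Speech-to-background ratio in dB: high = clear, low = masked.
    sbr_db = 20 * np.log10((speech_rms + eps) / (background_rms + eps))
    if sbr_db >= 12.0:              # dialogue already clear: leave it alone
        return 0.0
    if sbr_db <= 0.0:               # dialogue buried: apply the full boost
        return max_boost_db
    # In between, scale the boost by how masked the speech is.
    return max_boost_db * (1.0 - sbr_db / 12.0)

# Example: quiet dialogue against a loud action scene
print(enhancement_gain(speech_rms=0.05, background_rms=0.20))  # -> 6.0
```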
[3]
'I've been deaf since birth. What is "natural" sound?' The inside story on how Sonos developed its new AI speech enhancer in partnership with a major hearing loss charity
Exclusive: From working with Hollywood sound designers to awkward questions in studies

Sonos has just unveiled its first AI-powered sound processing feature, in the form of new AI Speech Enhancement options for the Sonos Arc Ultra - and to learn how it came together I visited Sonos' UK audio development center.

First, a recap on the feature. There are now four levels of dialogue boosting to choose from, and it works in a totally different way to Sonos' previous dialogue enhancement options, by separating the speech from the rest of the soundtrack and carefully adjusting it while better maintaining the dynamic range and immersive Dolby Atmos effects that make the Arc Ultra one of the best soundbars. I've heard it in action, and it really keeps the punch of the bass and the detail in effects while still enhancing speech.

Interestingly, the feature was developed in conjunction with the Royal National Institute for Deaf People (RNID), the UK's leading charity for people with hearing loss, including a year of refining the feature by working directly with people who have different kinds and levels of hearing loss. But it's not an 'accessibility' feature hidden away in a menu - this is the new standard Speech Enhancement tool, available from the Now Playing screen in the Sonos app, and now with four options instead of the two that the Arc Ultra has currently. It's just that the higher-tier options are more suited to those with hearing loss than to those who just want a bit of extra dialogue clarity - that's what the lower options are for.

To dig into the background of developing the AI side of the feature, as well as the work with the RNID, I visited Sonos' UK product development facilities and spoke to Matt Benatan, Principal Audio Researcher at Sonos and the AI lead on the project; Harry Jones, Sound Experience Engineer at Sonos; Lauren Ward, Lead RNID Researcher; and Alastair Moore, RNID Researcher.

Your first question might be why this option is only on the Sonos Arc Ultra. Benatan explained that "with Arc Ultra, we have some more CPU capability that we can make use of" - it being the latest in Sonos' line-up means that it's the only one capable of supporting the AI algorithm, apparently.

"The underlying technology here is something called source separation," explains Benatan. "What you want to do is to extract the signal of interest from some more complex signal. Traditionally, this is applied in telecommunications applications, so this is where a lot of the development work around these sorts of technologies comes from.

"But a lot of the traditional methods are quite limited because what you're trying to remove there are what we term more static types of noise - things like the sound of an air conditioner or the sound of traffic or a crowd."

Followers of audio tech will recognize that as similar to what's in the best noise-cancelling headphones - but here we don't want to fully remove sound, we want to enhance it, so it's a different proposition.

"When we're dealing with multimedia content in film and television, what we've got are these intentionally crafted sonic experiences that contain a multitude of elements that are designed to engage. You're not supposed to ignore the explosions and the special effects and the music," says Benatan. "But at the same time, you can't engage with that content unless you can hear the dialogue.
"And so the idea here was to use some of these new neural network-based methods to go a bit further in the digital signal processing that we've been able to do before ... to apply these dynamic 'masks'. It can adapt in a way that traditional techniques cannot adapt. And this means that [from] frame to frame of incoming audio, we can understand where that speech content is, and we can pull that out."

Neural networks and so-called 'AI' sound processing systems need to be trained on sound files to learn what they should recognize (or not), and Sonos' AI processing was trained on 20,000 hours of realistic audio files - though not on real movies. This avoided the lingering question of the copyright implications of training an AI model on real works without permission for every sample (currently the subject of much debate, such as from the 'Make it fair' campaign in the UK).

"What's really important when we're dealing with these sorts of problems is the variety of data that the models are exposed to," Benatan says. "In order to do that, we worked with an award-winning sound designer who helped us to design the material that we use to train the models ... to make sure that we're exposing the model to the information that it needs to see to provide great experiences in the future as a whole."

As is common in AI model training, Benatan says Sonos used data augmentation, meaning that the samples provided by the sound designer were used in different formats. For example, one sound file might be used in training in a stereo format, plus a 5.1 surround format, plus a Dolby Atmos 3D audio format - providing more information for the neural network about how to pick out speech across different types of movie audio formats.

This raised the question: would it have been easier if Sonos could simply have trained on a huge range of copyright-protected movies with impunity? Benatan says that "purely from a data acquisition standpoint, that would have been easier, but we would have lost so much value along the way doing that."

Benatan says that working with sound designers meant the team gained a greater understanding of how mixing works and the purpose of doing things in certain ways, meaning they learned more about what the AI model needed to do than if they'd just tried training it outright - they learned what gaps using open data only would leave them with.

"We were talking to sound designers about the nature of different scenes and the kinds of compositions that we can expect to encounter. And they were able to provide things for us such as being able to separate, say, the foley and the sound effects," says Benatan. "Getting some insight into that underlying mixing process and that creative process to create these sonic scenes was really helpful to understand the challenges that we were seeing with the [testing model trained on open data]."

Even with this data and the promise of an improving AI model, there was still the question of how to make the most of it. Benatan explained that the decision to partner with the RNID came from the personal lives of people involved in the project.

Benatan says, "My manager, James Nesfield [Director of Emerging Technologies at Sonos], and I were chatting about the difficulties that family members had with dialogue. So he was talking about the fact that his mum was really struggling to understand dialogue in film and TV. And my father-in-law had recently [started wearing] hearing aids.
"And if you know anybody who's who's got hearing aids, you know that it's a bit of a bumpy road to get them tuned correctly, to get familiar with them. And to begin with, it's not particularly fun to watch content with them. A lot of people like to take their hearing aids out when they when they watch content. It's a more natural presentation," Bentan continues. "We were like, this [model] can do more than just enhance speech in the way that we'd approached it previously. Like, this could really be something that can help people in the hearing health community. And it was at that point, that we decided to engage with the RNID," Benatan says. Lauren Ward adds, "This is the first time that we've been embedded so early in the process - often people come to us with products that are already basically ready to go, or are already out in the world. And that's great, but there's a limited amount that you can do when something in is basically ready to ship." Benatar and Ward explained that bringing the RNID in early provided a new avenue for feedback for Sonos that proved crucial, including from people who knew a ton about both audio and hearing health but don't necessarily work in the industry. Ward says that "people who also have knowledge of audio and have hearing loss tend to be massive nerds about their hearing loss. They tend to really want to understand what's going on. So they're actually an awesome resource in situations like this, because they can articulate their experiences really well, and then in later stages of the project we went after a broader group [of people with hearing loss]." The audio "nerds" apparently proved very helpful in describing their experiences, but bringing a wider group of people to test the mode meant that not everyone was able to communicate what they were hearing so well - but Ward, Benatar and Alastair Moore said that these conversations with could be clarifying in their own regard... both for the RNID as well as Sonos. Ward says: "One of the things that we were exploring in that first test was what engineering people would call perceptibility of artefacts. Can you hear what's going wrong? Does this voice sound unnatural? Things like that. We're thinking about ways that we can phrase that, that's closer to people's everyday language. "But even as someone who works with people with hearing loss all the time... in one of the sessions I was running, I had a gentleman and I was trying to find the right language to use with him, and I'd gone, 'Oh, does the voice still sound natural, even at the higher levels?' And he's like, 'I've been deaf since birth. What is natural?'" One of the other elements that the project aimed to incorporate was that how you change the sound matters - you can't just add sharpness and clarity and call that a win, because that can end up causing problems for other people. "There's a phenomenon called loudness recruitment," Benatar explains. "This is a rather vicious phenomenon whereby not only do quiet sounds become harder or impossible to hear, but louder sounds become less comfortable, they become painful. "So you're compressing the range in which somebody is actually able to comfortably listen. And this was really important in understanding how we would design [the new feature] - you know, what role compression plays in the delivery of dialogue when incorporating the feature. "Speech enhancement isn't just about those with hearing loss. 
It's about making sure that everybody can engage with the content they're watching on their terms, right? ... It wasn't just about dialogue. It was that they want to be able to enjoy the content like everybody else does. We latched onto that," Benatan adds.

One thing I noted is that we're entering the era of personalized audio tuned to our hearing, from the likes of the Denon PERL Pro's customization to your hearing, up to the AirPods Pro 2 now functioning as FDA-approved hearing aids with adjustments made specifically to your level of hearing loss. Sonos' approach isn't exactly one-size-fits-all, but you could say it's four sizes fit all: Low, Medium, High and Max. I asked whether there was any concern over it perhaps not covering a wide enough set of needs through those options.

Ward acknowledged that "it's always a balance between overwhelming options and the ability to personalize," but noted that having four options actually came out of the work with the RNID.

"When Matt first presented this structure, the setting only had three levels, and it's grown out to four because what we found from our first listening test in particular was that it really did need to push further than anyone thought initially. And I think what's important to differentiate with something like speech enhancement that sits in entertainment products - versus anything that's trying to mimic hearing aids - is that it's not trying to compensate for every difference in hearing. It's about options, and someone who wants it on High or Max for one piece of content or on one particular day might not want it on another day," she explains.

"Sometimes they'd say, 'I just want the immersive experience, I don't want any speech enhancement or anything to change. Yes, I might lose some of the speech, but it's more about immersion.' Whereas for other pieces of content on other days, it's 'No, I really want to hear the dialogue.' And that can be the same person's hearing loss. And then you add multiple different people [watching in the same room] and you've just got so many different possible permutations."

A key element is also the simplicity of actually using the feature. Ward says, "There are scores of parameters involved in the speech enhancement process, and we've seen visualizations where they're all changing all at once. But if you present all of that to a user, you can get really lost ... You're not going to have that on the Now Playing screen, whereas we're able to include this feature right there, which is just one extra step. I think that means it's going to get used loads."

She adds: "Something that we really passionately believe is that a crucial part of accessibility is usability."

Speaking of accessibility, I asked if the study with the RNID found that people actually changed their viewing habits as a result of having improved sound, and Ward said it didn't just mean that people changed how they watch, but also what they felt able to watch.

"The one that jumps out immediately was during one of the pieces of content in our second listening session. It was something that had a sci-fi war scene, so there are bombs going off, there's dialogue, it's quite chaotic. And [one member of the study] expressed that generally, that is something he would just look at the type of content and avoid on its face, because it's going to be too loud, it's going to be too overwhelming," Ward explains.
"Then coming listen to it with the speech enhancement on, it was like, 'Actually, I could go back and watch that because I can get that balance where I'm still in the content, but it's not too overwhelming.' And I think that gives us a glimpse into how some people are avoiding some things - or just choosing not to watch some things - not because they don't feel they'd enjoy the content itself, but because it doesn't feel accessible. It might feel too hard or too unpleasant." Alastair Moore punctuates this conversation with a point about how many people may be subtly affected by this kind of thing. "I think that around 50% of people over age 50 have some level of hearing loss so it's not a small number." The end result is a system tuned with all this mind, but the final interesting touch is that the system knows that it doesn't actually need to work all the time even when it's turned on. Harry Jones explains "We wanted to understand: when do we need to act? Because it's such a huge thing to affect the sound experiences as well. We don't want to touch the stuff that doesn't need to be lifted, we don't want to pull out a random crowd voice and favor that when it's a really exciting sequence with cars. Also, on the other end of the scale, if it's clean, we also don't need to go crazy with the processing." "Something that we learned in the RNID discussions was that it's not that they just want to bring dialogue up. It's it's they want to enjoy the process as well, the whole soundtrack as an entire thing." The solution was to analyze what's happening in the scene before deciding whether it needs to be changed. Each frame of sound analyzed is around 5.8 milliseconds, and if the trend in the sound changes (ie, has the dialogue started to become mixed with other loud noises) then the system reacts. Sonos identified 15 reasons why speech in a movie of TV show might be unclear, ranging from what's happening in the mix (mistakes when mastering, artistic intention), to issues with a room (echoes), to outside sound (street noise) and everything in between. It can't help with all of them, but it apparently proved instructive. Then they broke down some different types of sound mixes in scenes, ranging from those with people talking and no other sound, to something with music and effects only. "The real question was: when does it need enhancing, and when does it just need, you know, slightly cleaning up? The dialogue spectrum is anywhere between no dialogue at all, and as we move up [the other end of] that scale, we've got no sound and clean dialogue. Muffled dialogue over car noises needs the most help. Talking over music, less but some help," Jones explains. "The luxury of having speech extracted was that we knew when it was happening," he adds. If you have a Sonos Arc Ultra, you should be able to try the new mode out for yourself pretty much right away. For a lot of people, it won't be needed, especially because the Sonos Arc Ultra is pretty dialogue-forward on its own (a value I appreciate in it). But equally, I think there's a chance that a lot of people might like to use the 'Low' setting who wouldn't have wanted to use speech enhancement tools in the past - and I'll be very interested to see if the High and Max settings help people as much as Sonos and the RNID hope they will.
[4]
Sonos uses AI superpowers to boost dialogue in latest soundbar update
Sonos has turned to AI to make speech clearer in your content, with a new Speech Enhancement feature that it's now pushing out via a software update. The aim is to ensure that you can hear every word that's spoken, so the important dialogue isn't lost within the rest of the soundtrack.

Clarity has been a growing problem for TV watchers: with the increasing emphasis on pounding bass or an immersive soundtrack, the spoken elements sometimes get lost. That's a particular frustration for those with any sort of hearing loss, because you might not be able to follow the action at all, instead resorting to subtitles - which are often of varying quality.

Sonos' new Speech Enhancement feature aims to address this, with Sonos saying that AI provided a real "breakthrough", allowing the speech to be separated from other audio in the centre channel so that it can be emphasized. It's not just about pushing the speech harder, it's about making it clear while still preserving the rest of the sound experience.

Working with the Royal National Institute for Deaf People (a UK-based organization supporting those who are deaf, have hearing loss or tinnitus), the new Sonos Speech Enhancement feature offers four levels to choose from, with the top level specifically designed for those with hearing loss. To access the feature you'll have to use the (beleaguered) Sonos app. Here's how Sonos describes the four different levels:

Low - A subtle, artistic nudge that emphasizes dialogue while maintaining the original experience and creator intent.

Medium - A medium enhancement that provides better dialogue clarity and a tasteful balance of the surrounding mix elements.

High - A higher setting that makes dialogue obviously prominent while reducing other mix elements.

Max - The most pronounced setting where dialogue clarity takes full priority, designed for those with hearing loss. Unlike the more balanced approach of the Low, Medium and High levels, Max further controls the dynamic range of non-speech elements, placing dialogue firmly at the forefront of the experience.

Soundbars have offered speech enhancement options for a number of years, but Sonos says that these modes "lacked the effectiveness and sound quality needed to truly solve the problem". The solution is Sonos' AI-powered offering, which is rolling out to the Sonos Arc Ultra from today. Just to be clear, this isn't about volume - it's about changing the emphasis in the soundtrack so that the listener has more control over how speech comes through. If you find that you struggle to make out what people are saying in movies or TV shows, then this could be the solution.

"One in three adults in the UK experience hearing loss, and it is reported that just under one in four adults in the USA do too," said Lauren Ward, Lead RNID Researcher. "This tool has the potential to impact a large number of people."

The Sonos Arc Ultra features on our selection of the best soundbars, offering one of the best Dolby Atmos experiences, with the flexibility to expand the system to make it more potent. We previously praised the dialogue delivery when we reviewed the Sonos Arc Ultra, but now things should be even better.
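As a loose illustration of how four published levels like these might map onto processing of separated dialogue, here is a hedged sketch. The level names come from Sonos' description above, but the gain and dynamic-range figures are placeholders invented for illustration; Sonos has not published its actual tuning.

```python
from dataclasses import dataclass

@dataclass
class SpeechEnhancementLevel:
    name: str
    speech_boost_db: float    # how far the separated dialogue is lifted
    duck_others_db: float     # how far non-speech elements are reduced
    compress_others: bool     # Max also narrows the non-speech dynamic range

# Placeholder values only -- illustrative, not Sonos' real parameters.
LEVELS = {
    "Low":    SpeechEnhancementLevel("Low",    1.5, 0.0, False),
    "Medium": SpeechEnhancementLevel("Medium", 3.0, 1.0, False),
    "High":   SpeechEnhancementLevel("High",   4.5, 3.0, False),
    "Max":    SpeechEnhancementLevel("Max",    6.0, 6.0, True),
}

def describe(level_name: str) -> str:
    """Summarise what a given level would do to the mix in this sketch."""
    lvl = LEVELS[level_name]
    extra = ", and compresses non-speech dynamics" if lvl.compress_others else ""
    return (f"{lvl.name}: dialogue +{lvl.speech_boost_db} dB, "
            f"other elements -{lvl.duck_others_db} dB{extra}")

print(describe("Max"))
# Max: dialogue +6.0 dB, other elements -6.0 dB, and compresses non-speech dynamics
```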
Sonos launches a new AI-driven Speech Enhancement feature for its Arc Ultra soundbar, developed in collaboration with the Royal National Institute for Deaf People, to improve dialogue clarity in movies and TV shows.
Sonos, the renowned audio equipment manufacturer, has introduced a groundbreaking AI-powered Speech Enhancement feature for its Arc Ultra soundbar. This innovative technology, developed in collaboration with the Royal National Institute for Deaf People (RNID), aims to address a common frustration among movie and TV show viewers: unclear dialogue amidst loud soundtracks [1].
The new feature utilizes machine learning to separate dialogue from other audio elements in real time. This AI-driven approach allows for more nuanced and effective speech enhancement compared to traditional methods. Matt Benatan, Principal Audio Researcher at Sonos, explains that the technology uses "source separation" to extract the signal of interest from complex audio [3].
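For readers wondering what 'source separation' looks like mechanically, here is a minimal sketch of mask-based separation. It assumes a hypothetical neural network that outputs a per-bin speech mask over a spectrogram; Sonos has not published its model or processing chain, so this is illustrative only.

```python
import numpy as np

def separate_speech(mixture_stft: np.ndarray, speech_mask: np.ndarray):
    """Split a mixture spectrogram into speech and non-speech parts.

    mixture_stft: complex STFT of an audio block, shape (freq_bins, frames)
    speech_mask:  values in [0, 1], same shape, from a hypothetical neural
                  network estimating how speech-dominated each bin is
    """
    speech = mixture_stft * speech_mask            # speech-dominated energy
    residual = mixture_stft * (1.0 - speech_mask)  # music, effects, ambience
    return speech, residual

def remix(speech: np.ndarray, residual: np.ndarray, speech_gain_db: float = 4.0):
    """Recombine the parts with the dialogue lifted slightly."""
    gain = 10 ** (speech_gain_db / 20)
    return gain * speech + residual
```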
The Speech Enhancement feature offers four levels of adjustment [4]:

Low - a subtle nudge that emphasizes dialogue while maintaining the original experience and creator intent.

Medium - better dialogue clarity with a tasteful balance of the surrounding mix elements.

High - dialogue made obviously prominent while other mix elements are reduced.

Max - dialogue clarity takes full priority, with the dynamic range of non-speech elements further controlled; designed for those with hearing loss.

This tiered system caters to various needs, from those seeking slight improvement to individuals with hearing impairments.
Sonos trained its AI model on 20,000 hours of audio created with the help of an award-winning sound designer. This approach ensured exposure to diverse audio scenarios while avoiding the potential copyright issues associated with using actual movies for training [3].
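A hedged sketch of the kind of format augmentation described in the source interview: the same dialogue and effects stems rendered into several channel layouts so the model sees the material in different formats. The panning and layouts here are simplified stand-ins, not Sonos' actual training pipeline.

```python
import numpy as np

def to_stereo(dialogue: np.ndarray, effects: np.ndarray) -> np.ndarray:
    """Mix mono dialogue and effects stems into a simple 2-channel clip."""
    left = 0.7 * dialogue + 0.7 * effects
    right = 0.7 * dialogue + 0.7 * effects
    return np.stack([left, right])

def to_5_1(dialogue: np.ndarray, effects: np.ndarray) -> np.ndarray:
    """Crude 5.1 render: dialogue to the centre, effects spread around."""
    channels = [
        0.5 * effects,   # front left
        0.5 * effects,   # front right
        dialogue,        # centre channel carries the speech
        0.2 * effects,   # LFE (very rough)
        0.6 * effects,   # surround left
        0.6 * effects,   # surround right
    ]
    return np.stack(channels)

def augment(dialogue: np.ndarray, effects: np.ndarray):
    """Yield the same source material in several formats for training."""
    yield "stereo", to_stereo(dialogue, effects)
    yield "5.1", to_5_1(dialogue, effects)
    # An Atmos-style height bed could be added in the same way.
```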
The year-long partnership with the RNID played a crucial role in refining the feature. This collaboration involved consultations with sound research experts and testing with individuals experiencing different types and levels of hearing loss [2].
The new Speech Enhancement feature is currently exclusive to the Sonos Arc Ultra soundbar due to its advanced processing capabilities. It is available as a free update from May 13, 2025 [1][2].
With hearing loss affecting a significant portion of the population - one in three adults in the UK and nearly one in four in the USA - this technology has the potential to improve the viewing experience for millions of people [4].