4 Sources
[1]
Earbuds Get Eyes as VueBuds Incorporate Visual AI
Smartphone cameras and some smart glasses allow users to query AI models and receive answers about what they're looking at. Soon, that capability could expand to other devices, including earbuds. Researchers at the University of Washington have developed a pair of earbuds they call VueBuds that integrate a small, low-resolution camera into each earbud. The prototype earbuds have features similar to those of smart glasses, like the Ray-Ban Meta glasses -- things like translating signs in foreign languages, acting as an aid for low-vision wearers, or identifying plant species during a hike.

Smart glasses have their drawbacks, including privacy concerns and comfort. Their under-the-radar cameras have faced criticism and lawsuits, both over concerns that they can record unsuspecting bystanders and over what ultimately happens to sensitive visual data. And not everyone likes wearing glasses -- some even opt for contact lenses to avoid having to wear them, including Shyam Gollakota, the University of Washington professor who led the VueBuds research. "The one predominant wearable which almost everyone wears is your earbuds," he says. His team presents earbuds as an alternative to smart glasses that's less intrusive and better for privacy.

The primary goal of the research, however, was to demonstrate that this small, ear-worn form factor is even possible. "Traditionally, earbuds have been limited to audio interfaces," Gollakota says. "We show that we can indeed build a system within that form factor and get lots of intelligence by running visual language models." The research was presented today at the ACM Computer-Human Interaction conference in Barcelona.

Gollakota and his colleagues don't expect VueBuds to be the only interface for visual AI. "Wearables are very personal," says Maruchi Kim, a Ph.D. student in Gollakota's lab. Some people may prefer glasses or watches, others might like rings, and so Kim suspects there won't be one device to rule them all. "We're just trying to introduce another category to demonstrate that everything smart glasses do can be achieved on [earbuds]."

That said, the interface may have some advantages. Because earbuds are already widely used, people may be more likely to adopt the technology. Plus, Kim says, "there's already a social paradigm for putting your earbuds away in their case." Smart glasses may have prescription lenses, so the wearer would keep them on all the time. But "if you ever want to be confident that these cameras aren't recording, earbuds are a nice form factor that lets you just tuck it away when you're ready." Many of the AI features users indicate an interest in are also "episodic use cases," Kim says. To translate a street sign or ingredients on a package, for instance, you don't need a continuous video stream.

There are three key challenges to making vision-capable earbuds possible, Gollakota says: fitting the camera within strict size, power, and weight constraints; transmitting the data; and creating a complete visual scene when worn in the ears. Cameras typically draw a lot of power, making this the number one concern. "The batteries in your earbuds are about ten times as small as what you have on smart glasses," Kim says. Visual data also requires much higher bandwidth than audio, so the videos recorded by glasses are typically sent via Wi-Fi to be processed by cloud-based AI models. Wi-Fi allows for high bandwidth -- but takes more power. VueBuds instead transmits low-resolution, grayscale images over Bluetooth.
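A back-of-envelope calculation shows why occasional stills over Bluetooth are workable where continuous video is not. Only the 324-by-324 sensor resolution below comes from the researchers; the link rate is an assumption for illustration, since none of the articles give one.

```python
# Rough feasibility check: one raw grayscale still vs. continuous video over
# Bluetooth. The 324x324 resolution is reported; the ~0.7 Mbit/s practical
# link rate is an assumed figure, not a measured one.
SENSOR_W = SENSOR_H = 324
BITS_PER_PIXEL = 8                      # 8-bit grayscale
LINK_BPS = 0.7e6                        # assumed practical Bluetooth throughput

frame_bits = SENSOR_W * SENSOR_H * BITS_PER_PIXEL    # ~840 kbit (~103 KiB) raw
print(f"One raw still: {frame_bits / 8 / 1024:.0f} KiB, "
      f"~{frame_bits / LINK_BPS:.1f} s to send")     # ~1.2 s; compression helps

video_bps = frame_bits * 10                          # even modest 10 fps video
print(f"10 fps raw video needs {video_bps / 1e6:.1f} Mbit/s "
      f"vs. a ~{LINK_BPS / 1e6:.1f} Mbit/s link")    # ~8.4 vs. 0.7: not feasible
```

Even with generous assumptions, raw video overshoots the link by an order of magnitude, while a single compressed still fits comfortably within the roughly one-second response budget.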
Most device makers try to transmit as much data as possible, but Gollakota's team took a different approach. They wanted to find the lowest resolution at which a visual language model could still extract useful information, opting for a 324-by-324-pixel image sensor.

Beyond the power and bandwidth concerns, the researchers also had to make sure earbud cameras could see enough. Placing cameras at the ears creates a blind spot on either side where the face blocks each camera's view. But by setting the cameras at a slight angle (5 or 10 degrees) away from the face and stitching together images, the team found they could reconstruct a more complete scene with a wide field of view. This does, however, create a small blind spot directly in front of the user for objects closer than about 20 centimeters from the face (a rough sketch of this geometry appears at the end of this article).

The researchers tested the earbuds with four different visual language models. In user studies with the best-performing model (Qwen2.5-VL), VueBuds achieved about 82 percent accuracy for object recognition, 94 percent for character recognition, 84 percent for translation, and 87 percent accuracy overall. The earbuds performed comparably to Ray-Ban Meta glasses across 17 tasks. In the future, the team hopes to add color to the system. Kim is also looking into improving the achievable resolution by incorporating an on-device JPEG encoder, which would significantly reduce the size of images sent to be processed.

Many users have been wary of privacy and surveillance concerns with smart glasses. Those worries are intensifying with new evidence that the companies building these glasses may be mishandling the data they capture. Given those concerns, should we add cameras to yet another wearable device? The University of Washington researchers say VueBuds' stripped-down image capture is a boon for privacy compared to today's smart glasses. For one thing, the system is designed to run on a smartphone or other local device, so data never goes to the cloud, Gollakota says. VueBuds also captures only still images. One of the main uses of Meta's smart glasses is now recording video, but he adds, "no one wants to see a low-resolution grayscale video in the first place." Additionally, VueBuds are activated by voice commands. "That audio initiation means that everyone around you would know what you're actually asking." Smart glasses, meanwhile, can start recording with the touch of a button.

Gollakota notes that most people have become accustomed to having microphones in nearly every device, because they provide enough utility through capabilities like voice commands and because "a trust has been built" with companies, like Apple, that sell devices with built-in microphones. Whether the same paradigm will emerge for visual intelligence remains to be seen as the technology -- and our level of trust in it -- evolves over the next few years.

Apple is also rumored to be developing next-generation AirPods that integrate infrared cameras to enable gesture recognition and improve spatial audio. These wouldn't have the visual intelligence capabilities made possible with standard cameras, but they would indicate growing interest in expanding the capabilities of what has traditionally been an audio-only interface. Earbuds are "the most successful wearable we have today, and right now it's limited to being an audio interface," Gollakota says. "Bringing visual intelligence would make it a much richer and more powerful interface than what it currently is."
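As promised above, here is a minimal sketch of the blind-spot geometry. Only the 5-10 degree outward tilt comes from the researchers; the ear separation and per-camera field of view are illustrative assumptions, so the computed distance is indicative rather than the paper's roughly 20-centimeter figure.

```python
import math

# Blind-spot geometry sketch. The inner edge of each camera's view is a ray
# tilted inward by (half_fov - outward_tilt) from straight ahead; it crosses
# the face's midline at half_sep / tan(half_fov - outward_tilt). Objects
# nearer than that, directly in front, fall in the blind spot.
HALF_EAR_SEPARATION_CM = 7.5   # assumed: earbuds ~15 cm apart
CAMERA_HALF_FOV_DEG = 35.0     # assumed per-camera half field of view
OUTWARD_TILT_DEG = 7.5         # midpoint of the reported 5-10 degree tilt

def midline_coverage_onset_cm(half_sep, half_fov_deg, tilt_deg):
    inner_edge_deg = half_fov_deg - tilt_deg
    return half_sep / math.tan(math.radians(inner_edge_deg))

d = midline_coverage_onset_cm(HALF_EAR_SEPARATION_CM,
                              CAMERA_HALF_FOV_DEG, OUTWARD_TILT_DEG)
print(f"Objects on the midline become visible beyond ~{d:.0f} cm")  # ~14 cm here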
[2]
Camera-equipped AI earbuds tell you what you're looking at
The current VueBuds prototype -- a set of Sony WF-1000XM3 earbuds with tiny added cameras.

Earbuds are small, which is great for comfort, but their tininess is a serious limitation for actually doing things other than letting you hear and talk. You can't use them to fly, fry, pry, or purify. Compare them with a smartphone and they're one-hit (two, actually) wonders, right? They'll never even compete with a Swiss Army Knife. Pathetic.

But what if you shoved cameras inside your earbuds and connected them to a voice-activated, speaking LLM (large language model) that could answer your questions about anything you were looking at? Uh, why would anybody do that? Well, ever hear of described audio (DA), bud? And while DA would be massively helpful to anyone with visual impairments, imagine the benefits for safety, productivity, and navigation from simply being able to ask questions and get answers from a disembodied voice in your ear (like the Great Gazoo, Harvey, or Head Six) that can "see" exactly where you're looking. And no, not questions like, "Is God hiding behind that cloud?" but more like, "What does this Spanish road sign mean?" or "What are all these devices on my new workstation?"

So, why not just use Google Glass? Turns out that the public hated those enough to call their users "Glassholes," partly because citizens didn't appreciate ordinary people turning themselves into unwitting, nonstop spies for Big Data at a cost of $1,500 while looking like cyborgs.

Well, apparently Maruchi Kim, Rasya Fawwaz, and the rest of their University of Washington co-authors must have understood all that, because as they explained in their Human Factors in Computing Systems conference paper, they've created what are known as VueBuds. Their innovation houses tiny cameras inside standard Sony WF-1000XM3 earbuds, and uses a built-in vision language model (VLM) so users can verbally ask questions and get answers about what they're seeing -- an extremely convenient, mobile, audio version of reverse image search for description, explanation, and translation.

According to senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering, VueBuds overcome the ghost of Google Glass in several ways. First, they do so by embedding rice-grain-sized cameras inside earbuds, because even in the year 2026, "a lot of people don't like wearing glasses." As well, not only do people being observed hate the invasion of their privacy, so do the observers themselves, as "recording high-resolution video and processing it in the cloud" offers a user's social-geographic life on a digital platter to our Big Data overlords. "But almost everyone wears earbuds already," says Gollakota, "so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process."

According to Gollakota and his colleagues, VueBuds are also fast and low-power, largely because they turn a low-bandwidth, low-resolution bug into a feature. The low-res black-and-white cameras need less than 5 mW to work, and then automatically deactivate to save battery life. The authors claim that in 17 visual question-and-answer tests involving 90 users, VueBuds achieve "response quality on par with Ray-Ban Meta," demonstrating a "compelling platform for visual intelligence" that brings "rapidly advancing VLM capabilities" to earbuds, one of the world's most widely used wearable devices.
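That "under 5 mW, then deactivate" claim is easy to sanity-check. Only the camera draw comes from the article; the cell capacity, session length, and duty cycle below are assumptions chosen for illustration.

```python
# Duty-cycle arithmetic behind "under 5 mW, then deactivate." Only the 5 mW
# figure is from the article; battery size and usage pattern are assumptions.
BATTERY_MWH = 20e-3 * 3.7 * 1000   # assumed 20 mAh earbud cell at 3.7 V = 74 mWh
CAMERA_MW = 5.0                    # camera draw while active (from the article)
SESSION_H = 6.0                    # assumed listening-session length

always_on = CAMERA_MW * SESSION_H   # camera never sleeps during the session
episodic = always_on * 0.01         # camera awake ~1% of the time

print(f"Always-on: {always_on:.0f} mWh "
      f"({100 * always_on / BATTERY_MWH:.0f}% of the cell)")
print(f"1% duty cycle: {episodic:.1f} mWh "
      f"({100 * episodic / BATTERY_MWH:.1f}% of the cell)")
```

Under these assumptions an always-on camera would eat roughly 40 percent of the cell over a listening session, while waking it only for queries turns the cost into a rounding error.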
In the following demonstration video, a man stands in an apartment kitchen while wearing VueBuds, which in the video are larger than typical earbuds -- closer to the thumb-sized Bluetooth earbuds from 20 years ago. He asks for a description of where he's looking, and in about a second, an AI voice imitating a relaxed human woman announces, "I see a kitchen area with a window letting in a lot of light. On the counter, there are some bottles and a book. The window has blinds, and there's a sink to the left." Then, while looking at the cover of an LP, he asks VueBuds to tell him the name of it. The voice quickly and correctly responds, "I see a photograph of an album cover on the table. It appears to be Abbey Road by the Beatles." (A code sketch of this wake-word-to-answer loop follows this article.)

According to the researchers, in tests with 16 participants, VueBuds was correct around 83% of the time during object-identification and translation tasks, and 93% when identifying book titles and authors, meaning that one day every user who can't read Mandarin could order from the "secret" Chinese menu (not secret to a billion people) or read manhwa that haven't yet been translated from Korean.

But since the cameras are in earbuds at the sides of your face, wouldn't your own head block the cameras' views? No, thanks to the same principle that allows all of us two-eyed creatures to see and understand the world: stereoscopic vision. Just as your brain effortlessly combines visual data from two pupils about a palm's width from each other, the VueBuds' AI meshes two separate camera images into one.

The VueBuds tech does have limitations. Its use of monochrome cameras means VueBuds can't answer any questions about color, and currently, real-world navigation and translation for readers and travelers require higher-powered, high-resolution cameras. Nor can the battery sustain continuous video streaming of large amounts of data from its still-image cameras.

Also, lest anyone imagine that VLM seeing-eye buds are nothing but a benefit for humanity, remember a few years ago when a tech company was boast-posting about their new product with the rhetorical question, "What if an app could snap a picture to tell you a stranger's name?" The memed response was "Women would die." The current version of VueBuds likewise offers only minimal reassurance that it doesn't pose a potential threat to public safety. A small "on" light doesn't mean much -- how many people being watched would think an earbud is taking their picture? And while the device shoots only low-resolution, B&W still images, when combined with audio capture and a Bluetooth connection to the internet for third-party facial recognition, the threat to privacy is obvious and massive. However, if regulators can assure public safety, devices such as VueBuds can offer enormous freedom and improvements in quality of life and leisure for countless people with access to them.
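The demo above follows a simple capture-and-answer loop. Here is a minimal structural sketch of that loop; none of these component names come from the paper, which publishes no API, so everything below is a hypothetical decomposition of the described behavior.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical component interfaces; the names are ours, not the paper's.
@dataclass
class VueBudsPipeline:
    wait_for_wake_word: Callable[[], str]              # blocks until "Hey Vue, ..."
    capture_stills: Callable[[], tuple[bytes, bytes]]  # one grayscale still per bud
    stitch: Callable[[bytes, bytes], bytes]            # merge into one wide frame
    ask_vlm: Callable[[bytes, str], str]               # local vision-language model
    speak: Callable[[str], None]                       # text-to-speech reply

    def run_once(self) -> None:
        question = self.wait_for_wake_word()  # e.g. "Hey Vue, what album is this?"
        left, right = self.capture_stills()   # cameras wake only for the query
        frame = self.stitch(left, right)      # one stitched frame = one VLM call
        self.speak(self.ask_vlm(frame, question))  # ~1 s end to end in the demo
```

The design point worth noting is that the cameras sit outside the loop's idle state entirely: nothing is captured until the wake word fires, which is what makes both the power budget and the privacy story work.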
[3]
Your next earbuds could translate text and identify objects for you
Researchers at the University of Washington have developed a new prototype system that could change how people interact with artificial intelligence in daily life. Called VueBuds, the system integrates tiny cameras into standard wireless earbuds, allowing users to ask an AI model questions about the world around them in near real time. The concept is simple but powerful. A user can look at an object, such as a food package in a foreign language, and ask the AI to translate it. Within about a second, the system responds with an answer through the earbuds, creating a seamless, hands-free interaction.

A Different Approach To AI Wearables

Unlike smart glasses, which have struggled with adoption due to privacy concerns and design limitations, VueBuds takes a more subtle approach. The system uses low-resolution, black-and-white cameras embedded in earbuds to capture still images rather than continuous video. These images are transmitted via Bluetooth to a connected device, where a small AI model processes them locally. This on-device processing ensures that data does not need to be sent to the cloud, addressing one of the biggest concerns around wearable cameras. To further enhance privacy, the earbuds include a visible indicator light when recording and allow users to delete captured images instantly.

Engineering Around Power And Performance Limits

One of the biggest challenges the research team faced was power consumption. Cameras require significantly more energy than microphones, making it impractical to use high-resolution sensors like those found in smart glasses. To solve this, the team used a camera roughly the size of a grain of rice, capturing low-resolution grayscale images. This approach reduces battery usage and allows efficient Bluetooth transmission without compromising responsiveness. Placement was another key consideration. By angling the cameras slightly outward, the system achieves a field of view between 98 and 108 degrees. While there is a small blind spot for objects held extremely close, researchers found this does not affect typical usage. The system also combines images from both earbuds into a single frame, improving processing speed (a simplified sketch of this combining step follows this article). This allows VueBuds to respond in about one second, compared to two seconds when handling images separately.

Performance Compared To Smart Glasses

In testing, 74 participants compared VueBuds with smart glasses such as Meta's Ray-Ban models. Despite using lower-resolution images and local processing, VueBuds performed similarly overall. The study showed participants preferred VueBuds for translation tasks, while smart glasses performed better at counting objects. In separate trials, VueBuds achieved accuracy rates of around 83-84% for translation and object identification, and up to 93% for identifying book titles and authors.

Why This Matters And What Comes Next

The research highlights a potential shift in how AI-powered wearables are designed. By embedding visual intelligence into a device people already use, the system avoids many of the barriers faced by smart glasses. However, limitations remain. The current system cannot interpret color, and its capabilities are still in early stages. The team plans to explore adding color sensors and developing specialised AI models for tasks like translation and accessibility support.
The researchers will present their findings at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona, offering a glimpse into a future where everyday devices quietly become intelligent assistants.
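As noted above, merging the two stills into one frame means the vision model runs once instead of twice, roughly halving response time. A minimal sketch of the idea, assuming Pillow and synthetic frames; this naive version just places the stills side by side, whereas the actual system identifies and merges overlapping imagery.

```python
from PIL import Image

def composite_frames(left: Image.Image, right: Image.Image) -> Image.Image:
    """Paste two grayscale stills onto one canvas so the VLM is called once."""
    canvas = Image.new("L", (left.width + right.width,
                             max(left.height, right.height)))
    canvas.paste(left, (0, 0))
    canvas.paste(right, (left.width, 0))
    return canvas

# Synthetic stand-ins for the two 324x324 earbud stills.
left = Image.effect_noise((324, 324), 25)
right = Image.effect_noise((324, 324), 25)
print(composite_frames(left, right).size)  # (648, 324): one frame, one model call
```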
[4]
Tiny Cameras in Earbuds Let Users Talk with AI About What They See | Newswise
Newswise -- University of Washington researchers developed the first system that incorporates tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. For instance, a user might turn to a Korean food package and say, "Hey Vue, translate this for me." They'd then hear an AI voice say, "The visible text translates to 'Cold Noodles' in English."

The prototype system called VueBuds takes low-resolution, black-and-white images, which it transmits over Bluetooth to a phone or other nearby device. A small artificial intelligence model on the device then answers questions about the images within around a second. For privacy, all of the processing happens on the device, a small light turns on when the system is recording, and users can immediately delete images. The team will present its research April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona.

"We haven't seen most people adopt smart glasses or VR headsets, in part because a lot of people don't like wearing glasses, and they often come with privacy concerns, such as recording high-resolution video and processing it in the cloud," said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. "But almost everyone wears earbuds already, so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process."

Cameras use far more power than the microphones already in earbuds, so using the same sort of high-res cameras as those in smart glasses wouldn't work. Also, large amounts of information can't stream continuously over Bluetooth, so the system can't run continuous video. The team found that using a low-power camera -- roughly the size of a grain of rice -- to shoot low-resolution, black-and-white still images limited battery drain and allowed for Bluetooth transmission while preserving performance.

There was also the matter of placement. "One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user's view of the world reliably?" said lead author Maruchi Kim, who completed this work as a UW doctoral student in the Allen School. The team found that angling each camera 5-10 degrees outward provides a 98-108 degree field of view. While this creates a small blind spot when objects are held closer than 20 centimeters from the user, people rarely hold things that close to examine them -- making it a non-issue for typical interactions.

Researchers also discovered that while the vision language model was largely able to make sense of the images from each earbud, having to process images from both earbuds slowed it down. So they had the system "stitch" the two images into one, identifying overlapping imagery and combining it (an illustrative stitching sketch follows this article). This allows the system to respond in one second -- quick enough to feel like real-time for users -- rather than the two seconds it takes with separate images.

The team then had 74 participants compare recorded outputs from VueBuds with outputs from Ray-Ban Meta Glasses in a series of tests. Despite VueBuds using low-resolution images with greater privacy controls and the Ray-Bans taking high-res images processed on the cloud, the two systems performed equivalently. Participants preferred VueBuds' translations, while the Ray-Bans did better at counting objects.
Sixteen participants also wore VueBuds and tested the system's ability to translate and answer basic questions about objects. VueBuds achieved 83-84% accuracy when translating or identifying objects and 93% when identifying the author and title of a book. This study was designed to gauge the feasibility of integrating cameras in wireless earbuds. Since the system only takes grayscale images, it can't answer questions that involve color in the scene. The team wants to add color to the system -- color cameras require more power -- and to train specialized AI models for specific use cases, such as translation. "This study lets us glimpse what's possible just using a general purpose language model and our wireless earbuds with cameras," Kim said. "But we'd like to study the system more rigorously for applications like reading a book -- for people who have low vision or are blind, for instance -- or translating text for travelers."
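The overlap-identifying stitch described above can be approximated with off-the-shelf tools. The paper doesn't name its stitching method, so this sketch stands in OpenCV's generic panorama stitcher, with hypothetical file names for the two earbud frames.

```python
import cv2

# Stand-in for the paper's stitching step: find overlapping imagery in the
# two earbud stills and merge them into one wide frame for the VLM.
left = cv2.imread("left_bud.png", cv2.IMREAD_GRAYSCALE)    # hypothetical files
right = cv2.imread("right_bud.png", cv2.IMREAD_GRAYSCALE)

# OpenCV's stitcher expects 3-channel input, so replicate the gray channel.
left_bgr = cv2.cvtColor(left, cv2.COLOR_GRAY2BGR)
right_bgr = cv2.cvtColor(right, cv2.COLOR_GRAY2BGR)

stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch([left_bgr, right_bgr])
if status == cv2.Stitcher_OK:
    cv2.imwrite("stitched.png", panorama)  # one wide frame, one model call
else:
    print(f"Stitch failed (status {status}); likely too little overlap")
```

A general-purpose stitcher like this is heavier than an embedded pipeline needs; with fixed camera geometry, the real system could presumably precompute the overlap region instead of re-detecting features every frame.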
University of Washington researchers unveiled VueBuds, AI earbuds equipped with rice-grain-sized cameras that let users ask questions about their surroundings. The prototype achieves 83-84% accuracy in object recognition and translation while addressing privacy concerns through local processing and low-resolution imaging. Unlike smart glasses, VueBuds leverage a device people already wear daily.
Researchers at the University of Washington have developed VueBuds, a prototype system that integrates tiny cameras into wireless earbuds to enable visual AI interactions [1]. Led by Shyam Gollakota, a professor in the Paul G. Allen School of Computer Science & Engineering, the project demonstrates that earbuds with cameras can perform tasks similar to smart glasses like Ray-Ban Meta -- translating foreign language text, identifying objects, and aiding low-vision users [2]. The system captures low-resolution, grayscale images through rice-grain-sized cameras embedded in each earbud, then processes them locally using a vision language model that responds within approximately one second [4].
Unlike smart glasses that have faced criticism over privacy concerns and recording capabilities, VueBuds takes a different approach to AI-powered wearables [1]. The system uses low-power cameras requiring less than 5 mW to operate, capturing 324-by-324-pixel black-and-white still images rather than continuous video [2]. All processing happens locally on a connected device, with images sent over Bluetooth, eliminating the need to send data to the cloud [3]. A visible indicator light activates when recording, and users can immediately delete captured images, addressing the surveillance concerns that plagued Google Glass [4].

The research team faced three key challenges: fitting cameras within strict size and power constraints, transmitting data efficiently, and creating a complete visual scene when cameras are positioned at the ears [1]. Maruchi Kim, a Ph.D. student who served as lead author, explained that batteries in earbuds are approximately ten times smaller than those in smart glasses, making power management critical [1]. By angling each low-resolution camera 5-10 degrees outward, the team achieved a field of view between 98 and 108 degrees [3]. The system stitches images from both earbuds into a single frame, which improved processing speed from two seconds to one second -- fast enough to feel like real-time interaction [4].
In user studies involving 74 participants, VueBuds performed comparably to Ray-Ban Meta glasses across 17 tasks despite using significantly lower resolution and greater privacy controls [4]. Testing with the Qwen2.5-VL vision language model showed VueBuds achieved approximately 82% accuracy for object recognition, 94% for character recognition, 84% for language translation, and 87% overall accuracy [1]. Separate trials with 16 participants demonstrated 83-84% accuracy when translating or identifying objects and 93% accuracy when identifying book titles and authors [2]. Participants preferred VueBuds for translation tasks, while Ray-Ban Meta performed better at counting objects [3].
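Qwen2.5-VL is an openly released model, so the kind of query these studies describe can be reproduced locally. A minimal sketch with Hugging Face transformers follows; the 3B model size, the stitched.png file name, and the prompt are assumptions, since the sources don't specify the team's inference setup.

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with Qwen2.5-VL

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"  # assumed size; the team's variant unknown
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# One stitched grayscale earbud frame plus the spoken question as text.
messages = [{"role": "user", "content": [
    {"type": "image", "image": "stitched.png"},      # hypothetical file name
    {"type": "text", "text": "What object am I looking at?"},
]}]
text = processor.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
answer = processor.batch_decode(output[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)  # the real system would speak this back via text-to-speech
```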
The research, presented at the ACM Computer-Human Interaction conference in Barcelona, positions VueBuds as an alternative to smart glasses that leverages a device people already wear daily [1]. Gollakota noted that many people prefer not to wear glasses, opting instead for contact lenses, and that earbuds represent "the one predominant wearable which almost everyone wears" [1]. Kim emphasized that wearables are personal, suggesting there won't be one device to dominate all use cases, but VueBuds demonstrate that "everything smart glasses do can be achieved on earbuds" [1]. The audio interface also benefits from existing social norms -- users can easily store earbuds in their case when they want confidence that cameras aren't recording [1].
The team plans to incorporate color capabilities into the system, though color cameras require more power than the current grayscale setup [4]. Kim is exploring improved resolution through an on-device JPEG encoder, which would significantly reduce image file sizes during transmission [1]. The researchers aim to train specialized AI models for specific applications like aiding low-vision users with reading books or helping travelers with real-time translation [4]. These episodic use cases -- translating street signs, identifying ingredients on packages, or recognizing plant species during hikes -- don't require continuous video streams, making the earbud form factor particularly suitable [1].
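To put the JPEG-encoder idea in rough numbers: a raw 324-by-324 8-bit frame is about 103 KiB, and JPEG typically shrinks such images severalfold. A quick sketch with Pillow and a synthetic frame (noise is close to a worst case for JPEG; real scenes usually compress far better):

```python
import io
from PIL import Image

frame = Image.effect_noise((324, 324), 40)   # synthetic grayscale stand-in frame
raw_kib = frame.width * frame.height / 1024  # 1 byte/pixel uncompressed: ~103 KiB

buf = io.BytesIO()
frame.save(buf, format="JPEG", quality=80)   # on-device encode before Bluetooth
print(f"raw: {raw_kib:.0f} KiB, jpeg: {buf.tell() / 1024:.0f} KiB")
```

Whatever the exact ratio, the saved bytes could be spent on a higher-resolution sensor without lengthening the Bluetooth transfer.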