3 Sources
[1]
Earbuds Get Eyes as VueBuds Incorporate Visual AI
Smartphone cameras and some smart glasses allow users to query AI models and receive answers about what they're looking at. Soon, that capability could expand to other devices, including earbuds. Researchers at the University of Washington have developed a pair of earbuds they call VueBuds that integrate a small, low-resolution camera into each earbud. The prototype earbuds offer features similar to those of smart glasses like the Ray-Ban Meta glasses -- translating signs in foreign languages, acting as an aid for low-vision wearers, or identifying plant species during a hike.

Smart glasses have their drawbacks, including privacy concerns and comfort. Their inconspicuous cameras have drawn criticism and lawsuits over concerns that they can record unsuspecting bystanders, and over what ultimately happens to the sensitive visual data they capture. And not everyone likes wearing glasses -- some even opt for contact lenses to avoid them, including Shyam Gollakota, the University of Washington professor who led the VueBuds research. "The one predominant wearable which almost everyone wears is your earbuds," he says.

His team presents earbuds as an alternative to smart glasses that's less intrusive and better for privacy. The primary goal of the research, however, was to demonstrate that this small, ear-worn form factor is even possible. "Traditionally, earbuds have been limited to audio interfaces," Gollakota says. "We show that we can indeed build a system within that form factor and get lots of intelligence by running visual language models." The research was presented today at the ACM Conference on Human Factors in Computing Systems (CHI) in Barcelona.

Gollakota and his colleagues don't expect VueBuds to be the only interface for visual AI. "Wearables are very personal," says Maruchi Kim, a Ph.D. student in Gollakota's lab. Some people may prefer glasses or watches, others might like rings, and so Kim suspects there won't be one device to rule them all.
"We're just trying to introduce another category to demonstrate that everything smart glasses do can be achieved on [earbuds]."

That said, the interface may have some advantages. Because earbuds are already widely used, people may be more likely to adopt the technology. Plus, Kim says, "there's already a social paradigm for putting your earbuds away in their case." Smart glasses may have prescription lenses, so the wearer would keep them on all the time. But "if you ever want to be confident that these cameras aren't recording, earbuds are a nice form factor that lets you just tuck it away when you're ready." Many of the AI features users indicate an interest in are also "episodic use cases," Kim says. To translate a street sign or the ingredients on a package, for instance, you don't need a continuous video stream.

There are three key challenges to making vision-capable earbuds possible, Gollakota says: fitting the camera within strict size, power, and weight constraints; transmitting the data; and creating a complete visual scene when worn in the ears. Cameras typically consume a lot of power, making power the number one concern. "The batteries in your earbuds are about ten times as small as what you have on smart glasses," Kim says. Visual data also requires much higher bandwidth than audio, so the videos recorded by glasses are typically sent via Wi-Fi to be processed by cloud-based AI models. Wi-Fi allows for high bandwidth -- but consumes more power. VueBuds instead transmits low-resolution grayscale images over Bluetooth. Most device makers try to transmit as much data as possible, but Gollakota's team took a different approach: they wanted to find the lowest resolution at which a visual language model could still extract useful information, ultimately opting for a 324-by-324-pixel image sensor. Beyond the power and bandwidth concerns, the researchers also had to make sure earbud cameras could see enough.
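A rough back-of-the-envelope calculation shows why that resolution choice matters for a Bluetooth link. The throughput figure below is an illustrative assumption, not a number reported by the researchers, and the high-resolution frame is a hypothetical comparison point:

```python
# Back-of-the-envelope comparison of still-image payloads over a wireless link.
# The ~1.4 Mbit/s usable Bluetooth throughput is an illustrative assumption,
# as is the 12-megapixel color frame used for contrast.

def payload_bits(width, height, channels=1, bits_per_sample=8):
    """Raw (uncompressed) size of one still image in bits."""
    return width * height * channels * bits_per_sample

def transfer_seconds(bits, link_bps):
    """Idealized time to move `bits` over a link of `link_bps` bits/second."""
    return bits / link_bps

BLUETOOTH_BPS = 1_400_000  # assumed usable throughput

vuebuds = payload_bits(324, 324)                 # grayscale still
hires = payload_bits(3024, 4032, channels=3)     # hypothetical 12 MP color frame

print(f"VueBuds still: {vuebuds / 8 / 1024:.0f} KiB, "
      f"~{transfer_seconds(vuebuds, BLUETOOTH_BPS):.2f} s over Bluetooth")
print(f"High-res frame: {hires / 8 / 1024 / 1024:.0f} MiB, "
      f"~{transfer_seconds(hires, BLUETOOTH_BPS):.0f} s over Bluetooth")
```

Under these assumptions, a single 324-by-324 grayscale still moves in well under a second, while an uncompressed high-resolution color frame would take minutes -- which is consistent with the team's decision to send small grayscale stills rather than video.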
Placing cameras at the ears creates a blind spot on either side where the face blocks each camera's view. But by setting the cameras at a slight angle (5 or 10 degrees) away from the face and stitching the images together, the team found they could reconstruct a more complete scene with a wide field of view. This does, however, create a small blind spot directly in front of the user for objects closer than about 20 centimeters from the face.

The researchers tested the earbuds with four different visual language models. In user studies with the best-performing model (Qwen2.5-VL), VueBuds achieved about 82 percent accuracy for object recognition, 94 percent for character recognition, 84 percent for translation, and 87 percent accuracy overall. The earbuds performed comparably to Ray-Ban Meta glasses across 17 tasks. In the future, the team hopes to add color to the system. Kim is also looking into improving the achievable resolution by incorporating an on-device JPEG encoder, which would significantly reduce the size of the images sent for processing.

Many users have been wary of privacy and surveillance concerns with smart glasses. Those worries are intensifying with new evidence that the companies building these glasses may be mishandling the data they capture. Given those concerns, should we add cameras to yet another wearable device? The University of Washington researchers say VueBuds' stripped-down image capture is a boon for privacy compared to today's smart glasses. For one thing, the system is designed to run on a smartphone or other local device, so data never goes to the cloud, Gollakota says. VueBuds also only captures still images. One of the main uses of Meta's smart glasses is now recording video, but he adds, "no one wants to see a low-resolution grayscale video in the first place." Additionally, VueBuds are activated by voice commands. "That audio initiation means that everyone around you would know what you're actually asking."
Smart glasses, meanwhile, can start recording with the touch of a button. Gollakota notes that most people have become accustomed to having microphones in nearly every device, both because they provide enough utility through capabilities like voice commands and because "a trust has been built" with companies, like Apple, that sell devices with built-in microphones. Whether the same paradigm will emerge for visual intelligence remains to be seen, as the technology -- and our level of trust in it -- evolves over the next few years.

Apple is also rumored to be developing next-generation AirPods that integrate infrared cameras to enable gesture recognition and improve spatial audio. These wouldn't have the visual intelligence capabilities made possible with standard cameras, but such a move would indicate growing interest in expanding the capabilities of what has traditionally been an audio-only interface. Earbuds are "the most successful wearable we have today, and right now it's limited to being an audio interface," Gollakota says. "Bringing visual intelligence would make it a much richer and more powerful interface than what it currently is."
[2]
Your next earbuds could translate text and identify objects for you
Researchers at the University of Washington have developed a new prototype system that could change how people interact with artificial intelligence in daily life. Called VueBuds, the system integrates tiny cameras into standard wireless earbuds, allowing users to ask an AI model questions about the world around them in near real time. The concept is simple but powerful. A user can look at an object, such as a food package in a foreign language, and ask the AI to translate it. Within about a second, the system responds with an answer through the earbuds, creating a seamless, hands-free interaction.

A Different Approach To AI Wearables

Unlike smart glasses, which have struggled with adoption due to privacy concerns and design limitations, VueBuds takes a more subtle approach. The system uses low-resolution, black-and-white cameras embedded in the earbuds to capture still images rather than continuous video. These images are transmitted via Bluetooth to a connected device, where a small AI model processes them locally. This on-device processing ensures that data does not need to be sent to the cloud, addressing one of the biggest concerns around wearable cameras. To further enhance privacy, the earbuds include a visible indicator light that turns on when recording, and they allow users to delete captured images instantly.

Engineering Around Power And Performance Limits

One of the biggest challenges the research team faced was power consumption. Cameras require significantly more energy than microphones, making it impractical to use high-resolution sensors like those found in smart glasses. To solve this, the team used a camera roughly the size of a grain of rice, capturing low-resolution grayscale images. This approach reduces battery usage and allows efficient Bluetooth transmission without compromising responsiveness. Placement was another key consideration. By angling the cameras slightly outward, the system achieves a field of view between 98 and 108 degrees.
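Those field-of-view numbers can be tied together with a toy plane-geometry model. The ear spacing and the roughly 88-degree per-camera field of view below are assumptions (chosen so that two cameras tilted 5-10 degrees outward would span the reported 98-108 degrees); the model also ignores occlusion by the face, which widens the real near-field blind spot beyond what this estimate gives:

```python
import math

# Toy model of the near-field blind spot created by ear-mounted cameras.
# EAR_SPACING_M and CAMERA_FOV_DEG are illustrative assumptions, not
# measurements from the researchers; face occlusion is ignored.

EAR_SPACING_M = 0.15      # assumed distance between the two earbud cameras
CAMERA_FOV_DEG = 88.0     # assumed per-camera horizontal field of view
OUTWARD_TILT_DEG = 5.0    # cameras angled slightly away from the face

def combined_fov_deg(fov_deg, tilt_deg):
    """Total horizontal span covered by two outward-tilted cameras."""
    return fov_deg + 2.0 * tilt_deg

def midline_overlap_distance(ear_spacing, fov_deg, tilt_deg):
    """Distance in front of the face where the two cameras' inner
    field-of-view edges first reach the midline; a centered object
    closer than this falls outside both cameras' views."""
    inner_edge_deg = fov_deg / 2.0 - tilt_deg  # inward angle of inner FOV edge
    return (ear_spacing / 2.0) / math.tan(math.radians(inner_edge_deg))

print(f"Combined FOV: {combined_fov_deg(CAMERA_FOV_DEG, OUTWARD_TILT_DEG):.0f} deg")
d = midline_overlap_distance(EAR_SPACING_M, CAMERA_FOV_DEG, OUTWARD_TILT_DEG)
print(f"Centered objects closer than ~{d * 100:.0f} cm fall in the blind spot")
```

Even this idealized model predicts a blind spot of several centimeters directly in front of the face, and tilting the cameras further outward trades a wider total view for a deeper central blind spot -- the geometry behind the tradeoff the researchers describe.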
While there is a small blind spot for objects held extremely close, researchers found this does not affect typical usage. The system also combines images from both earbuds into a single frame, improving processing speed. This allows VueBuds to respond in about one second, compared to two seconds when handling the images separately.

Performance Compared To Smart Glasses

In testing, 74 participants compared VueBuds with smart glasses such as Meta's Ray-Ban models. Despite using lower-resolution images and local processing, VueBuds performed similarly overall. The study showed that participants preferred VueBuds for translation tasks, while smart glasses performed better at counting objects. In separate trials, VueBuds achieved accuracy rates of around 83-84% for translation and object identification, and up to 93% for identifying book titles and authors.

Why This Matters And What Comes Next

The research highlights a potential shift in how AI-powered wearables are designed. By embedding visual intelligence into a device people already use, the system avoids many of the barriers faced by smart glasses. However, limitations remain. The current system cannot interpret color, and its capabilities are still at an early stage. The team plans to explore adding color sensors and developing specialised AI models for tasks like translation and accessibility support. The researchers will present their findings at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona, offering a glimpse into a future where everyday devices quietly become intelligent assistants.
[3]
Tiny Cameras in Earbuds Let Users Talk with AI About What They See | Newswise
Newswise -- University of Washington researchers developed the first system that incorporates tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. For instance, a user might turn to a Korean food package and say, "Hey Vue, translate this for me." They'd then hear an AI voice say, "The visible text translates to 'Cold Noodles' in English."

The prototype system, called VueBuds, takes low-resolution, black-and-white images, which it transmits over Bluetooth to a phone or other nearby device. A small artificial intelligence model on the device then answers questions about the images within around a second. For privacy, all of the processing happens on the device, a small light turns on when the system is recording, and users can immediately delete images. The team will present its research April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona.

"We haven't seen most people adopt smart glasses or VR headsets, in part because a lot of people don't like wearing glasses, and they often come with privacy concerns, such as recording high-resolution video and processing it in the cloud," said senior author Shyam Gollakota, a UW professor in the Paul G. Allen School of Computer Science & Engineering. "But almost everyone wears earbuds already, so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process."

Cameras use far more power than the microphones already in earbuds, so using the same sort of high-res cameras as those in smart glasses wouldn't work. Also, large amounts of information can't stream continuously over Bluetooth, so the system can't run continuous video.
The team found that using a low-power camera -- roughly the size of a grain of rice -- to shoot low-resolution, black-and-white still images limited battery drain and allowed for Bluetooth transmission while preserving performance.

There was also the matter of placement. "One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user's view of the world reliably?" said lead author Maruchi Kim, who completed this work as a UW doctoral student in the Allen School. The team found that angling each camera 5-10 degrees outward provides a 98-108 degree field of view. While this creates a small blind spot when objects are held closer than 20 centimeters from the user, people rarely hold things that close to examine them -- making it a non-issue for typical interactions.

Researchers also discovered that while the vision language model was largely able to make sense of the images from each earbud, having to process images from both earbuds slowed it down. So they had the system "stitch" the two images into one, identifying overlapping imagery and combining it. This allows the system to respond in one second -- quick enough to feel like real time for users -- rather than the two seconds it takes with separate images.

The team then had 74 participants compare recorded outputs from VueBuds with outputs from Ray-Ban Meta Glasses in a series of tests. Despite VueBuds using low-resolution images with greater privacy controls and the Ray-Bans taking high-res images processed in the cloud, the two systems performed equivalently. Participants preferred VueBuds' translations, while the Ray-Bans did better at counting objects. Sixteen participants also wore VueBuds and tested the system's ability to translate and answer basic questions about objects. VueBuds achieved 83-84% accuracy when translating or identifying objects and 93% when identifying the author and title of a book.
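The stitching step the team describes -- identifying overlapping imagery between the two earbud views and combining it into one frame -- can be sketched with a simple search over horizontal overlaps. This is a minimal illustration, not the researchers' actual pipeline, which would also have to handle the differing viewpoints of the two cameras:

```python
import numpy as np

# Minimal sketch of stitching two overlapping grayscale frames side by side.
# A real system would use feature matching and geometric warping; this toy
# version just searches horizontal offsets for the best column overlap.

def stitch_horizontal(left, right, min_overlap=8):
    """Concatenate `left` and `right` (2-D uint8 arrays of equal height),
    merging the overlapping columns found by minimizing mean squared error."""
    best_offset, best_err = min_overlap, float("inf")
    max_overlap = min(left.shape[1], right.shape[1])
    for overlap in range(min_overlap, max_overlap + 1):
        a = left[:, -overlap:].astype(float)
        b = right[:, :overlap].astype(float)
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best_err, best_offset = err, overlap
    # Average the overlapping strip, then join the three pieces.
    blended = (left[:, -best_offset:].astype(float)
               + right[:, :best_offset].astype(float)) / 2.0
    return np.hstack([left[:, :-best_offset],
                      blended.astype(np.uint8),
                      right[:, best_offset:]])

# Usage: split one synthetic frame into two overlapping halves, then re-stitch.
scene = (np.arange(32 * 48).reshape(32, 48) % 251).astype(np.uint8)
left, right = scene[:, :30], scene[:, 18:]      # 12 columns of true overlap
restored = stitch_horizontal(left, right)
assert restored.shape == scene.shape and np.array_equal(restored, scene)
```

Feeding the model one stitched frame instead of two separate images is what lets the system answer in about one second rather than two, since the vision language model only runs once per query.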
This study was designed to gauge the feasibility of integrating cameras in wireless earbuds. Since the system only takes grayscale images, it can't answer questions that involve color in the scene. The team wants to add color to the system -- color cameras require more power -- and to train specialized AI models for specific use cases, such as translation. "This study lets us glimpse what's possible just using a general purpose language model and our wireless earbuds with cameras," Kim said. "But we'd like to study the system more rigorously for applications like reading a book -- for people who have low vision or are blind, for instance -- or translating text for travelers."
University of Washington researchers developed VueBuds, the first AI earbuds with integrated cameras that let users interact with visual AI. The prototype uses low-resolution cameras to translate text and identify objects with 83-84% accuracy, addressing privacy concerns that plague smart glasses while fitting into a device people already wear daily.
University of Washington researchers have developed VueBuds, the first system that incorporates tiny cameras into wireless earbuds to enable visual AI interactions [1]. The prototype AI earbuds allow users to ask an AI model questions about their surroundings in near real time, similar to capabilities found in smart glasses like the Ray-Ban Meta [2]. A user can look at a Korean food package and say, "Hey Vue, translate this for me," receiving an AI voice response within about a second that says, "The visible text translates to 'Cold Noodles' in English" [3].
Source: IEEE
Led by Shyam Gollakota, a professor at the Paul G. Allen School of Computer Science & Engineering, the research team presented their findings on April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems in Barcelona [3]. The project demonstrates that earbuds with cameras can deliver visual intelligence while addressing the privacy concerns and adoption barriers that have limited smart glasses [1].

The primary goal was demonstrating that this ear-worn form factor is even possible, according to Gollakota. "Traditionally, earbuds have been limited to audio interfaces," he says. "We show that we can indeed build a system within that form factor and get lots of intelligence by running visual language models" [1]. Three key challenges emerged: fitting cameras within strict size, power, and weight constraints; transmitting data efficiently; and creating a complete visual scene when worn in the ears [1].

Power consumption represented the number one concern. The batteries in earbuds are about ten times smaller than those in smart glasses, making high-resolution cameras impractical [1]. The team used low-resolution cameras roughly the size of a grain of rice, capturing 324-by-324-pixel grayscale images that could be transmitted over Bluetooth rather than power-hungry Wi-Fi [1][2]. This approach reduces battery usage while maintaining responsiveness.

Placement posed another critical consideration. "One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user's view of the world reliably?" said lead author Maruchi Kim, a Ph.D. student in Gollakota's lab [3]. By angling each camera 5-10 degrees outward, the team achieved a field of view between 98 and 108 degrees [3]. While this creates a small blind spot for objects closer than 20 centimeters directly in front of the user, people rarely hold things that close when examining them [3].

The system stitches images from both earbuds into one frame, identifying overlapping imagery and combining it. This allows VueBuds to respond in one second rather than the two seconds required when processing separate images, making interactions feel real-time [3].

In user studies with 74 participants comparing VueBuds to Ray-Ban Meta glasses across 17 tasks, the two systems performed equivalently despite VueBuds using low-resolution images with greater privacy controls [3]. Testing with the best-performing on-device AI model, Qwen2.5-VL, VueBuds achieved approximately 82 percent accuracy for object recognition, 94 percent for character recognition, 84 percent for translation, and 87 percent accuracy overall [1]. Separate trials with 16 participants showed VueBuds achieved 83-84% accuracy when translating text or identifying objects, and 93% when identifying book titles and authors [3]. Participants preferred VueBuds for translation tasks, while the Ray-Bans performed better at counting objects [2].

VueBuds takes a more subtle approach as an alternative to smart glasses, which have struggled with adoption due to privacy concerns and design limitations [2]. The system captures still images rather than continuous video, and all processing happens on the device using a small AI model, ensuring data doesn't need to be sent to the cloud [2][3]. A visible indicator light turns on when recording, and users can immediately delete captured images [3].

"There's already a social paradigm for putting your earbuds away in their case," Kim says. "If you ever want to be confident that these cameras aren't recording, earbuds are a nice form factor that lets you just tuck it away when you're ready" [1]. Many AI features users want are also "episodic use cases" that don't require continuous video streams [1].

The team plans to add color sensors to the system, which currently only captures grayscale images and cannot answer questions involving color [3]. Kim is exploring incorporating an on-device JPEG encoder to improve resolution by significantly reducing image file sizes [1]. The researchers also want to train specialized visual language models for specific use cases, such as aiding low-vision or blind users in reading books and assisting travelers with translation [3].

"Wearables are very personal," Kim notes. Some people may prefer glasses or watches, others might like rings, and there likely won't be one device to dominate. "We're just trying to introduce another category to demonstrate that everything smart glasses do can be achieved on [earbuds]" [1]. The research highlights a potential shift in how AI-powered wearables are designed, embedding visual intelligence into devices people already use daily to avoid the barriers faced by smart glasses [2].