2 Sources
[1]
Apple's newest AI study unlocks street view for blind users - 9to5Mac
There's no shortage of rumors about Apple's plans to release camera-equipped wearables. And while it's easy to get fatigued by yet another wave of upcoming AI-powered hardware, one powerful use case often gets lost in the shuffle: accessibility. SceneScout, a new research prototype from Apple and Columbia University, isn't a wearable. Yet. But it hints at what AI could eventually unlock for blind and low-vision users.

As Apple's and Columbia University's researchers explain it:

People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation, those exploring pre-travel assistance typically provide only landmarks and turn-by-turn instructions, lacking detailed visual context. Street view imagery, which contains rich visual information and has the potential to reveal numerous environmental details, remains inaccessible to BLV people.

To try to close this gap, the researchers present a project that combines Apple Maps APIs with a multimodal large language model to provide interactive, AI-generated descriptions of street view images. Instead of just relying on turn-by-turn directions or landmarks, users can explore an entire route or virtually explore a neighborhood block by block, with street-level descriptions that are tailored to their specific needs and preferences. The system supports two main modes: Route Preview, which describes what a user would encounter along a planned route, and Virtual Exploration, which lets them wander the street view imagery freely.

Behind the scenes, SceneScout grounds a GPT-4o-based agent within real-world map data and panoramic images from Apple Maps. It simulates a pedestrian's view, interprets what's visible, and outputs structured text, broken into short, medium, or long descriptions. The web interface, designed with screen readers in mind, presents all of this in a fully accessible format.

The research team ran a study with 10 blind or low-vision users, most of whom were proficient with screen readers and worked in tech. Participants used both Route Preview and Virtual Exploration, and gave the experience high marks for usefulness and relevance. The Virtual Exploration mode was especially praised, as many said it gave them access to information they would normally have to ask others about.

Still, there were important shortcomings. While about 72% of the generated descriptions were accurate, some included subtle hallucinations, like claiming a crosswalk had audio signals when it didn't, or even mislabeling street signs. And while most of the information was stable over time, a few descriptions referenced outdated or transient details like construction zones or parked vehicles.

Participants also pointed out that the system occasionally made assumptions, both about the user's physical abilities and about the environment itself. Several users emphasized the need for more objective language and better spatial precision, especially for last-meter navigation. Others wished the system could adapt more dynamically to their preferences over time, instead of relying on static keywords.

SceneScout obviously isn't a shipping product, and it explores the collaboration between a multimodal large language model and the Apple Maps API, rather than real-time, computer vision-based in-situ navigation. But one could easily draw a line from one to the other. In fact, that is brought up towards the end of the study: Participants expressed a strong desire for real-time access to street view descriptions while walking.
They envisioned applications that surface visual information through bone conduction headphones or transparency mode to provide relevant details as they move. As P9 put it, "Why can't [maps] have a built-in ability to help [provide] detailed information about what you're walking by." Participants suggested using even shorter 'mini' descriptions (P1) while walking, highlighting only critical details such as landmarks or sidewalk conditions. More comprehensive descriptions, i.e. long descriptions, could be triggered on demand when users pause walking or reach intersections. Another participant (P4) suggested a new form of interaction, in which users "could point the device in a certain direction" to receive on-demand descriptions, rather than having to physically align their phone camera to capture the surroundings. This would enable users to actively survey their environment in real time, making navigation more dynamic and responsive.

As with other studies published on arXiv, SceneScout: Towards AI Agent-driven Access to Street View Imagery for Blind Users hasn't been peer-reviewed. Still, it is absolutely worth your time if you'd like to know where AI, wearables, and computer vision are inevitably heading.
[2]
Apple researching AI agent that can describe Street View scenes to the blind
Visually impaired iPhone users may get more out of Look Around in the future.

Apple engineers have detailed an AI agent that accurately describes Street View scenes. If the research pans out, it could become a tool to help visually impaired people virtually explore a location in advance.

Blind and visually impaired people already have tools at their disposal to navigate their devices and their local environment. However, Apple believes it could be beneficial for the same people to know about a place's physical features before visiting it.

A paper released through Apple Machine Learning Research on Monday describes SceneScout, an AI agent driven by a multimodal large language model. The key to the agent is that it can view Street View imagery, analyze what it sees, and describe it to the user. The paper is authored by Leah Findlater and Cole Gleason of Apple, as well as Gaurav Jain of Columbia University.

The researchers explain that people with low vision may hesitate to travel independently in unfamiliar environments, since they don't know in advance about the physical landscape they will encounter. There are tools available to describe the local environment, such as Microsoft's Soundscape app from 2018, but they are all designed to work in-situ, not in advance. At the moment, pre-travel assistance provides details like landmarks and turn-by-turn navigation, which offer little in the way of landscape context for visually impaired users.

Street View-style imagery, such as Apple Maps Look Around, gives sighted users far more contextual cues, which people who cannot see the imagery miss out on. This is where SceneScout steps in, as an AI agent that provides accessible interactions with Street View imagery.

SceneScout has two modes. Route Preview provides details of elements it can observe along a route; for example, it could tell the user about trees at a turn and other more tactile elements. The second mode, Virtual Exploration, enables free movement within Street View imagery, describing elements to the user as they virtually move around.

In its user study, the team determined that SceneScout is helpful to visually impaired people for uncovering information they would not otherwise be able to access with existing methods. The majority of descriptions were deemed accurate, at 72% of the time, and the system can describe stable visual elements 95% of the time. However, occasional "subtle and plausible errors" make the descriptions difficult to verify without using sight.

As for ways to improve the system, the test participants proposed that SceneScout could provide personalized descriptions that adapt over multiple sessions. For example, the system could pick up on the types of information the user prefers to hear about. Shifting the description perspective from the viewpoint of a camera mounted on top of a car to where a pedestrian would normally stand could also improve the information.

One other suggested improvement could also be applied in-situ: the participants said they would love for the Street View descriptions to be provided in real time, matching where they are walking. They said this could be an application that provides the visual information through bone conduction headphones or a transparency mode as they move around.
Furthermore, users may want to use a combination of a gyroscope and compass in a device to point in a general direction for environmental details, rather than hoping they line up a camera correctly for computer vision.

Much like a patent filing, a paper detailing the use of AI in new ways does not guarantee that it will appear in a future product or service. However, it does provide a glimpse into applications Apple has considered for the technology.

While not using Street View imagery, a similar approach could take advantage of a few rumored inbound Apple products. Apple is thought to be creating AirPods with built-in cameras, as well as Apple Glass smart glasses with their own cameras. In both cases, the cameras could give Apple Intelligence a view of the world, which would then be used to help answer queries for the user. It's not much of a stretch to imagine a similar system being used to describe the local environment to a user, all by using live data instead of potentially dated Street View images.
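To make the heading-based pointing idea above a little more concrete, here is a purely illustrative sketch of how a compass heading could be snapped to a direction and turned into an on-demand description request. Neither the paper nor Apple describes an implementation, so every function name and the callback below are hypothetical.

```python
# Illustrative sketch only: the article describes the pointing idea, not an
# implementation, so every name here is hypothetical.

def heading_to_direction(heading_degrees: float) -> str:
    """Snap a 0-360 degree compass heading to one of eight directions."""
    names = ["north", "northeast", "east", "southeast",
             "south", "southwest", "west", "northwest"]
    return names[round((heading_degrees % 360) / 45) % 8]

def on_point_gesture(heading_degrees: float, request_description) -> str:
    """When the user points the device, ask for a short description of what
    lies that way; request_description stands in for the description backend."""
    direction = heading_to_direction(heading_degrees)
    return request_description(f"what is to the {direction} of me", length="short")

if __name__ == "__main__":
    fake_backend = lambda query, length: f"({length}) description for: {query}"
    print(on_point_gesture(72.0, fake_backend))  # snaps roughly east
```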
Apple and Columbia University researchers develop SceneScout, an AI-powered system that provides detailed street view descriptions for blind and low-vision users, potentially revolutionizing independent travel and accessibility.
Apple, in collaboration with Columbia University, has unveiled a groundbreaking AI research prototype called SceneScout, aimed at enhancing street navigation for blind and low-vision (BLV) users. This innovative system combines Apple Maps APIs with multimodal large language models to provide interactive, AI-generated descriptions of street view images [1][2].
SceneScout addresses a critical need in the BLV community by offering detailed visual context for unfamiliar environments. Unlike existing tools that focus on in-situ navigation or provide limited pre-travel assistance, SceneScout taps into the rich visual information contained in street view imagery [1].
The system operates in two primary modes: Route Preview, which walks users through the visual details they would encounter along a planned route, and Virtual Exploration, which lets them move freely through street view imagery and hear descriptions block by block.
Behind the scenes, SceneScout utilizes a GPT-4o-based agent grounded in real-world map data and panoramic images from Apple Maps. It simulates a pedestrian's view, interprets visible elements, and outputs structured text in short, medium, or long descriptions [1].
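The paper does not publish SceneScout's code, so the following is only a rough sketch, assuming a hypothetical panorama type, prompt, and placeholder model call, of how a description pipeline along these lines might be wired together.

```python
# Purely illustrative sketch: SceneScout's implementation is not published, so the
# Panorama type, the prompt, and the call_model hook below are assumptions, not
# Apple's actual code.

from dataclasses import dataclass

@dataclass
class Panorama:
    """Stand-in for a street-level panorama tied to a map location."""
    latitude: float
    longitude: float
    heading_degrees: float
    image_bytes: bytes

# Rough mapping of the paper's short / medium / long description lengths.
LENGTH_HINTS = {
    "short": "one sentence with only the most important detail",
    "medium": "two to three sentences",
    "long": "a full paragraph covering sidewalks, crossings, and landmarks",
}

def describe_panorama(pano: Panorama, length: str, preferences: str,
                      call_model=None) -> str:
    """Build a pedestrian-perspective prompt and hand it, with the image,
    to a multimodal model (call_model is a placeholder callback)."""
    prompt = (
        "Describe this street scene to a blind pedestrian standing at "
        f"({pano.latitude}, {pano.longitude}) facing {pano.heading_degrees} degrees. "
        "Use objective language and avoid guessing. "
        f"Length: {LENGTH_HINTS[length]}. Emphasize: {preferences}."
    )
    if call_model is None:
        # Stub so the sketch runs without any model or network access.
        return f"[model output would go here for prompt: {prompt[:70]}...]"
    return call_model(prompt, pano.image_bytes)

if __name__ == "__main__":
    pano = Panorama(40.8075, -73.9626, 90.0, image_bytes=b"")
    print(describe_panorama(pano, "short", "sidewalk condition and crossings"))
```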
A study conducted with 10 BLV users, most of whom were tech-savvy and proficient with screen readers, yielded promising results: participants rated the descriptions highly for usefulness and relevance, and the Virtual Exploration mode was singled out for surfacing information they would normally have to ask other people about. About 72% of the generated descriptions were judged accurate, and descriptions of stable visual elements held up 95% of the time.

Despite its potential, SceneScout faces several challenges: some descriptions contained subtle hallucinations, such as claiming a crosswalk had audio signals when it didn't; a few referenced outdated or transient details like construction zones or parked vehicles; and the system occasionally made assumptions about the user's physical abilities or about the environment itself.

Participants suggested several improvements: more objective language and better spatial precision, especially for last-meter navigation; descriptions that adapt to a user's preferences over multiple sessions rather than relying on static keywords; a shift in perspective from the car-mounted camera to where a pedestrian would actually stand; and real-time descriptions delivered while walking, for example through bone conduction headphones or transparency mode.
While SceneScout is currently a research prototype, it hints at exciting possibilities for AI-powered accessibility tools. The study suggests potential integration with rumored Apple products such as camera-equipped AirPods or Apple Glass smart glasses, which could provide real-time environmental descriptions using live data instead of static Street View images [2].
This research not only demonstrates Apple's commitment to accessibility but also showcases the potential of AI and computer vision to significantly improve the lives of visually impaired individuals. As these technologies continue to evolve, they promise to unlock new levels of independence and confidence for BLV users navigating the world around them.