Curated by THEOUTPOST
On Mon, 7 Oct, 4:04 PM UTC
6 Sources
[1]
Apple's Depth Pro model 3D maps 2D images in a fraction of a second
"Depth Pro synthesizes high-resolution depth maps with unparalleled sharpness and high-frequency details" Apple's Machine Learning Research wing has developed a foundational AI model "for zero-shot metric monocular depth estimation." Depth Pro enables high-speed generation of detailed 3D depth maps from a single two-dimensional image. Our brains process visual information from two image sources - our eyes. Each has a slightly different view of the world, and these are combined into a single stereo image, with the differences also helping us to gauge how close or far objects are. Many cameras and smartphones look at life through a single lens, but three dimensional depth maps can be created using information hidden in metadata of 2D photos (such as focal lengths and sensor info) or estimated using multiple images. The Depth Pro system doesn't bother with all that though, yet is able to generate a detailed 3D depth map at 2.25 megapixels from a single image in 0.3 seconds via a standard graphics processing unit. The AI model's architecture includes something called a multi-scale vision transformer to simultaneously process the overall context of an image as well as all the finer details like "hair, fur, and other fine structures." And it's able to estimate both relative and absolute depth, meaning that the model can furnish real-world measurements to allow, for example, augmented reality apps to precisely position virtual objects in a physical space. The AI is able to do all this without needing resource-intensive training on very specific datasets, employing something called zero-shot learning - which IBM describes as "a machine learning scenario in which an AI model can recognize and categorize unseen classes without labeled examples." This makes for quite a versatile beast. As for applications, beyond the AR scenario mentioned above, Depth Pro could make for much more efficient photo editing or even lead to real-time 3D imagery using a single-lens camera, and prove useful for helping machines like autonomous vehicles and robots to better perceive the world around them in real-time. The project is still at the research stage, but perhaps unusually for Apple, the code and supporting documentation are being made available as open source on GitHub, allowing developers, scientists and coders to take the technology to the next level
[2]
Apple unveils Depth Pro, an AI app that can map the depth of a 2D image
A team of engineers at Apple has developed an AI-based model called Depth Pro that can map the depth of a 2D image. The team has written a paper describing the model and its capabilities and has posted it on the arXiv preprint server. They have also posted an announcement regarding it on the company's Machine Learning Research page.

Humans and other animals are able to perceive depth because the brain takes two images, one from each eye, and uses the differences between them to figure out which parts of the scene are closer and which are more distant. Some video cameras have done something similar to create 3D videos. Smartphones, because they rely on just one camera for picture taking and video creation, have various hardware and software additions that allow for adding some degree of depth. In this new effort, the engineers at Apple have created an entire depth map using data from the original image alone, without resorting to metadata such as camera intrinsics.

A depth map assigns a value to every pixel in the original image; each value records the distance from the camera to the part of the scene that the pixel depicts. Such a map adds another dimension to a flat picture, giving it 3D effects. Creating a depth map this way, the team suggests, can generate 3D effects that are sharper than those made using standard smartphone techniques.

In their announcement, the team at Apple claims that apps using the model can produce a depth map in just 0.3 seconds when run on a computer with a standard GPU, and without the types of camera data that are usually needed to generate 3D effects. By creating a model that operates so speedily, Apple has opened the door to creating 3D imagery from a single-lens camera in real time. And this, the team notes, could have major implications for robots and other real-time mapping applications, such as those used on autonomous vehicles.
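To make the depth-map idea concrete, here is a minimal sketch of how per-pixel distances turn a flat image into 3D points. It assumes a simple pinhole camera with the principal point at the image center; the focal length in pixels (`f_px`) is exactly the quantity Depth Pro estimates when an image carries no metadata.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, f_px: float) -> np.ndarray:
    """Back-project an HxW metric depth map into 3D points (pinhole model)."""
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0          # assume principal point at center
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pinhole equations: X = (u - cx) * Z / f, Y = (v - cy) * Z / f, Z = depth.
    z = depth
    x = (u - cx) * z / f_px
    y = (v - cy) * z / f_px
    return np.stack([x, y, z], axis=-1)  # HxWx3 array of 3D points in meters

# Example: a flat wall 2 m from the camera filling a 640x480 frame.
points = depth_to_points(np.full((480, 640), 2.0), f_px=600.0)
```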
[3]
Apple's New AI Model Creates 3D Depth Maps From 2D Images in Less Than a Second
Apple's Machine Learning Research team created a new AI model that promises significant improvements in how computer vision models analyze three-dimensional space within a two-dimensional image. The new AI model, as reported by VentureBeat, is called Depth Pro and is detailed in a new paper, "Depth Pro: Sharp Monocular Metric Depth in Less Than a Second."

Depth Pro promises to create sophisticated 3D depth maps from individual 2D images quickly. The paper's abstract explains that the model can produce a 2.25-megapixel depth map from an image in 0.3 seconds using a consumer-grade GPU. Although devices like Apple's latest iPhones can create depth maps using on-device sensors, most still images have no accompanying real-world depth data. However, depth maps for these images can be highly beneficial for numerous applications, including routine image editing. For example, if someone wants to edit only a subject or introduce an artificial "lens" blur to a scene, a depth map can help software create precise masks. A depth map model can also help with AI image generation, as a deep understanding of depth can help a synthesis model produce more realistic results.

As the Apple researchers -- Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, and Vladlen Koltun -- explain, an effective zero-shot metric monocular depth estimation model must swiftly produce accurate, high-resolution results to be helpful. A sloppy depth map is of little value. "Depth Pro produces high-resolution metric depth maps with high-frequency detail at sub-second runtimes. Our model achieves state-of-the-art zero-shot metric depth estimation accuracy without requiring metadata such as camera intrinsics and traces out occlusion boundaries in unprecedented detail, facilitating applications such as novel view synthesis from single images 'in the wild,'" the researchers explain. However, the team acknowledges some limitations, including trouble dealing with translucent surfaces and volumetric scattering.

As VentureBeat explains, beyond photo editing and novel view synthesis, a depth map model could also prove useful for augmented reality (AR) applications, wherein virtual objects must be accurately placed within physical space. The Depth Pro model is adept with both relative and absolute depth, which is vital for many use cases. People can test Depth Pro for themselves on Hugging Face and learn much more about the inner workings of the depth model by reading Apple's new research paper.
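As an illustration of the editing use case, the sketch below uses a depth map to build a focus mask and composite a blurred copy behind it. It is only a sketch: `focus_m` and `tolerance_m` are invented parameters, and a real lens blur would vary the blur radius continuously with distance from the focal plane rather than using a binary mask.

```python
import numpy as np
from PIL import Image, ImageFilter

def depth_blur(img: Image.Image, depth: np.ndarray,
               focus_m: float = 2.0, tolerance_m: float = 0.5) -> Image.Image:
    """Fake 'lens' blur: keep pixels near the focal plane sharp, blur the rest.

    depth is an HxW metric depth map (e.g. from Depth Pro) matching img's size.
    """
    blurred = img.filter(ImageFilter.GaussianBlur(radius=8))
    # White (255) where the pixel lies within tolerance of the focal plane.
    in_focus = (np.abs(depth - focus_m) < tolerance_m).astype(np.uint8) * 255
    mask = Image.fromarray(in_focus, mode="L")
    # composite() takes pixels from img where the mask is white, blurred elsewhere.
    return Image.composite(img, blurred, mask)
```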
[4]
Apple's new Depth Pro AI could revolutionise AR -- capturing 3D space from a single image in just seconds
AI image to illustrate potential future smart glasses (Image credit: Ideogram 2/Future AI)

Not a week goes by without something new in AI development pushing the technology forward, but this week's comes from a small tech company in Cupertino. While all eyes are on Apple Intelligence and its eventual release, which will bring context-specific AI features to everyday use, the company has also shown off a new AI model called Depth Pro. As the name suggests, this new artificial intelligence model will map the depth of an image in real time. More exciting is the fact that it can do this on standard home computing hardware -- no Nvidia H100s required.

Depth Pro is a research model, not something Apple is necessarily putting into production, but if we ever get a pair of Apple Glasses, it would certainly help the company make augmented reality work better, or even improve the AR functionality of the Vision Pro.

Apple's new model estimates relative and absolute depth, using them to produce "metric depth". This data can then be used, along with the image, in a range of ways. When a user takes a picture, Depth Pro draws accurate measurements between items in the image. Apple's model should also avoid inconsistencies like thinking the sky is part of the background, or misjudging the foreground and background of a shot.

The potential, Terminator 2 aside, is almost endless. Autonomous cars (ironically, like Apple's canceled offering), drones, and robot vacuums could use accurate depth sensing to improve object avoidance, while augmented reality tech and online furniture stores could more accurately place items around a room -- real or virtual. Medical tech could be improved with depth perception too, improving reconstruction of anatomical structures and mapping of internal organs. It could go full circle as well, helping shift images to video more accurately using generative AI like Luma Dream Machine. This would work by passing the depth data to the video model along with the image, giving it a better understanding of how to handle object placement and motion in that space.
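To see why metric (absolute) depth enables real-world measurements, consider this hypothetical helper: it back-projects two pixels through a pinhole camera model and returns the straight-line distance between the 3D points behind them. Both inputs, the metric depth map and the focal length in pixels, are things Depth Pro outputs; the centered principal point is an assumption for illustration.

```python
import numpy as np

def pixel_distance_m(depth: np.ndarray, f_px: float,
                     p1: tuple, p2: tuple) -> float:
    """Distance in meters between the 3D points behind pixels p1 and p2.

    p1 and p2 are (u, v) pixel coordinates; depth is an HxW metric depth
    map and f_px a focal length in pixels (both produced by Depth Pro).
    Assumes a pinhole camera with the principal point at the image center.
    """
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0

    def backproject(u, v):
        z = depth[v, u]
        return np.array([(u - cx) * z / f_px, (v - cy) * z / f_px, z])

    return float(np.linalg.norm(backproject(*p1) - backproject(*p2)))
```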
[5]
Depth Pro: Apple's New Open Source Monocular Depth Estimation AI Model
This model will generate monocular depth maps from images, advancing applications in 3D textures and augmented reality (AR).

In a move that underscores its commitment to advancing artificial intelligence, Apple has released a new open-source AI model called Depth Pro. This vision model specializes in generating monocular depth maps from images, an advancement for applications in 3D textures, augmented reality (AR), and various other technologies. The release adds to Apple's growing list of open-source AI models launched this year, which mostly consists of smaller language models customized for specific tasks. Depth Pro, however, stands out due to its specialized capability to analyze single images and derive depth information, a process traditionally reliant on multi-camera setups.
[6]
Apple Releases an Open-Source Monocular Depth Estimation AI Model
The Depth Pro model can synthesise depth maps of thin structures

Apple has released several open-source artificial intelligence (AI) models this year, mostly small language models designed for specific tasks. Adding to the list, the Cupertino-based tech giant has now released a new AI model dubbed Depth Pro. It is a vision model that can generate monocular depth maps of any image. This technology is useful in the generation of 3D textures, augmented reality (AR), and more. The researchers behind the project claim that the depth maps generated by the AI are better than those generated with the help of multiple cameras.

Depth estimation is an important process in 3D modelling as well as various other technologies such as AR, autonomous driving systems, robotics, and more. The human eye is a complex lens system that can accurately gauge the depth of objects even while observing them from a single-point perspective. Cameras, however, are not as good at it: images taken with a single camera appear two-dimensional, removing depth from the equation. So, for technologies where the depth of an object plays an important role, multiple cameras are used. However, modelling objects this way can be time-consuming and resource-intensive.

Instead, in a research paper titled "Depth Pro: Sharp Monocular Metric Depth in Less Than a Second", Apple highlighted how it used a vision-based AI model to generate zero-shot depth maps from monocular images of objects. To develop the AI model, the researchers used a Vision Transformer (ViT)-based architecture. Patches are processed at a resolution of 384 x 384, while the overall input and processing resolution is kept at 1536 x 1536, allowing the AI model more space to understand the details. In the pre-print version of the paper, which is available on the arXiv preprint server, the researchers claimed that the AI model can accurately generate depth maps of visually complex subjects such as a cage, a furry cat's body and whiskers, and more. The generation time is said to be less than a second. The weights of the open-source AI model are currently hosted on GitHub, and interested individuals can run inference with the model on a single GPU.
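The multi-scale idea can be sketched in a few lines: downsample the fixed 1536 x 1536 input to several scales and tile each scale into 384 x 384 patches for a shared ViT encoder. The toy version below only shows the tiling; the actual model also overlaps patches and fuses the per-patch features in a decoder, details this sketch omits.

```python
import numpy as np

def multiscale_patches(img: np.ndarray, base: int = 1536, patch: int = 384):
    """Toy version of Depth Pro's multi-scale patching (single-channel input).

    Tiles the image at full, half, and quarter resolution into 384x384
    patches: 16 + 4 + 1 = 21 tiles. The real model overlaps patches and
    merges their ViT features; this only illustrates the tiling scheme.
    """
    patches = []
    for scale in (base, base // 2, base // 4):  # 1536, 768, 384
        # Nearest-neighbor resize to scale x scale, for illustration only.
        idx = np.arange(scale) * img.shape[0] // scale
        resized = img[np.ix_(idx, idx)]
        for y in range(0, scale, patch):
            for x in range(0, scale, patch):
                patches.append(resized[y:y + patch, x:x + patch])
    return patches

tiles = multiscale_patches(np.zeros((1536, 1536), dtype=np.float32))
assert len(tiles) == 21 and tiles[0].shape == (384, 384)
```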
Apple's Machine Learning Research team has developed Depth Pro, an AI model that can create detailed 3D depth maps from single 2D images in less than a second, potentially revolutionizing AR, robotics, and image processing.
Apple's Machine Learning Research team has unveiled a revolutionary AI model called Depth Pro, capable of generating high-resolution 3D depth maps from single 2D images in a fraction of a second [1][2]. This breakthrough technology promises to transform various fields, including augmented reality (AR), robotics, and image processing.
Depth Pro can create a detailed 2.25-megapixel depth map from a single image in just 0.3 seconds using a standard GPU [1][3]. The model employs a multi-scale vision transformer to simultaneously process the overall context of an image and its finer details, such as hair and fur [1]. This approach allows Depth Pro to estimate both relative and absolute depth, providing real-world measurements for precise positioning of virtual objects in physical spaces [1][4].
One of Depth Pro's key features is its use of zero-shot learning, which enables the AI to recognize and categorize unseen classes without labeled examples [1]. This versatility makes the model highly adaptable to various scenarios without requiring resource-intensive training on specific datasets.
The applications for Depth Pro are wide-ranging and potentially transformative: precise placement of virtual objects in augmented reality, sharper selection masks and synthetic lens blur in photo editing, real-time perception for autonomous vehicles, drones, and robots, depth cues for generative image-to-video models, and even medical imaging [1][3][4].
Depth Pro overcomes limitations of traditional depth mapping techniques by not relying on metadata such as camera intrinsics or multiple images [2][3]. The model's architecture allows it to trace out occlusion boundaries with unprecedented detail, facilitating applications like novel view synthesis from single images "in the wild" [3].
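A crude way to see the link between depth and novel view synthesis: with metric depth and a focal length, every pixel can be re-projected into a slightly shifted camera, since parallax is proportional to focal length times baseline divided by depth. The sketch below is a simple forward warp under those assumptions; it leaves holes where disocclusions appear, which is precisely where sharp occlusion boundaries from the model matter.

```python
import numpy as np

def warp_view(img: np.ndarray, depth: np.ndarray, f_px: float,
              baseline_m: float = 0.05) -> np.ndarray:
    """Crudely synthesize a view from a camera shifted baseline_m to the right.

    Uses per-pixel disparity d = f_px * baseline_m / depth to forward-warp
    pixels. Real view synthesis adds occlusion handling and inpainting of
    the holes this simple splat leaves behind.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    u = np.arange(w)
    for v in range(h):
        disparity = f_px * baseline_m / np.maximum(depth[v], 1e-6)
        u_new = np.clip((u - disparity).astype(int), 0, w - 1)
        out[v, u_new] = img[v, u]  # nearest-pixel splat; disocclusions stay empty
    return out
```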
In an unusual move for Apple, the company has made Depth Pro's code and supporting documentation available as open source on GitHub [1][5]. This decision allows developers, scientists, and coders to further explore and enhance the technology, potentially accelerating its integration into various applications.
While Depth Pro represents a significant advancement, the researchers acknowledge some limitations, including difficulties in handling translucent surfaces and volumetric scattering [3]. As a research model, Depth Pro is not yet in production, but its potential applications in future Apple products, such as AR glasses or improvements to the Vision Pro, are evident [4].
As AI continues to evolve rapidly, Depth Pro stands out as a notable achievement in computer vision, promising to reshape how we interact with and manipulate visual data in both digital and physical realms.