2 Sources
[1]
New Apple AI model generates 3D scenes from just three images - 9to5Mac
Apple's Machine Learning team, in collaboration with researchers from Nanjing University and The Hong Kong University of Science and Technology, has announced an interesting 3D AI model called Matrix3D. This so-called Large Photogrammetry Model is able to reconstruct 3D objects and scenes from just a few 2D photos, but with a big difference from current pipelines. Here's why this is a big deal. First things first: photogrammetry. It uses photographs to make measurements in order to create 3D models or maps. Currently, this process involves using different models for steps like pose estimation and depth prediction, which can lead to inefficiencies and errors. Matrix3D simplifies this by doing it all in one go. It takes in images, camera parameters (such as angle and focal length), and depth data, and processes them using a unified architecture. This not only simplifies the workflow but also improves accuracy. Even more interesting is how the model was trained. Researchers used a masked learning strategy, very similar to early Transformer-based AI systems that helped pave the way for the first versions of ChatGPT. They randomly hid parts of the input data during the training process, which forced Matrix3D to basically learn how to fill in the gaps. This technique is key because it enables Matrix3D to train effectively even with smaller or incomplete datasets. The results are impressive. With just three input images, Matrix3D can generate detailed 3D reconstructions of objects and even entire environments, which obviously could have very interesting applications for immersive headsets like the Apple Vision Pro.
[2]
Apple's New Matrix3D Model Can Turn Flat Images Into Dynamic 3D Scenes
The model was developed in partnership with Nanjing University and HKUST. Apple researchers released a new artificial intelligence (AI) model that can generate 3D views from multiple 2D images. The large language model (LLM), dubbed Matrix3D, was developed by the company's Machine Learning team, in collaboration with Nanjing University and the Hong Kong University of Science and Technology (HKUST). The Cupertino-based tech giant has made the AI model available to the open community, and it can be downloaded via Apple's listing on GitHub. With Matrix3D, the researchers have unified the 3D generation pipeline to eliminate the risk of errors. In a post, the tech giant detailed the research that went into the development of the Matrix3D AI model. While several 3D rendering models already exist, this one innovates the existing space by unifying the pipeline to create 3D views. Instead of having multiple models and components, here, a single LLM performs several photogrammetry subtasks such as pose estimation, depth prediction, and novel view synthesis. Notably, photogrammetry is the technique of obtaining accurate measurements and 3D information about physical objects and environments by analysing images. It is commonly used to create maps, 3D models, and measurements from 2D images taken from different angles. The researchers have also published a paper about the new model on the online preprint journal arXiv. As per the researchers, Matrix3D is based on a multimodal diffusion transformer (DiT) architecture. It can integrate data across multiple modalities such as image data, camera parameters, and depth maps. In the paper, Apple researchers highlight that the model was trained using a mask learning strategy where a part of the image is obstructed, and the AI model is trained to find the right pixels that fit in the gap. The researchers found that the LLM can generate an entire 3D object or scene view with just three images from different angles.
While the dataset used to train the model was not disclosed, the model itself is available to download, modify, and redistribute via a permissive Apple licence on the company's GitHub listing.
Apple's Machine Learning team, in collaboration with researchers from Nanjing University and HKUST, has developed Matrix3D, an innovative AI model that can generate detailed 3D scenes from just three 2D images.
Apple's Machine Learning team, in collaboration with researchers from Nanjing University and The Hong Kong University of Science and Technology, has introduced Matrix3D, an AI model that reconstructs 3D objects and scenes from a few 2D images [1][2]. The researchers describe it as a Large Photogrammetry Model. Photogrammetry is the practice of using photographs to take measurements and build 3D models or maps [1].
Matrix3D differs from existing 3D reconstruction pipelines by handling the whole workflow in one model. Current methods chain separate models for each subtask, whereas Matrix3D performs pose estimation, depth prediction, and novel view synthesis within a single unified architecture [2]. This streamlines the workflow and is intended to improve accuracy by avoiding the inefficiencies and errors that can arise when results are passed between different models [1].
The researchers trained Matrix3D with a masked learning strategy, similar to the approach used by the early Transformer-based AI systems that paved the way for models like ChatGPT. Parts of the input data are randomly hidden during training, which forces the model to learn how to fill in the gaps [1]. As a result, Matrix3D can train effectively even on smaller or incomplete datasets [1].
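The masking idea can be illustrated with a minimal Python sketch. This is not Apple's training code: the `mask_inputs` helper, the `MASK` placeholder, the 50% mask ratio, and the toy numeric values standing in for image, camera, and depth data are all assumptions made for illustration. It only shows the general pattern of hiding random entries and keeping the originals as prediction targets.

```python
import random

MASK = None  # hypothetical placeholder marking a hidden entry


def mask_inputs(sample, mask_ratio, rng):
    """Hide a random subset of entries in every modality.

    Returns (visible, targets). In `visible`, hidden entries are replaced
    by MASK. `targets` maps (modality, index) to the original value that a
    model would be trained to predict.
    """
    visible, targets = {}, {}
    for modality, values in sample.items():
        n_hide = round(len(values) * mask_ratio)
        hidden = set(rng.sample(range(len(values)), n_hide))
        visible[modality] = [
            MASK if i in hidden else v for i, v in enumerate(values)
        ]
        for i in hidden:
            targets[(modality, i)] = values[i]
    return visible, targets


# Toy stand-ins for the three input types named in the article.
sample = {
    "image": [0.1, 0.5, 0.9, 0.3],
    "camera": [35.0, 0.0, 90.0, 180.0],
    "depth": [1.2, 1.4, 2.0, 2.2],
}
visible, targets = mask_inputs(sample, mask_ratio=0.5, rng=random.Random(0))
```

Because a different subset is hidden on each pass, a sample that lacks one kind of data (for example, no depth values) can in principle still be used by treating the missing part as permanently masked, which is one plausible reading of why the strategy helps with incomplete datasets.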
With just three input images, Matrix3D can generate detailed 3D reconstructions of objects and even entire environments [1]. This capability could be useful for immersive headsets such as the Apple Vision Pro [1].
The model is based on a multimodal diffusion transformer (DiT) architecture, which lets it integrate data across multiple modalities such as image data, camera parameters (for example angle and focal length), and depth maps [1][2].
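To see why these three modalities belong together, consider standard pinhole-camera geometry: a pixel position, a depth value, and the camera's focal length are enough to place a point in 3D. The sketch below is textbook back-projection, not part of Matrix3D, and the `backproject` function and its parameter names are hypothetical.

```python
def backproject(u, v, depth, focal, cx, cy):
    """Map pixel (u, v) with a known depth to a 3D point in camera space.

    Assumes an ideal pinhole camera with one focal length (in pixels)
    and principal point (cx, cy).
    """
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return (x, y, depth)


# A pixel 100 px right of the image centre, 2 units away from the camera.
point = backproject(u=420, v=240, depth=2.0, focal=500.0, cx=320.0, cy=240.0)
```

A conventional pipeline would estimate the camera parameters and the depth map with separate models before a step like this; the article's point is that Matrix3D handles those estimates inside one model.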
Apple has made Matrix3D openly available. Researchers and developers can download, modify, and redistribute the model from Apple's GitHub listing under a permissive Apple license, and a paper describing the model has been published on arXiv [2]. The dataset used to train the model was not disclosed [2].
The full range of Matrix3D's applications remains to be explored. Generating 3D scenes from as few as three images could be relevant to augmented and virtual reality, urban planning, and digital twin technology.
Summarized by
Navi