Curated by THEOUTPOST
On Wed, 14 May, 4:02 PM UTC
2 Sources
[1]
New Apple AI model generates 3D scenes from just three images - 9to5Mac
Apple's Machine Learning team, in collaboration with researchers from Nanjing University and The Hong Kong University of Science and Technology, has announced an interesting 3D AI model called Matrix3D. This so-called Large Photogrammetry Model is able to reconstruct 3D objects and scenes from just a few 2D photos, but with a big difference from current pipelines. Here's why this is a big deal. First things first: photogrammetry. It uses photographs to make measurements in order to create 3D models or maps. Currently, this process involves using different models for steps like pose estimation and depth prediction, which can lead to inefficiencies and errors. Matrix3D simplifies this by doing it all in one go. It takes in images, camera parameters (such as angle and focal length), and depth data, and processes them using a unified architecture. This not only simplifies the workflow but also improves accuracy. Even more interesting is how the model was trained. Researchers used a masked learning strategy, very similar to early Transformer-based AI systems that helped pave the way for the first versions of ChatGPT. They randomly hid parts of the input data during the training process, which forced Matrix3D to basically learn how to fill in the gaps. This technique is key because it enables Matrix3D to train effectively even with smaller or incomplete datasets. The results are impressive. With just three input images, Matrix3D can generate detailed 3D reconstructions of objects and even entire environments, which obviously could have very interesting applications for immersive headsets like the Apple Vision Pro.
[2]
Apple's New Matrix3D Model Can Turn Flat Images Into Dynamic 3D Scenes
The model was developed in partnership with Nanjing University and HKUST. Apple researchers have released a new artificial intelligence (AI) model that can generate 3D views from multiple 2D images. The model, dubbed Matrix3D, was developed by the company's Machine Learning team, in collaboration with Nanjing University and the Hong Kong University of Science and Technology (HKUST). The Cupertino-based tech giant has made the AI model available to the open community, and it can be downloaded via Apple's listing on GitHub. With Matrix3D, the researchers have unified the 3D generation pipeline to reduce the risk of errors. In a post, the tech giant detailed the research that went into the development of the Matrix3D AI model. While several 3D rendering models already exist, this one differs by unifying the pipeline used to create 3D views. Instead of relying on multiple models and components, a single model performs several photogrammetry subtasks such as pose estimation, depth prediction, and novel view synthesis. Notably, photogrammetry is the technique of obtaining accurate measurements and 3D information about physical objects and environments by analysing images. It is commonly used to create maps, 3D models, and measurements from 2D images taken from different angles. The researchers have also published a paper about the new model on the preprint server arXiv. As per the researchers, Matrix3D is based on a multimodal diffusion transformer (DiT) architecture. It can integrate data across multiple modalities such as image data, camera parameters, and depth maps. In the paper, Apple researchers highlight that the model was trained using a mask learning strategy where a part of the image is obstructed, and the AI model is trained to find the right pixels that fit in the gap. The researchers found that the model can generate an entire 3D object or scene view with just three images from different angles.
While the dataset used to train the model was not disclosed, the model itself is available to download, modify, and redistribute via a permissive Apple licence on the company's GitHub listing.
Apple's Machine Learning team, in collaboration with researchers from Nanjing University and HKUST, has developed Matrix3D, an innovative AI model that can generate detailed 3D scenes from just three 2D images.
Apple's Machine Learning team, in collaboration with researchers from Nanjing University and The Hong Kong University of Science and Technology, has introduced Matrix3D, an AI model that generates 3D scenes from a small number of 2D images [1][2]. The team describes it as a Large Photogrammetry Model: it handles several photogrammetry steps that current pipelines split across separate models [1].
Matrix3D stands out from existing 3D rendering models by unifying the entire pipeline into a single process. Unlike current methods that rely on separate models for each subtask, Matrix3D performs pose estimation, depth prediction, and novel view synthesis within a single model [2]. This unified approach streamlines the workflow and can improve accuracy by reducing the errors that arise when results are passed between different models [1].
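One way to picture a unified model is to treat each subtask as a different choice of which modalities are supplied and which are generated. The sketch below is illustrative only: the names `MODALITIES`, `TASKS`, and `plan_task` are hypothetical and do not come from Apple's code, and the mapping of tasks to modalities is an assumption based on the descriptions above.

```python
# Hypothetical sketch: each subtask is a split of modalities into
# "given" and "generate". None of these names come from Matrix3D itself.

MODALITIES = ("image", "camera", "depth")

# Assumed mapping for the target view, for illustration only.
TASKS = {
    "pose_estimation": {"known": {"image"}, "predict": {"camera"}},
    "depth_prediction": {"known": {"image", "camera"}, "predict": {"depth"}},
    "novel_view_synthesis": {"known": {"camera"}, "predict": {"image"}},
}


def plan_task(task):
    """Return a per-modality plan: 'given', 'generate', or 'unused'."""
    spec = TASKS[task]
    plan = {}
    for modality in MODALITIES:
        if modality in spec["known"]:
            plan[modality] = "given"
        elif modality in spec["predict"]:
            plan[modality] = "generate"
        else:
            plan[modality] = "unused"
    return plan


if __name__ == "__main__":
    for name in TASKS:
        print(name, plan_task(name))
```

Under this view, switching tasks means changing the plan rather than swapping in a different model.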
The researchers trained Matrix3D with a masked learning strategy, similar to the one used by early Transformer-based AI systems that paved the way for models like ChatGPT. Parts of the input data are randomly hidden during training, which forces the model to learn how to fill in the gaps [1]. This approach enables Matrix3D to train effectively even with smaller or incomplete datasets [1].
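The masking step can be sketched in a few lines. This is a minimal illustration of random input masking in general, not the training code used for Matrix3D: the function name `mask_inputs`, the token labels, and the 50% mask ratio are all assumptions made for the example.

```python
import random


def mask_inputs(tokens, mask_ratio, rng):
    """Hide a random subset of tokens.

    Returns (visible, targets):
      visible: copy of tokens with hidden positions replaced by None
      targets: dict of hidden position -> original value to reconstruct
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    n_hidden = int(len(tokens) * mask_ratio)
    hidden = set(rng.sample(range(len(tokens)), n_hidden))
    visible = [None if i in hidden else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(hidden)}
    return visible, targets


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical token labels standing in for image, camera, and depth data.
    tokens = ["img_0", "img_1", "img_2", "cam_0",
              "cam_1", "cam_2", "depth_0", "depth_1"]
    visible, targets = mask_inputs(tokens, mask_ratio=0.5, rng=rng)
    print("model sees:   ", visible)
    print("must predict: ", targets)
```

A training loop would feed `visible` to the model and score its predictions against `targets`. A sample that is missing a modality can be handled the same way, by treating the missing entries as hidden.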
Matrix3D's ability to generate detailed 3D reconstructions of objects and entire environments from just three input images is particularly noteworthy [1]. This capability could have far-reaching implications for various applications, including potential integration with immersive technologies like the Apple Vision Pro [1].
The model is based on a multimodal diffusion transformer (DiT) architecture, allowing it to integrate data across multiple modalities such as image data, camera parameters, and depth maps [2]. Handling all three modalities in one architecture is what lets a single model cover the different photogrammetry subtasks.
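To see why camera parameters and depth maps together carry 3D information, consider the standard pinhole camera model, which maps a pixel with known depth to a 3D point. The sketch below shows that textbook geometry only; it is not a description of Matrix3D's internals, and the function names and intrinsics (`fx`, `fy`, `cx`, `cy`) are chosen for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole model: pixel (u, v) at a given depth -> 3D point in camera coordinates.

    fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def depth_map_to_points(depth_map, fx, fy, cx, cy):
    """Convert a depth map (list of rows) to 3D points, skipping missing or non-positive depths."""
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d is not None and d > 0:
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with one missing value (None) and one invalid value (0.0).
    depth_map = [[1.0, None],
                 [0.0, 2.0]]
    for point in depth_map_to_points(depth_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
        print(point)
```

Repeating this for several views, each with its own camera pose, is what turns per-image depth into a shared 3D reconstruction.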
In a move that could accelerate further research and development in this field, Apple has released Matrix3D publicly. Researchers and developers can download, modify, and redistribute the model from Apple's GitHub listing under a permissive Apple license [2]. The dataset used to train the model was not disclosed [2].
While the full extent of Matrix3D's applications remains to be explored, its ability to generate 3D scenes from minimal input could have significant implications for various industries. From augmented reality and virtual reality to urban planning and digital twin technology, the potential use cases for this technology are vast and exciting.
Apple's Machine Learning Research team has developed Depth Pro, an AI model that can create detailed 3D depth maps from single 2D images in less than a second, potentially revolutionizing AR, robotics, and image processing.
6 Sources
World Labs, led by AI pioneer Fei-Fei Li, has introduced an innovative AI system that transforms 2D images into explorable 3D environments, potentially revolutionizing content creation for games, movies, and virtual experiences.
6 Sources
Tencent unveils Hunyuan3D 2.0, an open-source AI system that rapidly converts 2D images or text descriptions into detailed 3D models, potentially transforming industries from game development to e-commerce.
2 Sources
Stability AI introduces Stable Video 4D, a groundbreaking AI model capable of generating 3D videos from text prompts. This innovation marks a significant advancement in the field of AI-generated content, offering new possibilities for creators and industries.
3 Sources
Roblox launches Cube 3D, an open-source AI model for creating 3D objects from text prompts, with plans for future expansion into '4D creation' and multimodal capabilities.
4 Sources