2 Sources
[1]
New Apple model recreates 3D objects with realistic lighting effects - 9to5Mac
Apple researchers have created an AI model that reconstructs a 3D object from a single image while keeping reflections, highlights, and other lighting effects consistent across different viewing angles. Here are the details.

While the concept of latent space in machine learning is not exactly new, it has become more popular than ever in recent years, with the explosion of AI models based on the transformer architecture and, more recently, world models. In a nutshell (at the risk of being slightly imprecise to explain the bigger picture), "latent space," or "embedding space," describes what happens when you encode data into compact numerical representations whose positions relative to one another capture meaning.

If that still sounds too abstract, one classic example: take the mathematical representation of the token "king," subtract the representation of the token "man," add the representation of the token "woman," and you end up in the general multi-dimensional region of the token "queen." In practical terms, storing information as mathematical representations in latent space makes it faster and less computationally expensive to measure distances between items and estimate the probability of what should be generated.

Although the examples above focus on storing text in latent space, the same idea applies to many other types of data. Which brings us to Apple's study.

In Apple's new study, titled LiTo: Surface Light Field Tokenization, the researchers "propose a 3D latent representation that jointly models object geometry and view-dependent appearance." In other words, they created a way to represent, in latent space, not only how to reconstruct a three-dimensional object, but also how light interacting with it should appear from different angles. As they explain it:

Most prior works focus on either reconstructing 3D geometry or predicting view-independent diffuse appearance, and thus struggle to capture realistic view-dependent effects.
Our approach leverages that RGB-depth images provide samples of a surface light field. By encoding random subsamples of this surface light field into a compact set of latent vectors, our model learns to represent both geometry and appearance within a unified 3D latent space. This representation reproduces view-dependent effects such as specular highlights and Fresnel reflections under complex lighting.

What's more, the researchers managed to train the model to do all of that from a single image, rather than relying on the more common methods that require images from multiple angles for 3D reconstruction.

While the full method is highly technical and explained in detail in the study, the core idea is relatively simple once you understand how latent space works. To train the model, the researchers selected thousands of objects, each rendered from 150 different viewing angles under three lighting conditions. Then, instead of feeding all of that information directly into the model, the system randomly selected small subsets of these samples and compressed them into a latent representation. Next, the decoder was trained to reconstruct the full object and its appearance under different angles and lighting conditions from just that subset of the data. Over the course of training, the system learned a latent representation that captured both the object's geometry and how its appearance changes with viewing direction.

Once that was done, they trained yet another model that takes a single image of an object and predicts the corresponding latent representation. From there, the decoder reconstructs the full 3D object, including how its appearance changes as the viewing angle varies.
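The training loop described in the last two paragraphs can be sketched schematically. This is a toy illustration only: the sizes, the token pooling scheme, and the fixed random linear encoder/decoder below are all stand-ins for the study's learned networks, not Apple's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 150 views x 3 lighting conditions per object, each
# sample flattened to a small feature vector (a stand-in for RGB-depth pixels).
N_VIEWS, N_LIGHTS, FEAT = 150, 3, 32
N_TOKENS, LATENT = 8, 16  # the "compact set of latent vectors"

# One object's surface light field samples (random placeholders).
samples = rng.normal(size=(N_VIEWS * N_LIGHTS, FEAT))

# Stand-in encoder/decoder: fixed random linear maps. In the real system
# these are learned jointly so the reconstruction loss goes down.
W_enc = rng.normal(size=(FEAT, LATENT)) / np.sqrt(FEAT)
W_dec = rng.normal(size=(N_TOKENS * LATENT, FEAT)) / np.sqrt(N_TOKENS * LATENT)

def encode(subset):
    """Compress an arbitrary-size subset of samples into a fixed set of tokens."""
    projected = subset @ W_enc                                    # (k, LATENT)
    chunks = np.array_split(np.arange(len(subset)), N_TOKENS)
    return np.stack([projected[c].mean(axis=0) for c in chunks])  # (N_TOKENS, LATENT)

def decode(tokens):
    """Predict every view/lighting sample back from the compact latent set."""
    return np.tile(tokens.reshape(1, -1) @ W_dec, (N_VIEWS * N_LIGHTS, 1))

# One "training step": a random subsample goes in, the full light field must
# come out; the mismatch is the loss the real model would minimize.
subset = samples[rng.choice(len(samples), size=40, replace=False)]
tokens = encode(subset)
recon = decode(tokens)
loss = float(np.mean((recon - samples) ** 2))
```

The point of the subsampling is that the latent tokens cannot memorize any one view: they are forced to summarize the whole surface light field well enough to reconstruct views they never saw.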
Here are a few reconstruction comparisons between LiTo and a model called TRELLIS, as published by Apple on the project page. Be sure to check out the project page, where you can also load side-by-side interactive comparisons between LiTo and TRELLIS, as seen in the featured image for this post.
[2]
Apple's New LiTo AI Turns Photos into Hyperreal 3D Objects: Here's How it Works
Apple has recently introduced LiTo, a new AI model that can reconstruct 3D objects from a single image while accurately preserving lighting effects like reflections and highlights. The results are more realistic than those of previous techniques. The model transforms visual information into numerical data to understand both an object's shape and how light interacts with it; this numerical space is called latent space. The process involves two important steps: first, an encoder compresses the image into a compact representation, and then a decoder reconstructs it as a 3D object, adding details such as shadows, reflections, and lighting changes throughout the process.
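The latent-space idea, that positions and directions in a vector space carry meaning, can be illustrated with the classic king/man/woman example from the first article. The four-dimensional vectors below are made-up toy values chosen for illustration, not embeddings from any real model:

```python
import numpy as np

# Toy 4-dimensional "embeddings" (made-up values for illustration only).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "man":   np.array([0.1, 0.9, 0.0, 0.6]),
    "woman": np.array([0.1, 0.1, 0.9, 0.6]),
    "queen": np.array([0.9, 0.0, 1.0, 0.7]),
    "apple": np.array([0.0, 0.2, 0.1, 0.0]),
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# king - man + woman lands in the neighborhood of queen.
target = emb["king"] - emb["man"] + emb["woman"]
nearest = max((w for w in emb if w not in ("king", "man", "woman")),
              key=lambda w: cosine(target, emb[w]))
print(nearest)  # queen
```

With real embeddings the arithmetic is only approximate, but this kind of cheap distance measurement is exactly what makes storing information in latent space computationally attractive.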
Apple researchers have introduced LiTo, an AI model that reconstructs 3D objects from a single image while preserving realistic lighting effects across different viewing angles. The system uses latent space to jointly model object geometry and view-dependent appearance, capturing specular highlights and Fresnel reflections that previous methods struggled to achieve.
Apple researchers have developed LiTo AI, a machine learning model that reconstructs 3D objects from a single image while maintaining realistic lighting effects across multiple viewing angles [1]. The model, detailed in a study titled "LiTo: Surface Light Field Tokenization," addresses a significant limitation in existing 3D reconstruction approaches by capturing view-dependent effects such as specular highlights and Fresnel reflections [1]. Unlike previous methods that focus on either reconstructing geometry or predicting view-independent diffuse appearance, LiTo jointly models both aspects to create hyperreal 3D objects [2].
Source: Analytics Insight
The system relies on a 3D latent representation that transforms visual information into numerical data within latent space, enabling the model to understand both an object's shape and how light interacts with its surface [2]. Apple researchers explain that the approach leverages RGB-depth images to provide samples of a surface light field, encoding random subsamples into a compact set of latent vectors [1]. This unified representation allows the model to reproduce complex lighting interactions under various conditions, a capability that sets it apart from conventional reconstruction methods.

The training methodology involves two critical components: an encoder that compresses images into compact representations, and a decoder that reconstructs them as three-dimensional forms [2]. To train the model, Apple researchers selected thousands of objects rendered from 150 different viewing angles under three lighting conditions [1]. Rather than feeding all this information directly, the system randomly selected small subsets of samples and compressed them into latent representations, teaching the decoder to reconstruct full objects from limited data.
Comparisons published on the project page demonstrate LiTo's superior performance against TRELLIS, another 3D reconstruction model [1]. The model adds intricate details such as shadows, reflections, and lighting changes throughout the reconstruction process [2]. What distinguishes LiTo is its ability to reconstruct a 3D object from a single image, eliminating the need for the multiple angles that more common methods require. This efficiency could accelerate workflows in industries ranging from e-commerce product visualization to gaming and augmented reality applications.
Source: 9to5Mac
The development signals Apple's continued investment in spatial computing and 3D content creation capabilities. With the company's focus on Vision Pro and augmented reality experiences, LiTo AI could enable users to quickly generate realistic 3D assets for immersive environments. The technology might integrate into iOS camera features, allowing consumers to capture physical objects and instantly convert them into digital models with accurate material properties. As Apple researchers refine the model's capabilities, watch for potential applications in professional creative tools, retail experiences, and developer frameworks that could reshape how digital content gets created from real-world sources.