3 Sources
[1]
New Apple model recreates 3D objects with realistic lighting effects - 9to5Mac
Apple researchers have created an AI model that reconstructs a 3D object from a single image, while keeping reflections, highlights, and other effects consistent across different viewing angles. Here are the details.

While the concept of latent space in machine learning is not exactly new, it has become more popular than ever in recent years, with the explosion of AI models based on the transformer architecture and, more recently, world models. In a nutshell (and running the risk of being slightly imprecise to explain the bigger picture), "latent space," or "embedding space," are terms that describe what happens when you convert data, such as words or images, into numerical vectors that a model can compare and manipulate mathematically.

If that still sounds too abstract, one classic example is to take the mathematical representation of the token "king", subtract the mathematical representation of the token "man", add the mathematical representation of the token "woman", and you will end up in the general multi-dimensional region of the token "queen". In practical terms, storing information as mathematical representations in latent space makes it faster and less computationally expensive to measure distances between them and estimate the probability of what should be generated.

Although the examples above focus on storing text in latent space, the same idea can be applied to many other types of data. Which brings us to Apple's study.

In Apple's new study, titled LiTo: Surface Light Field Tokenization, the researchers "propose a 3D latent representation that jointly models object geometry and view-dependent appearance." In other words, they created a way to represent, in latent space, not only how to reconstruct a three-dimensional object, but also how light interacting with it should appear from different angles. As they explain it:

Most prior works focus on either reconstructing 3D geometry or predicting view-independent diffuse appearance, and thus struggle to capture realistic view-dependent effects. Our approach leverages that RGB-depth images provide samples of a surface light field. By encoding random subsamples of this surface light field into a compact set of latent vectors, our model learns to represent both geometry and appearance within a unified 3D latent space. This representation reproduces view-dependent effects such as specular highlights and Fresnel reflections under complex lighting.

What's more, the researchers managed to train the model so it can do all of that from a single image, rather than relying on the more common methods that require images from different angles to enable 3D reconstruction.

While the entire method is highly technical and is explained in detail in the study, the core idea is relatively simple once you understand how latent space works. To train the model, the researchers selected thousands of objects rendered from 150 different viewing angles and 3 lighting conditions. Then, instead of feeding all of that information directly into the model, the system randomly selected small subsets of these samples and compressed them into a latent representation. Next, the decoder was trained to reconstruct the full object and its appearance under different angles and light conditions from just that subset of the data. Over the course of training, the system learned a latent representation that captured both the object's geometry and how its appearance changes depending on the viewing direction.
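To make that training recipe more concrete, here is a minimal, purely illustrative sketch in PyTorch. Everything in it is an assumption made for illustration: the module names (SubsetEncoder, LightFieldDecoder), tensor shapes, number of latent tokens, and loss are invented and do not come from Apple's paper. Only the broad pattern follows the description above: encode a random subsample of RGB-depth views into a compact set of latent vectors, then train a decoder to reconstruct held-out views from them.

```python
# Illustrative sketch only: toy stand-in for learning a latent surface-light-field
# representation from random subsets of RGB-D views. Names and shapes are invented.
import torch
import torch.nn as nn

NUM_VIEWS, H, W = 150, 32, 32        # toy resolution; the study uses 150 viewing angles
LATENT_TOKENS, LATENT_DIM = 16, 64   # "compact set of latent vectors" (assumed sizes)

class SubsetEncoder(nn.Module):
    """Compresses a random subset of RGB-D samples into a fixed set of latent tokens."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4 * H * W, LATENT_TOKENS * LATENT_DIM)  # 4 = RGB + depth

    def forward(self, rgbd_subset):                  # (B, S, 4, H, W)
        pooled = rgbd_subset.flatten(2).mean(dim=1)  # average the S sampled views
        return self.proj(pooled).view(-1, LATENT_TOKENS, LATENT_DIM)

class LightFieldDecoder(nn.Module):
    """Predicts the RGB-D image for a queried viewing direction from the latents."""
    def __init__(self):
        super().__init__()
        self.out = nn.Linear(LATENT_TOKENS * LATENT_DIM + 3, 4 * H * W)

    def forward(self, latents, view_dir):            # latents (B, K, D), view_dir (B, 3)
        x = torch.cat([latents.flatten(1), view_dir], dim=-1)
        return self.out(x).view(-1, 4, H, W)

encoder, decoder = SubsetEncoder(), LightFieldDecoder()
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

# Toy data: one "object" rendered from NUM_VIEWS directions (random tensors here).
rgbd_views = torch.rand(1, NUM_VIEWS, 4, H, W)
view_dirs = torch.nn.functional.normalize(torch.randn(1, NUM_VIEWS, 3), dim=-1)

for _ in range(100):
    subset_idx = torch.randperm(NUM_VIEWS)[:8]             # random small subsample
    latents = encoder(rgbd_views[:, subset_idx])            # compress into latent tokens
    target_idx = torch.randint(0, NUM_VIEWS, (1,)).item()   # reconstruct any other view
    pred = decoder(latents, view_dirs[:, target_idx])
    loss = nn.functional.mse_loss(pred, rgbd_views[:, target_idx])
    opt.zero_grad(); loss.backward(); opt.step()
```

The real encoder and decoder are far more sophisticated, but the pattern the article describes is the part shown here: compress a random subset of views, then ask the decoder to reproduce the object's appearance from viewpoints it did not see in that subset.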
Once that was done, they trained yet another model that takes a single image of an object and predicts the latent representation that corresponds to it. From that predicted representation, the decoder reconstructs the full 3D object, including how its appearance changes as the viewing angle varies. Apple published several reconstruction comparisons between LiTo and a model called TRELLIS on the project page, where you can also load side-by-side interactive comparisons between the two, as seen in the featured image for this post.
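The single-image stage can be sketched the same way. Again, this is only an illustrative guess at the interface, not Apple's implementation: a hypothetical SingleImageEncoder predicts the same kind of latent tokens the decoder was trained on, and the decoder is then queried with arbitrary viewing directions.

```python
# Illustrative sketch only: the single-image inference stage. An image encoder predicts
# the latent tokens and a decoder renders requested views. Names/shapes are invented.
import torch
import torch.nn as nn

H, W = 32, 32
LATENT_TOKENS, LATENT_DIM = 16, 64

class SingleImageEncoder(nn.Module):
    """Maps one RGB image to the latent representation the decoder expects."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3 * H * W, LATENT_TOKENS * LATENT_DIM)

    def forward(self, image):                          # (B, 3, H, W)
        return self.proj(image.flatten(1)).view(-1, LATENT_TOKENS, LATENT_DIM)

class LightFieldDecoder(nn.Module):
    """Same toy decoder interface as in the training sketch above. In practice this
    would be the decoder already trained in the first stage; here it is freshly
    initialized only so the sketch runs on its own."""
    def __init__(self):
        super().__init__()
        self.out = nn.Linear(LATENT_TOKENS * LATENT_DIM + 3, 4 * H * W)

    def forward(self, latents, view_dir):
        x = torch.cat([latents.flatten(1), view_dir], dim=-1)
        return self.out(x).view(-1, 4, H, W)

image_encoder, decoder = SingleImageEncoder(), LightFieldDecoder()

photo = torch.rand(1, 3, H, W)                         # the single input image
latents = image_encoder(photo)                         # predicted latent representation

# Render the object from several new viewing directions; in the real model, specular
# highlights and reflections would shift consistently as view_dir changes.
for view_dir in torch.nn.functional.normalize(torch.randn(4, 1, 3), dim=-1):
    rgbd = decoder(latents, view_dir)                  # (1, 4, H, W): RGB + depth
```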
[2]
Apple can create 3D objects with realistic lighting effects from a single image with their new AI model
9to5Mac reports something interesting: Apple's researchers have created an AI model that reconstructs a 3D object from a single image, while "keeping reflections, highlights, and other effects consistent across different viewing angles". In Apple's new study, titled LiTo: Surface Light Field Tokenization, the researchers "propose a 3D latent representation that jointly models object geometry and view-dependent appearance". In other words, Apple has created a way to reconstruct a three-dimensional object and to represent how light interacting with it should appear from different angles. The researchers also managed to train the model to do all of that from a single image, instead of the more common methods that require images from different angles to enable 3D reconstruction. All of this required quite a lot of training for the model, as expected. The actual process is very technical and quite demanding, so anyone interested should read more in the 9to5Mac article.
[3]
Apple's New LiTo AI Turns Photos into Hyperreal 3D Objects: Here's How it Works
Apple has recently introduced LiTo, a new AI model that can reconstruct 3D objects from an image while accurately preserving lighting effects like reflections and highlights. The results are more realistic than those of previous techniques. The model transforms visual information into numerical data to understand both an object's shape and how light interacts with it; this numerical representation is called latent space. The process involves two main steps: first, an encoder compresses the image into a compact representation, and then a decoder reconstructs it as a 3D object. The model adds details such as shadows, reflections, and lighting changes throughout the process.
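The "numerical data" both articles refer to is the latent (or embedding) space described in [1]. The "king - man + woman ≈ queen" analogy can be made concrete with a toy example; the vectors below are hand-made for illustration and do not come from any real embedding model, which would use hundreds of dimensions.

```python
# Toy illustration of latent/embedding space: hand-made 3-D vectors (not from any
# real model) where the axes loosely encode "royalty", "male", and "female".
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),   # royal, male
    "queen": np.array([0.9, 0.1, 0.8]),   # royal, female
    "man":   np.array([0.1, 0.9, 0.1]),   # male
    "woman": np.array([0.1, 0.1, 0.9]),   # female
}

def cosine(a, b):
    """Similarity between two vectors: 1.0 means pointing the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king" - "man" + "woman" lands nearest to "queen" in this toy space.
target = vectors["king"] - vectors["man"] + vectors["woman"]
nearest = max(vectors, key=lambda w: cosine(vectors[w], target))
print(nearest)  # prints "queen"
```

LiTo applies the same general idea to a different kind of data: instead of word meanings, its latent vectors encode an object's geometry together with how its appearance changes with the viewing direction.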
Apple researchers have developed LiTo, an AI model that reconstructs 3D objects from a single image while preserving realistic lighting effects like reflections and highlights across different viewing angles. The Surface Light Field Tokenization approach uses latent space to jointly model object geometry and view-dependent appearance, outperforming existing methods that typically require multiple images.
Apple researchers have developed a groundbreaking AI model called LiTo that reconstructs hyperreal 3D objects from a single image while maintaining realistic lighting effects across different viewing angles [1]. The study, titled Surface Light Field Tokenization, introduces a novel approach that jointly models object geometry and view-dependent appearance within a unified framework. Unlike most prior works that focus on either reconstructing 3D geometry or predicting view-independent diffuse appearance, LiTo captures complex visual phenomena including specular highlights and Fresnel reflections under varying lighting conditions [1].
The machine learning model achieves this feat by leveraging latent space, a mathematical representation that stores information about both an object's physical structure and how light interacts with its surface [3]. The process involves an encoder-decoder architecture where an encoder first compresses the image into a compact representation, then a decoder reconstructs it as a 3D object complete with shadows, reflections, and lighting changes [3]. What distinguishes this approach is its ability to generate 3D objects from a single image, eliminating the need for more common methods that require images from different angles to enable 3D reconstruction [2].

To train the AI model, Apple researchers selected thousands of objects rendered from 150 different viewing angles and 3 lighting conditions [1]. Rather than feeding all this information directly into the system, they randomly selected small subsets of these samples and compressed them into a latent representation. The decoder was then trained to reconstruct the full object and its appearance under different angles and light conditions from just that subset of data [1]. Through this training process, the system learned to capture both the object's geometry and how its appearance changes depending on viewing direction. Subsequently, another model was trained to take a single image of an object and predict the corresponding latent representation, enabling the decoder to reconstruct the full 3D object with view-dependent effects [1].
Apple published reconstruction comparisons between LiTo and an existing model called TRELLIS on the project page, demonstrating superior performance in capturing realistic lighting effects [1]. The ability to reconstruct 3D objects from a single image with accurate reflections, highlights, and other effects consistent across different viewing angles represents a significant advancement in computer vision and 3D modeling [2]. This technology could have wide-ranging applications in augmented reality, product visualization, e-commerce, and digital content creation, particularly as Apple continues to develop its Vision Pro spatial computing platform. The research demonstrates how leveraging surface light field samples through RGB-depth images enables more accurate representation of complex lighting interactions, potentially setting a new standard for single-image 3D reconstruction methods.