Researchers have introduced LiTo, a novel 3D latent representation that jointly captures object geometry and view-dependent appearance. This addresses a long-standing limitation of prior works, which focused on either 3D geometry reconstruction or view-independent diffuse appearance prediction. By treating RGB-depth images as samples of an object's surface light field, LiTo can encode random subsamples of those observations, which lets it model realistic view-dependent effects such as specular reflections. Jointly modeling geometry and appearance yields more realistic and detailed representations of 3D scenes, which matters to practitioners in computer vision and graphics: it can improve the accuracy and efficiency of tasks such as 3D reconstruction, object recognition, and scene understanding, and with them the performance of AI systems that rely on these capabilities.
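To make the "RGB-depth images as surface light field samples" idea concrete, here is a minimal sketch of the standard preprocessing it implies: each valid pixel is unprojected through the camera intrinsics into a 3D surface point, paired with the unit view direction toward the camera and its RGB color, and a random subset of these samples is drawn. The function names, the pinhole camera model, and the uniform subsampling are illustrative assumptions, not LiTo's actual pipeline.

```python
import numpy as np

def rgbd_to_light_field_samples(rgb, depth, fx, fy, cx, cy):
    """Unproject an RGB-D image into surface light field samples.

    Assumes a pinhole camera at the origin of the camera frame.
    Each pixel with positive depth yields a (3D surface point,
    unit view direction from the point toward the camera, RGB color)
    triple. Hypothetical helper; LiTo's exact scheme is unspecified here.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)      # surface points, camera frame
    dirs = -points                             # point -> camera (camera at origin)
    dirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
    colors = rgb[valid].astype(np.float32) / 255.0
    return points, dirs, colors

def random_subsample(points, dirs, colors, n, seed=None):
    """Draw a random subset of samples, e.g. as an encoder input batch."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=min(n, len(points)), replace=False)
    return points[idx], dirs[idx], colors[idx]
```

Because every sample carries a view direction alongside position and color, an encoder fed such subsamples sees how color varies with viewpoint, which is exactly the signal needed to learn view-dependent (non-diffuse) appearance.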