[2604.03296] 3D-IDE: 3D Implicit Depth Emergent
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.03296 (cs)

[Submitted on 28 Mar 2026]

Title: 3D-IDE: 3D Implicit Depth Emergent

Authors: Chushan Zhang, Ruihan Lu, Jinguang Tong, Yikai Wang, Hongdong Li

Abstract: Leveraging 3D information within Multimodal Large Language Models (MLLMs) has recently shown significant advantages for indoor scene understanding. However, existing methods, including those using explicit ground-truth 3D positional encoding and those grafting external 3D foundation models for implicit geometry, struggle with the trade-off in 2D-3D representation fusion, leading to suboptimal deployment. To this end, we propose 3D-Implicit Depth Emergence, a method that reframes 3D perception as an emergent property derived from geometric self-supervision rather than explicit encoding. Our core insight is the Implicit Geometric Emergence Principle: by strategically leveraging privileged geometric supervision through mechanisms like a fine-grained geometry validator and global representation constraints, we construct an information bottleneck. This bottleneck forces the model to maximize the mutual information between visual features and 3D structures, allowing 3D awareness to emerge naturally within a unified visual representation. Unlike existing approaches, our method enables 3D perception to emerge implicitly, di...
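The abstract does not specify the paper's training objective, but the stated goal of maximizing mutual information between visual features and 3D structure is commonly approximated with a contrastive (InfoNCE-style) lower bound. The sketch below is a hypothetical illustration of that general idea, not the paper's actual loss: paired visual and geometric embeddings are pulled together while mismatched pairs are pushed apart, so features that encode geometry score a lower loss. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def info_nce(visual_feats, geo_feats, temperature=0.07):
    """InfoNCE lower bound on mutual information between paired visual
    and geometric embeddings (illustrative; not the paper's objective).

    visual_feats, geo_feats: (batch, dim) arrays where row i of each
    array comes from the same scene region.
    """
    # L2-normalize both embedding sets so similarity is cosine-based
    v = visual_feats / np.linalg.norm(visual_feats, axis=1, keepdims=True)
    g = geo_feats / np.linalg.norm(geo_feats, axis=1, keepdims=True)
    logits = v @ g.T / temperature                 # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # matched visual/geometry pairs sit on the diagonal
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
geo = rng.normal(size=(8, 16))                     # stand-in 3D structure codes
aligned = geo + 0.01 * rng.normal(size=(8, 16))    # visual features that encode geometry
unrelated = rng.normal(size=(8, 16))               # visual features that ignore geometry

# Geometry-aware features achieve a tighter (lower) contrastive loss
assert info_nce(aligned, geo) < info_nce(unrelated, geo)
```

Under this reading, the paper's "information bottleneck" would correspond to supervision pressure that makes the aligned case the only way to minimize the loss, so 3D awareness emerges inside the visual representation rather than being injected by an explicit positional encoding.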