[2602.09929] Monocular Normal Estimation via Shading Sequence Estimation
Summary
This paper reformulates monocular normal estimation as shading sequence estimation, with the goal of producing normal maps whose reconstructed surfaces align more closely with the underlying geometric details.
Why It Matters
Monocular normal estimation is crucial for various applications in computer vision, such as 3D modeling and augmented reality. The proposed method addresses limitations in existing techniques, improving the alignment of estimated normal maps with geometric details, which is essential for realistic rendering and object recognition.
Key Takeaways
- The paper introduces RoSE, a method that reformulates normal estimation as shading sequence estimation.
- RoSE leverages image-to-video generative models to predict shading sequences, which the authors argue are more sensitive to geometric information than normal maps.
- The method is trained on a synthetic dataset, MultiShade, to improve robustness across diverse conditions.
- RoSE achieves state-of-the-art performance on benchmark datasets for monocular normal estimation.
- This approach targets the 3D misalignment issue that the authors attribute to existing methods, which use deep models to predict normal maps directly.
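To make the relationship between a normal map and a shading sequence concrete, here is a minimal sketch. It assumes a Lambertian surface with unit albedo, known directional lights, and no cast shadows; none of these assumptions come from the paper. The function names (`render_shading_sequence`, `normals_from_shading`) are hypothetical, and the least-squares recovery step is classical photometric stereo, used here only as an illustration and not as RoSE's actual procedure.

```python
import numpy as np

def render_shading_sequence(normals, light_dirs):
    """Lambertian shading max(0, n . l) per light.

    normals: (H, W, 3) unit normals; light_dirs: (K, 3) unit vectors.
    Returns an array of shape (K, H, W), one shading image per light.
    """
    shading = np.einsum("hwc,kc->khw", normals, light_dirs)
    return np.clip(shading, 0.0, None)

def normals_from_shading(shading, light_dirs):
    """Recover normals by solving L n = s per pixel in the least-squares sense.

    Assumes every pixel is lit under every light (no clamping occurred).
    """
    k, h, w = shading.shape
    s = shading.reshape(k, h * w)
    n, *_ = np.linalg.lstsq(light_dirs, s, rcond=None)  # shape (3, H*W)
    n = n.T.reshape(h, w, 3)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)

# Toy scene (assumption): a spherical cap facing the camera.
ys, xs = np.meshgrid(np.linspace(-0.5, 0.5, 16),
                     np.linspace(-0.5, 0.5, 16), indexing="ij")
zs = np.sqrt(1.0 - xs**2 - ys**2)
normals = np.stack([xs, ys, zs], axis=-1)  # already unit length

# Ring of 8 lights tilted 20 degrees from the viewing axis (assumption).
phi = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
tilt = np.deg2rad(20.0)
lights = np.stack([np.sin(tilt) * np.cos(phi),
                   np.sin(tilt) * np.sin(phi),
                   np.full_like(phi, np.cos(tilt))], axis=-1)

seq = render_shading_sequence(normals, lights)  # (8, 16, 16)
recovered = normals_from_shading(seq, lights)   # (16, 16, 3)
```

In this toy setup each pixel's intensity changes from frame to frame as the light moves, so a small change in surface orientation shows up in every frame of the sequence, whereas in a normal map it appears only as a small change in one encoded color.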
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.09929 (cs)
[Submitted on 10 Feb 2026 (v1), last revised 25 Feb 2026 (this version, v3)]
Title: Monocular Normal Estimation via Shading Sequence Estimation
Authors: Zongrui Li, Xinhua Ma, Minghui Hu, Yunqing Zhao, Yingchen Yu, Qian Zheng, Chang Liu, Xudong Jiang, Song Bai
Abstract: Monocular normal estimation aims to estimate the normal map from a single RGB image of an object under arbitrary lights. Existing methods rely on deep models to directly predict normal maps. However, they often suffer from 3D misalignment: while the estimated normal maps may appear to have a correct appearance, the reconstructed surfaces often fail to align with the geometric details. We argue that this misalignment stems from the current paradigm: the model struggles to distinguish and reconstruct varying geometry represented in normal maps, as the differences in underlying geometry are reflected only through relatively subtle color variations. To address this issue, we propose a new paradigm that reformulates normal estimation as shading sequence estimation, where shading sequences are more sensitive to various geometric information. Building on this paradigm, we present RoSE, a method that leverages image-to-video generative models to predict shading sequences. The predicted shading sequences ar...