[2603.03143] Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.03143 (cs)
[Submitted on 3 Mar 2026]

Title: Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
Authors: Jiyuan Wang, Chunyu Lin, Lei Sun, Zhi Cao, Yuyang Yin, Lang Nie, Zhenlong Yuan, Xiangxiang Chu, Yunchao Wei, Kang Liao, Guosheng Lin

Abstract: Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. However, maintaining multi-view consistency in edited results remains challenging, and the extreme scarcity of 3D-consistent editing paired data renders supervised fine-tuning (SFT), the most effective training strategy for editing tasks, infeasible. In this paper, we observe that, while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable, naturally positioning reinforcement learning (RL) as a feasible solution. Motivated by this, we propose RL3DEdit, a single-pass framework driven by RL optimization with novel rewards derived from the 3D foundation model VGGT. Specifically, we leverage VGGT's robust priors learned from massive real-world data, feed in the edited images, and use the output confidence maps and pose estimation errors as reward signals, effectively anchoring the 2D editing priors onto a ...
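The reward construction described in the abstract (feeding edited views to a 3D foundation model and scoring them by its confidence maps and pose estimation errors) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's implementation: run_vggt, pose_error, and the weight lambda_pose are hypothetical placeholders standing in for whatever interface and metric the authors actually use.

import numpy as np

def run_vggt(edited_views):
    # Placeholder for the real VGGT call. Here it returns dummy per-view
    # confidence maps in [0, 1] and identity camera poses so the sketch
    # runs end to end; the real model would produce its own estimates.
    confidence_maps = [np.random.rand(*view.shape[:2]) for view in edited_views]
    estimated_poses = [np.eye(4) for _ in edited_views]
    return confidence_maps, estimated_poses

def pose_error(estimated_poses, reference_poses):
    # Mean Frobenius distance between estimated and reference poses;
    # a stand-in for the pose-estimation-error signal named in the abstract.
    return float(np.mean([np.linalg.norm(p - q)
                          for p, q in zip(estimated_poses, reference_poses)]))

def geometry_reward(edited_views, reference_poses, lambda_pose=1.0):
    # Scalar RL reward: encourage high multi-view confidence and penalize
    # deviation of the recovered poses from the reference poses.
    confidence_maps, estimated_poses = run_vggt(edited_views)
    mean_confidence = float(np.mean([c.mean() for c in confidence_maps]))
    return mean_confidence - lambda_pose * pose_error(estimated_poses, reference_poses)

# Example: four edited views of a scene with known reference poses.
views = [np.zeros((64, 64, 3)) for _ in range(4)]
ref_poses = [np.eye(4) for _ in range(4)]
print(geometry_reward(views, ref_poses))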