[2602.22596] BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model


Summary

BetterScene is a method for 3D scene synthesis that improves novel view synthesis quality from extremely sparse, unconstrained photos by aligning the latent representation of a pretrained video diffusion model.

Why It Matters

This research targets a known weakness of existing novel view synthesis methods: with very few input views, fine-tuned diffusion priors still produce inconsistent details and artifacts. Because 3D scene synthesis underpins applications such as virtual reality and gaming, improvements here translate directly into better user experiences and easier content creation.

Key Takeaways

  • BetterScene enhances novel view synthesis (NVS) quality using sparse photos.
  • It leverages a pretrained Stable Video Diffusion model to mitigate artifacts.
  • Introduces temporal equivariance regularization and vision foundation model-aligned representation.
  • Integrates 3D Gaussian Splatting for artifact-free and consistent novel views.
  • Demonstrates superior performance on the DL3DV-10K dataset compared to existing methods.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.22596 (cs) · Submitted on 26 Feb 2026

Title: BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

Authors: Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, Alper Yilmaz

Abstract: We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model pretrained on billions of frames as a strong backbone, aiming to mitigate artifacts and recover view-consistent details at inference time. Conventional methods have developed similar diffusion-based solutions to address these challenges of novel view synthesis. Despite significant improvements, these methods typically rely on off-the-shelf pretrained diffusion priors and fine-tune only the UNet module while keeping other components frozen, which still leads to inconsistent details and artifacts even when incorporating geometry-aware regularizations like depth or semantic conditions. To address this, we investigate the latent space of the diffusion model and introduce two components: (1) temporal equivariance regularization and (2) vision foundation model-aligned representation, both applied to the ...
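The two latent-space components named in the abstract can be illustrated with a toy numpy sketch. This is not the paper's implementation: the function names, tensor shapes, the choice of a frame shift as the temporal transform, and the cosine-distance alignment objective are all assumptions made for illustration. Temporal equivariance asks that encoding frames and then applying a temporal transform give (approximately) the same result as transforming first and then encoding; representation alignment pulls the model's latent features toward those of a vision foundation model (e.g., a DINO-style encoder).

```python
import numpy as np

def temporal_equivariance_loss(encode, frames, shift=1):
    """Penalize the gap between encode(shift(frames)) and shift(encode(frames)).

    frames: (T, D) array of per-frame features; `encode` maps (T, D) -> (T, K)
    frame-wise. A cyclic frame shift stands in for the temporal transform.
    """
    shifted_then_encoded = encode(np.roll(frames, shift, axis=0))
    encoded_then_shifted = np.roll(encode(frames), shift, axis=0)
    return float(np.mean((shifted_then_encoded - encoded_then_shifted) ** 2))

def vfm_alignment_loss(latent_feats, vfm_feats):
    """Mean cosine distance between latent features and foundation-model features.

    Both inputs: (T, K) arrays; lower is better aligned.
    """
    a = latent_feats / np.linalg.norm(latent_feats, axis=-1, keepdims=True)
    b = vfm_feats / np.linalg.norm(vfm_feats, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))
```

A purely frame-wise linear encoder commutes with a frame shift, so its equivariance loss is zero; any temporally entangled encoder pays a penalty, which is the behavior such a regularizer rewards during training.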


