[2604.16552] Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion



Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.16552 (cs)

[Submitted on 17 Apr 2026 (v1), last revised 29 Apr 2026 (this version, v2)]

Title: Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion

Authors: Zhenggang Tang, Yuehao Wang, Yuchen Fan, Jun-Kun Chen, Yu-Ying Yeh, Kihyuk Sohn, Zhangyang Wang, Qixing Huang, Alexander Schwing, Rakesh Ranjan, Dilin Wang, Zhicheng Yan

Abstract: Recent text-to-scene generation approaches largely reduced the manual efforts required to create 3D scenes. However, their focus is either to generate a scene layout or to generate objects, and few generate both. The generated scene layout is often simple even with LLM's help. Moreover, the generated scene is often inconsistent with the text input that contains non-trivial descriptions of the shape, appearance, and spatial arrangement of the objects. We present a new paradigm of sequential text-to-scene generation and propose a novel generative model for interactive scene creation. At the core is a 3D Autoregressive Diffusion model 3D-ARD+, which unifies the autoregressive generation over a multimodal token sequence and diffusion generation of next-object 3D latents. To generate the next object, the model uses one autoregressive step to generate the coarse-grained 3D latents in the...

Originally published on April 30, 2026. Curated by AI News.
