[2604.16552] Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.16552 (cs)

[Submitted on 17 Apr 2026 (v1), last revised 29 Apr 2026 (this version, v2)]

Title: Co-generation of Layout and Shape from Text via Autoregressive 3D Diffusion

Authors: Zhenggang Tang, Yuehao Wang, Yuchen Fan, Jun-Kun Chen, Yu-Ying Yeh, Kihyuk Sohn, Zhangyang Wang, Qixing Huang, Alexander Schwing, Rakesh Ranjan, Dilin Wang, Zhicheng Yan

Abstract: Recent text-to-scene generation approaches have greatly reduced the manual effort required to create 3D scenes. However, they focus on generating either a scene layout or individual objects; few generate both. The generated scene layouts are often simple, even with the help of LLMs, and the generated scene is often inconsistent with a text input that contains non-trivial descriptions of the shape, appearance, and spatial arrangement of objects. We present a new paradigm of sequential text-to-scene generation and propose a novel generative model for interactive scene creation. At its core is a 3D Autoregressive Diffusion model, 3D-ARD+, which unifies autoregressive generation over a multimodal token sequence with diffusion-based generation of next-object 3D latents. To generate the next object, the model uses one autoregressive step to generate the coarse-grained 3D latents in the...
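The abstract describes a hybrid loop: an autoregressive pass over the multimodal token sequence produced so far conditions a diffusion process that generates the next object's 3D latent. The following is a minimal sketch of that general pattern only, not the authors' implementation; the module sizes, the DDPM noise schedule, and the names ARBackbone, Denoiser, and sample_next_object are all assumptions for illustration.

```python
# A minimal sketch (not the paper's code) of autoregressive-step-then-diffusion:
# a causal transformer summarizes the token prefix, and a DDPM-style reverse
# process denoises the next object's latent under that condition.
import torch
import torch.nn as nn

LATENT_DIM, MODEL_DIM, N_STEPS = 64, 128, 50  # assumed sizes and step count

class ARBackbone(nn.Module):
    """Causal transformer over the multimodal token sequence (assumed form)."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(MODEL_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                      # tokens: (B, T, MODEL_DIM)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(tokens, mask=mask)
        return h[:, -1]                             # condition for next object

class Denoiser(nn.Module):
    """Predicts the noise in a noisy object latent, given timestep + condition."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + MODEL_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, LATENT_DIM))

    def forward(self, z_t, t, cond):
        t_emb = t.float().view(-1, 1) / N_STEPS     # crude timestep embedding
        return self.net(torch.cat([z_t, cond, t_emb], dim=-1))

@torch.no_grad()
def sample_next_object(backbone, denoiser, tokens):
    """One autoregressive step, then a DDPM reverse process for the latent."""
    cond = backbone(tokens)
    betas = torch.linspace(1e-4, 0.02, N_STEPS)     # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    z = torch.randn(tokens.size(0), LATENT_DIM)     # start from pure noise
    for t in reversed(range(N_STEPS)):
        eps = denoiser(z, torch.full((tokens.size(0),), t), cond)
        mean = (z - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        z = mean + betas[t].sqrt() * torch.randn_like(z) if t > 0 else mean
    return z                                        # next-object 3D latent

if __name__ == "__main__":
    backbone, denoiser = ARBackbone(), Denoiser()
    tokens = torch.randn(1, 5, MODEL_DIM)           # toy multimodal prefix
    print(sample_next_object(backbone, denoiser, tokens).shape)  # (1, 64)
```

The point the sketch mirrors is the division of labor stated in the abstract: each autoregressive step emits a single conditioning state over the token sequence, while the continuous detail of the next object's 3D latent is delegated to the diffusion reverse process.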