[2510.24133] Compositional Image Synthesis with Inference-Time Scaling

[2510.24133] Compositional Image Synthesis with Inference-Time Scaling

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2510.24133: Compositional Image Synthesis with Inference-Time Scaling

Computer Science > Computer Vision and Pattern Recognition arXiv:2510.24133 (cs) [Submitted on 28 Oct 2025 (v1), last revised 27 Mar 2026 (this version, v2)] Title:Compositional Image Synthesis with Inference-Time Scaling Authors:Minsuk Ji, Sanghyeok Lee, Namhyuk Ahn View a PDF of the paper titled Compositional Image Synthesis with Inference-Time Scaling, by Minsuk Ji and 2 other authors View PDF HTML (experimental) Abstract:Despite their impressive realism, modern text-to-image models still struggle with compositionality, often failing to render accurate object counts, attributes, and spatial relations. To address this challenge, we present a training-free framework that combines an object-centric approach with self-refinement to improve layout faithfulness while preserving aesthetic quality. Specifically, we leverage large language models (LLMs) to synthesize explicit layouts from input prompts, and we inject these layouts into the image generation process, where a object-centric vision-language model (VLM) judge reranks multiple candidates to select the most prompt-aligned outcome iteratively. By unifying explicit layout-grounding with self-refine-based inference-time scaling, our framework achieves stronger scene alignment with prompts compared to recent text-to-image models. The code are available at this https URL. Comments: Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI) Cite as: arXiv:2510.24133 [cs.CV]   (or arXiv:2510.24...

Originally published on March 30, 2026. Curated by AI News.

Related Articles

Llms

Why are we blindly trusting AI companies with our data?

Lately I’ve been seeing a story floating around that really made me pause. Apparently, there were claims that the US government asked Ant...

Reddit - Artificial Intelligence · 1 min ·
De-aged casts, ChatGPT-generated programs: How AI is changing Korean TV
Llms

De-aged casts, ChatGPT-generated programs: How AI is changing Korean TV

Artificial intelligence is transforming every corner of industry, and television is no exception. Major networks in Korea have recently a...

AI Tools & Products · 4 min ·
[2603.16629] MLLM-based Textual Explanations for Face Comparison
Llms

[2603.16629] MLLM-based Textual Explanations for Face Comparison

Abstract page for arXiv paper 2603.16629: MLLM-based Textual Explanations for Face Comparison

arXiv - AI · 4 min ·
[2603.15159] To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Llms

[2603.15159] To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

Abstract page for arXiv paper 2603.15159: To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation

arXiv - AI · 4 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime