[2504.08714] Generating Fine Details of Entity Interactions

arXiv:2504.08714 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 11 Apr 2025 (v1), last revised 3 Mar 2026 (this version, v2)

Title: Generating Fine Details of Entity Interactions
Authors: Xinyi Gu, Jiayuan Mao

Abstract: Recent text-to-image models excel at generating high-quality object-centric images from instructions. However, images should also encapsulate rich interactions between objects, where existing models often fall short, likely due to limited training data and benchmarks for rare interactions. This paper explores a novel application of Multimodal Large Language Models (MLLMs) to benchmark and enhance the generation of interaction-rich images. We introduce \data, an interaction-focused dataset with 1000 LLM-generated fine-grained prompts for image generation covering (1) functional and action-based interactions, (2) multi-subject interactions, and (3) compositional spatial relationships. To address interaction-rich generation challenges, we propose a decomposition-augmented refinement procedure. Our approach, \model, leverages LLMs to decompose interactions into finer-grained concepts, uses an MLLM to critique generated images, and applies targeted refinements with a partial diffusion denoising process. Automatic and human evaluations show significantly improved image quality, demonstrating the p...
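The decompose, critique, and refine loop described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow under stated assumptions, not the authors' implementation: every name here (Critique, refine_with_critique, the toy_* stand-ins, the strength and max_rounds parameters) is hypothetical, and the "image" is modeled as a plain set of concepts so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Image = FrozenSet[str]  # toy stand-in: an "image" is the set of concepts it depicts


@dataclass
class Critique:
    concept: str     # fine-grained concept that was checked
    satisfied: bool  # whether the critic judged it present in the image
    feedback: str    # suggestion for fixing it (hypothetical format)


def refine_with_critique(
    prompt: str,
    decompose: Callable[[str], List[str]],
    generate: Callable[[str], Image],
    critique: Callable[[Image, str], Critique],
    partial_denoise: Callable[[Image, List[str], float], Image],
    strength: float = 0.5,  # assumed knob: how much of the image gets re-denoised
    max_rounds: int = 3,    # assumed cap on refinement rounds
) -> Tuple[Image, List[List[Critique]]]:
    """Generate once, then repeatedly critique and refine until all concepts pass."""
    concepts = decompose(prompt)   # LLM step in the paper; any callable here
    image = generate(prompt)       # initial text-to-image generation
    history: List[List[Critique]] = []
    for _ in range(max_rounds):
        results = [critique(image, c) for c in concepts]  # MLLM step in the paper
        failures = [r for r in results if not r.satisfied]
        history.append(failures)
        if not failures:
            break
        # Refine from the current image rather than regenerating from scratch.
        image = partial_denoise(image, [f.feedback for f in failures], strength)
    return image, history


# Toy stand-ins so the sketch executes; real systems would call models here.
def toy_decompose(prompt: str) -> List[str]:
    return [part.strip() for part in prompt.split(" and ")]


def toy_generate(prompt: str) -> Image:
    return frozenset()  # the first attempt depicts none of the concepts


def toy_critique(image: Image, concept: str) -> Critique:
    return Critique(concept, concept in image, concept)


def toy_partial_denoise(image: Image, feedback: List[str], strength: float) -> Image:
    return image | {feedback[0]}  # each round fixes only the first reported issue


final_image, history = refine_with_critique(
    "a hand gripping a mug and steam rising from the mug",
    toy_decompose, toy_generate, toy_critique, toy_partial_denoise,
)
```

With the toy stand-ins, each refinement round repairs one missing concept, so the list of failures shrinks from two to one to none. In a real pipeline the refinement step would presumably re-noise the current image to an intermediate timestep and denoise from there, which is what keeps already-correct regions largely intact.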

Originally published on March 05, 2026. Curated by AI News.
