[2601.11675] Generating metamers of human scene understanding

arXiv - AI

Summary

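This paper presents MetamerGen, a latent diffusion model that generates metamers of human scene understanding: images meant to match what a person understands after viewing a scene. It builds them by combining low-resolution gist information from the visual periphery with high-resolution details from fixated locations.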

Why It Matters

Understanding how humans perceive and interpret visual scenes matters for both vision science and artificial intelligence. Because MetamerGen generates images aligned with latent human scene representations, it offers a way to probe what those representations contain, which could inform AI systems that better mimic human visual processing.

Key Takeaways

  • MetamerGen generates images aligned with human scene understanding, using a dual-stream representation of foveated input.
  • The tool combines low-resolution peripheral information with high-resolution fixated details.
  • A behavioral experiment validated the perceptual alignment of generated images with human scene representations.
  • High-level semantic alignment is crucial for predicting metamerism in generated scenes.
  • The research contributes to understanding visual processing at multiple levels.
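The second takeaway, combining low-resolution peripheral information with high-resolution fixated details, can be illustrated with a toy "foveated" image. This is a minimal sketch under stated assumptions, not the paper's pipeline: the image is a small grayscale grid stored as nested lists, block averaging stands in for peripheral degradation, and each fixation keeps full resolution inside a hard circular window. The names `downsample_upsample` and `foveate` are hypothetical, and the degradation model MetamerGen actually uses is not described in this summary.

```python
from typing import List, Tuple

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


def downsample_upsample(img: Image, factor: int) -> Image:
    """Crude stand-in for peripheral 'gist': average each factor x factor
    block, then copy the block mean back to every pixel in the block."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, factor):
        for bx in range(0, w, factor):
            ys = range(by, min(by + factor, h))
            xs = range(bx, min(bx + factor, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def foveate(img: Image, fixations: List[Tuple[int, int]],
            radius: float, factor: int) -> Image:
    """Keep full-resolution pixels within `radius` of any fixation (row, col);
    use the low-resolution gist everywhere else (hypothetical helper)."""
    gist = downsample_upsample(img, factor)
    out = [row[:] for row in gist]
    for y in range(len(img)):
        for x in range(len(img[0])):
            if any((y - fy) ** 2 + (x - fx) ** 2 <= radius ** 2
                   for fy, fx in fixations):
                out[y][x] = img[y][x]
    return out


# Usage: a 4x4 ramp image with one fixation at the top-left pixel.
img = [[float(4 * y + x) for x in range(4)] for y in range(4)]
fov = foveate(img, fixations=[(0, 0)], radius=0.5, factor=2)
print(fov[0])  # [0.0, 2.5, 4.5, 4.5]
```

Only the fixated pixel keeps its original value; every other pixel carries the mean of its 2x2 block, which is the sense in which the periphery contributes "gist" rather than detail.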

arXiv:2601.11675 (cs), Computer Vision and Pattern Recognition
Submitted on 16 Jan 2026 (v1), last revised 24 Feb 2026 (this version, v3)
Title: Generating metamers of human scene understanding
Authors: Ritik Raina, Abe Leite, Alexandros Graikos, Seoyoung Ahn, Dimitris Samaras, Gregory J. Zelinsky

Abstract: Human vision combines low-resolution "gist" information from the visual periphery with sparse but high-resolution information from fixated locations to construct a coherent understanding of a visual scene. In this paper, we introduce MetamerGen, a tool for generating scenes that are aligned with latent human scene representations. MetamerGen is a latent diffusion model that combines peripherally obtained scene gist information with information obtained from scene-viewing fixations to generate image metamers for what humans understand after viewing a scene. Generating images from both high and low resolution (i.e. "foveated") inputs constitutes a novel image-to-image synthesis problem, which we tackle by introducing a dual-stream representation of the foveated scenes consisting of DINOv2 tokens that fuse detailed features from fixated areas with peripherally degraded features capturing scene context. To evaluate the perceptual alignment of MetamerGen generated images to latent human scene representations, we con...
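The abstract's dual-stream representation can be read in several ways; the sketch below shows one plausible reading using toy feature vectors in place of real DINOv2 tokens. Assumptions, none of them confirmed by this summary: tokens sit on a patch grid with an assumed patch size of 14 pixels, a patch counts as fixated when its centre falls inside a circular window around a fixation, the peripheral stream holds every degraded token for scene context, and the foveal stream holds full-resolution tokens only at fixated patches. `fixation_mask` and `dual_stream` are hypothetical names, and how MetamerGen actually fuses the streams and conditions its diffusion model is not shown here.

```python
from typing import Dict, List, Tuple

Token = List[float]            # toy stand-in for one DINOv2 patch token
TokenGrid = List[List[Token]]  # tokens laid out on the patch grid


def fixation_mask(grid_h: int, grid_w: int, patch: int,
                  fixations: List[Tuple[float, float]],
                  radius: float) -> List[List[bool]]:
    """Mark a patch as fixated if its centre (in pixels) lies within
    `radius` of any fixation given as (row, col) pixel coordinates."""
    mask = []
    for i in range(grid_h):
        row = []
        for j in range(grid_w):
            cy, cx = (i + 0.5) * patch, (j + 0.5) * patch
            row.append(any((cy - fy) ** 2 + (cx - fx) ** 2 <= radius ** 2
                           for fy, fx in fixations))
        mask.append(row)
    return mask


def dual_stream(hi_res: TokenGrid, degraded: TokenGrid,
                mask: List[List[bool]]) -> Dict[str, list]:
    """Peripheral stream: all degraded tokens (scene context).
    Foveal stream: (row, col, token) for fixated patches only, taken
    from the full-resolution features."""
    peripheral = [tok for row in degraded for tok in row]
    foveal = [(i, j, hi_res[i][j])
              for i, row in enumerate(mask)
              for j, hit in enumerate(row) if hit]
    return {"peripheral": peripheral, "foveal": foveal}


# Usage: a 2x2 patch grid with 1-dimensional toy tokens and one fixation
# at the centre of the top-left patch.
hi = [[[1.0], [2.0]], [[3.0], [4.0]]]
lo = [[[0.1], [0.2]], [[0.3], [0.4]]]
m = fixation_mask(2, 2, 14, [(7.0, 7.0)], 5.0)
streams = dual_stream(hi, lo, m)
print(streams["foveal"])  # [(0, 0, [1.0])]
```

Keeping positions in the foveal stream lets a downstream model know where each detailed token came from, while the peripheral stream always covers the whole scene; whether the paper keeps the streams separate or merges them per token is left open here.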