
[2602.12486] Human-Like Coarse Object Representations in Vision Models

arXiv - AI · February 16, 2026 · 3 min read · Article

Summary

This paper examines whether and when segmentation models acquire the coarse, human-like object representations that trade fine visual detail for efficient physical prediction.

Why It Matters

Understanding when vision models represent objects the way humans do matters for AI and robotics. The paper reports that training time, model size, and pruning govern how closely a model's object representations align with human behavior, which is relevant to systems that must make physical predictions, such as autonomous systems.

Key Takeaways

Human-like coarse object representations emerge from resource constraints in model training.
Alignment with human behavior follows an inverse U-shaped curve over training time, model size, and pruning, peaking at an intermediate setting.
Early checkpoints and modest architectures can effectively elicit physics-efficient representations.
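
One way to picture the inverse U-shaped curve in the takeaways: given an alignment score for each setting in a sweep (for example, each checkpoint), estimate where the curve peaks. The sketch below is not the paper's method. The function name `peak_of_inverse_u`, the variable names, and the numbers are all hypothetical, and the scores are synthetic values from a known parabola rather than results. It uses three-point parabolic interpolation around the best sample and assumes the settings are sorted and distinct.

```python
from typing import Sequence


def peak_of_inverse_u(xs: Sequence[float], scores: Sequence[float]) -> float:
    """Estimate the peak of an inverse-U curve by fitting a parabola
    through the best-scoring sample and its two neighbours."""
    if len(xs) != len(scores) or not xs:
        raise ValueError("xs and scores must be non-empty and the same length")
    i = max(range(len(scores)), key=lambda k: scores[k])
    # At either end of the sweep one neighbour is missing,
    # so fall back to the best sampled setting.
    if i == 0 or i == len(xs) - 1:
        return xs[i]
    x0, x1, x2 = xs[i - 1], xs[i], xs[i + 1]
    y0, y1, y2 = scores[i - 1], scores[i], scores[i + 1]
    # Vertex of the parabola through the three points.
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:  # the three points are collinear, no curvature to fit
        return x1
    return x1 - 0.5 * num / den


# Synthetic scores from y = 9 - (x - 3)^2. These are NOT results from the paper.
settings = [1.0, 2.0, 4.0, 5.0]
alignment = [9 - (x - 3) ** 2 for x in settings]
best_setting = peak_of_inverse_u(settings, alignment)
print(best_setting)
```

Interpolating rather than taking the best sampled setting matters when the sweep is sparse, since the peak may fall between two checkpoints or model sizes.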

arXiv:2602.12486 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 12 Feb 2026

Title: Human-Like Coarse Object Representations in Vision Models

Authors: Andrey Gizdov, Andrea Procopio, Yichen Li, Daniel Harari, Tomer Ullman

Abstract: Humans appear to represent objects for intuitive physics with coarse, volumetric "bodies" that smooth concavities - trading fine visual details for efficient physical predictions - yet their internal structure is largely unknown. Segmentation models, in contrast, optimize pixel-accurate masks that may misalign with such bodies. We ask whether and when these models nonetheless acquire human-like bodies. Using a time-to-collision (TTC) behavioral paradigm, we introduce a comparison pipeline and alignment metric, then vary model training time, size, and effective capacity via pruning. Across all manipulations, alignment with human behavior follows an inverse U-shaped curve: small/briefly trained/pruned models under-segment into blobs; large/fully trained models over-segment with boundary wiggles; and an intermediate "ideal body granularity" best matches humans. This suggests human-like coarse bodies emerge from resource constraints rather than bespoke biases, and points to simple knobs - early checkpoints, modest architectures, light pruning - for eliciting physics-ef...
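
To make the contrast between a pixel-accurate mask and a coarse body concrete, here is a toy sketch of how smoothing a concavity changes a time-to-collision estimate. It is an illustration under simplifying assumptions, not the paper's pipeline or metric. The grid, the column-filling rule `fill_columns`, and the constant-speed probe are all hypothetical choices made for this example.

```python
from typing import List, Optional

Mask = List[List[bool]]


def parse_mask(rows: List[str]) -> Mask:
    """Turn rows of '#' (occupied) and '.' (empty) into a boolean grid."""
    return [[ch == "#" for ch in row] for row in rows]


def fill_columns(mask: Mask) -> Mask:
    """Coarsen a mask by filling every column between its topmost and
    bottommost occupied cells (a crude stand-in for smoothing concavities)."""
    height, width = len(mask), len(mask[0])
    coarse = [row[:] for row in mask]
    for x in range(width):
        occupied = [y for y in range(height) if mask[y][x]]
        if occupied:
            for y in range(occupied[0], occupied[-1] + 1):
                coarse[y][x] = True
    return coarse


def time_to_collision(mask: Mask, row: int, speed: float) -> Optional[float]:
    """Time for a probe that starts at column 0 of `row` and moves right at
    `speed` cells per time step to reach the first occupied cell."""
    for x, filled in enumerate(mask[row]):
        if filled:
            return x / speed
    return None  # the probe never hits the object


# A C-shaped object whose cavity opens toward the approaching probe.
fine = parse_mask([
    "..#####",
    "......#",
    "......#",
    "......#",
    "..#####",
])
coarse = fill_columns(fine)

ttc_fine = time_to_collision(fine, row=2, speed=1.0)      # probe enters the cavity
ttc_coarse = time_to_collision(coarse, row=2, speed=1.0)  # cavity is filled in
print(ttc_fine, ttc_coarse)  # prints 6.0 2.0
```

In this toy case the fine mask lets the probe travel into the cavity before it hits the back wall, while the coarsened body predicts contact at the cavity's opening. A TTC-based comparison can pick up this kind of difference between representations.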

