[2602.22703] Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning

arXiv - Machine Learning · 4 min read · Article

Summary

The paper presents GeoPerceive, a benchmark for evaluating geometric perception in vision-language models (VLMs), and introduces GeoDPO, a translator-guided reinforcement learning framework that improves VLMs' geometric perception and downstream geometric reasoning.

Why It Matters

This research addresses the limitations of current VLMs in geometric reasoning, a critical aspect for applications in fields like robotics and computer vision. By introducing a novel benchmark and method, it paves the way for improved model performance and generalization in understanding complex visual data.

Key Takeaways

  • The GeoPerceive benchmark pairs diagrams with domain-specific language (DSL) representations, so geometric perception can be evaluated separately from reasoning.
  • GeoDPO uses an NL-to-DSL translator to compute fine-grained, DSL-level scores that serve as reward signals for reinforcement learning.
  • Reported gains: +26.5% on in-domain data and +39.0% on downstream reasoning tasks.
  • Supervised fine-tuning may impair performance in out-of-domain scenarios.
  • All code is publicly available for reproducibility.
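
The translator-guided reward idea can be illustrated with a small sketch. The paper's abstract says only that an NL-to-DSL translator enables fine-grained, DSL-level scores used as reward signals; everything concrete below is an assumption made for illustration. The one-statement-per-line DSL syntax, the statement-level F1 score, the `margin` threshold, and the chosen/rejected pairing (suggested by the "DPO" in the name, not confirmed by the source) are all hypothetical, and the translator is replaced by a lookup table.

```python
from itertools import combinations


def parse_dsl(program):
    """Split a DSL program into a set of whitespace-normalized statements.

    Assumption: one statement per line. The real GeoPerceive DSL may differ.
    """
    return {" ".join(line.split()) for line in program.splitlines() if line.strip()}


def dsl_f1(predicted, reference):
    """Statement-level F1 between predicted and reference DSL programs.

    Hypothetical stand-in for the paper's DSL-level score.
    """
    pred, ref = parse_dsl(predicted), parse_dsl(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def build_preference_pairs(candidates, reference, translate, margin=0.1):
    """Score each candidate description via the translator, then pair
    higher-scoring descriptions (chosen) with lower-scoring ones (rejected).

    Pairs whose scores differ by less than `margin` are skipped.
    """
    scored = [(dsl_f1(translate(text), reference), text) for text in candidates]
    pairs = []
    for (score_a, text_a), (score_b, text_b) in combinations(scored, 2):
        if abs(score_a - score_b) < margin:
            continue
        chosen, rejected = (text_a, text_b) if score_a > score_b else (text_b, text_a)
        pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs


# Toy example: a lookup table stands in for the trained NL-to-DSL translator.
reference = "point A\npoint B\nsegment A B"
fake_translations = {
    "Points A and B joined by a segment.": "point A\npoint B\nsegment A B",
    "Two points A and B.": "point A\npoint B",
}
pairs = build_preference_pairs(
    list(fake_translations), reference, fake_translations.__getitem__
)
print(pairs)
```

In this toy case the first description recovers all three reference statements and the second misses the segment, so the first is preferred. A set-based score like this rewards each correctly perceived element independently, which is one plausible reading of "fine-grained".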

Computer Science > Machine Learning
arXiv:2602.22703 (cs) [Submitted on 26 Feb 2026]

Title: Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning

Authors: Hao Yu, Shuning Jia, Guanghao Li, Wenhao Jiang, Chun Yuan

Abstract: Vision-language models (VLMs) often struggle with geometric reasoning due to their limited perception of fundamental diagram elements. To tackle this challenge, we introduce GeoPerceive, a benchmark comprising diagram instances paired with domain-specific language (DSL) representations, along with an efficient automatic data generation pipeline. This design enables the isolated evaluation of geometric perception independently from reasoning. To exploit the data provided by GeoPerceive for enhancing the geometric perception capabilities of VLMs, we propose GeoDPO, a translator-guided reinforcement learning (RL) framework. GeoDPO employs an NL-to-DSL translator, which is trained on synthetic pairs generated by the data engine of GeoPerceive, to bridge natural language and DSL. This translator facilitates the computation of fine-grained, DSL-level scores, which serve as reward signals in reinforcement learning. We assess GeoDPO on both in-domain and out-of-domain datasets, spanning tasks in geometric perception as well as downstream reasoning. Experimental results demonstrate th...
