[2512.03992] Value-Guided Iterative Refinement and the DIQ-H Benchmark for Evaluating VLM Robustness

arXiv - AI April 30, 2026 4 min read

About this article

Abstract page for arXiv paper 2512.03992: Value-Guided Iterative Refinement and the DIQ-H Benchmark for Evaluating VLM Robustness

Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.03992 (cs)
Submitted on 3 Dec 2025 (v1); last revised 29 Apr 2026 (this version, v2)

Title: Value-Guided Iterative Refinement and the DIQ-H Benchmark for Evaluating VLM Robustness

Authors: Hanwen Wan, Zexin Lin, Yixuan Deng, Xiaoqiang Ji

Abstract: Vision-Language Models (VLMs) are essential for embodied AI and safety-critical applications, such as robotics and autonomous systems. However, existing benchmarks primarily focus on static or curated visual inputs, neglecting the challenges posed by adversarial conditions, value misalignment, and error propagation in continuous deployment. Current benchmarks either overlook the impact of real-world perturbations, or fail to account for the cumulative effect of inconsistent reasoning over time. To address these gaps, we introduce the Degraded Image Quality Leading to Hallucinations (DIQ-H) benchmark, the first to evaluate VLMs under adversarial visual conditions in continuous sequences. DIQ-H simulates real-world stressors including motion blur, sensor noise, and compression artifacts, and measures how these corruptions lead to persistent errors and misaligned outputs across time. The benchmark explicitly models error propagation and its long-term value consistency. To enhance scala...
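To make the three stressors named in the abstract concrete, here is a minimal pure-Python sketch of degrading a frame sequence. It is not the DIQ-H pipeline: the function names, the parameters, the linear severity ramp across the sequence, and the use of intensity quantization as a crude stand-in for compression artifacts are all assumptions made for illustration. Frames are taken to be grayscale 2D lists with values in [0, 1].

```python
import random


def motion_blur(frame, length):
    """Horizontal box blur: average each pixel with its left neighbours (edge-clamped)."""
    blurred = []
    for row in frame:
        blurred.append([
            sum(row[max(x - k, 0)] for k in range(length)) / length
            for x in range(len(row))
        ])
    return blurred


def sensor_noise(frame, sigma, rng):
    """Additive Gaussian noise, clipped back into [0, 1]."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in frame
    ]


def coarse_quantize(frame, levels):
    """Stand-in for compression artifacts (an assumption, not a real codec):
    snap each intensity to one of `levels` evenly spaced values (levels >= 2)."""
    step = levels - 1
    return [[round(p * step) / step for p in row] for row in frame]


def degrade_sequence(frames, max_blur=5, max_sigma=0.2, min_levels=4, seed=0):
    """Apply all three corruptions, with severity ramping linearly from
    none on the first frame to the maximum on the last (hypothetical schedule)."""
    rng = random.Random(seed)
    n = len(frames)
    degraded = []
    for t, frame in enumerate(frames):
        severity = t / (n - 1) if n > 1 else 1.0
        blur_len = 1 + round(severity * (max_blur - 1))
        levels = round(256 - severity * (256 - min_levels))
        out = motion_blur(frame, blur_len)
        out = sensor_noise(out, severity * max_sigma, rng)
        out = coarse_quantize(out, levels)
        degraded.append(out)
    return degraded
```

Feeding the degraded frames to a model one at a time, and comparing its answers against those for the clean frames, is one way to observe whether an early error persists in later outputs, which is the kind of propagation the abstract says the benchmark measures.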

Originally published on April 30, 2026. Curated by AI News.

