[2603.26769] Edge Reliability Gap in Vision-Language Models: Quantifying Failure Modes of Compressed VLMs Under Visual Corruption

arXiv:2603.26769 (cs), Computer Science > Computer Vision and Pattern Recognition. Submitted on 24 Mar 2026.
Author: Mehmet Kaan Erol

Abstract: The rapid compression of large vision-language models (VLMs) for edge deployment raises an underexplored question: do compact models fail differently, not merely more often? This study compares a 7-billion-parameter quantised VLM (Qwen2.5-VL-7B, 4-bit NF4) against a 500-million-parameter FP16 model (SmolVLM2-500M) across 4,000 samples from VQAv2 and COCO Captions. A three-category error taxonomy (Object Blindness, Semantic Drift, Prior Bias) is applied as a diagnostic framework. A text-only GPT-4o judge reveals Semantic Drift (B) as the dominant failure mode on VQAv2 and on COCO for Qwen, with a mixed Object Blindness / Semantic Drift profile for SmolVLM2 on COCO; Prior Bias (C) is present on VQAv2 but absent on COCO for both models. Confidence calibration is measured via Expected Calibration Error (ECE) using geometric mean token probability, compositional reasoning is probed with structured negation probes across four templates, and a blur robustness experiment completes the evaluation. For this model pair, the compa...
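The abstract names Expected Calibration Error (ECE) over geometric mean token probability as its calibration measure but does not say how confidences are binned. The Python sketch below shows one common way to compute both quantities, assuming per-token log-probabilities are available from the decoder and using 10 equal-width confidence bins. The function names, the bin count, and the toy inputs are hypothetical illustrations, not the paper's code or data.

```python
import math
from typing import List, Sequence, Tuple


def geometric_mean_confidence(token_logprobs: Sequence[float]) -> float:
    """Answer-level confidence as the geometric mean of per-token probabilities.

    Computed in log space, exp(mean(log p_i)), so long answers do not underflow.
    """
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """ECE with equal-width bins: sum_m (|B_m| / n) * |acc(B_m) - conf(B_m)|.

    Assumption: n_bins=10 equal-width bins; the paper's binning is not stated.
    """
    if len(confidences) == 0 or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map [0, 1] onto bin indices 0..n_bins-1; conf == 1.0 lands in the top bin.
        idx = min(max(int(conf * n_bins), 0), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities for three generated answers,
    # paired with whether each answer was judged correct.
    answers = [
        [math.log(0.9), math.log(0.8)],
        [math.log(0.5)],
        [math.log(0.95), math.log(0.99), math.log(0.9)],
    ]
    confs = [geometric_mean_confidence(lp) for lp in answers]
    print(expected_calibration_error(confs, [True, False, True]))
```

Taking the geometric mean rather than the raw product of token probabilities normalises for answer length, which matters when short VQAv2 answers and multi-token COCO captions are scored on the same confidence scale.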

Originally published on March 31, 2026. Curated by AI News.