[2506.13130] ZINA: Multimodal Fine-grained Hallucination Detection and Editing
Computer Science > Computer Vision and Pattern Recognition

arXiv:2506.13130 (cs)
[Submitted on 16 Jun 2025 (v1), last revised 5 Apr 2026 (this version, v2)]

Title: ZINA: Multimodal Fine-grained Hallucination Detection and Editing
Authors: Yuiga Wada, Kazuki Matsuda, Komei Sugiura, Graham Neubig

Abstract: Multimodal Large Language Models (MLLMs) often generate hallucinations: outputs that deviate from the visual content. Because these hallucinations take diverse forms, detecting them at a fine-grained level is essential for comprehensive evaluation and analysis. To this end, we propose a novel task of multimodal fine-grained hallucination detection and editing for MLLMs. We further propose ZINA, a novel method that identifies hallucinated spans at a fine-grained level, classifies their error types into six categories, and suggests appropriate refinements. To train and evaluate models for this task, we construct VisionHall, a dataset comprising 6.9k outputs from twelve MLLMs, manually annotated by 211 annotators, together with 20k synthetic samples generated by a graph-based method that captures dependencies among error types. We demonstrate that ZINA outperforms existing methods, including GPT-4o and Llama-3.2, on both the detection and editing tasks.

Subjects: Computer Vision and Pattern Recognition
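As described in the abstract, the task pairs an image with an MLLM output and asks for hallucinated spans, one of six error types per span, and a suggested correction. The following is a minimal Python sketch of one possible data representation for such outputs; the names (HallucinationSpan, DetectionResult, apply_edits) and the example error type "attribute" are illustrative assumptions, since the abstract does not name the six categories or specify a format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HallucinationSpan:
    """One fine-grained annotation over an MLLM output string (hypothetical schema)."""
    start: int        # character offset where the hallucinated span begins
    end: int          # character offset where it ends (exclusive)
    error_type: str   # one of the paper's six categories; not named in the abstract
    refinement: str   # suggested replacement text for the span

@dataclass
class DetectionResult:
    """Detection-and-editing output for a single (image, MLLM output) pair."""
    mllm_output: str
    spans: List[HallucinationSpan]

    def apply_edits(self) -> str:
        """Return the output with every hallucinated span replaced by its refinement."""
        text = self.mllm_output
        # Apply edits right-to-left so earlier character offsets stay valid.
        for span in sorted(self.spans, key=lambda s: s.start, reverse=True):
            text = text[:span.start] + span.refinement + text[span.end:]
        return text

# Toy example: the MLLM wrote "red" but the image shows a blue car.
result = DetectionResult(
    mllm_output="A red car is parked on the street.",
    spans=[HallucinationSpan(start=2, end=5, error_type="attribute", refinement="blue")],
)
print(result.apply_edits())  # -> "A blue car is parked on the street."
```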
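The abstract states only that the 20k synthetic samples come from a graph-based method capturing dependencies among error types, without detailing it. One plausible reading, sketched here purely as an assumption and not as the paper's actual algorithm, is to treat error types as nodes of a directed dependency graph and sample co-occurring error sets by walking its edges; the node names and probabilities below are placeholders.

```python
import random

# Hypothetical dependency graph: an edge u -> v means that injecting error u
# raises the chance of also injecting error v. Node names are placeholders;
# the abstract does not list the six actual categories.
DEPENDENCIES = {
    "object": ["attribute", "relation"],
    "attribute": ["relation"],
    "relation": [],
    "count": ["attribute"],
    "action": ["object"],
    "text": [],
}

def sample_error_set(p_root: float = 0.3, p_edge: float = 0.5,
                     rng: random.Random = random.Random(0)) -> set:
    """Sample a set of error types to inject, respecting graph dependencies."""
    chosen = set()
    # Each error type may independently seed the sample.
    frontier = [t for t in DEPENDENCIES if rng.random() < p_root]
    while frontier:
        t = frontier.pop()
        if t in chosen:
            continue
        chosen.add(t)
        # Dependent error types follow with probability p_edge.
        frontier.extend(v for v in DEPENDENCIES[t] if rng.random() < p_edge)
    return chosen

# Each sampled set would then drive perturbation of a clean caption to yield
# one synthetic training sample with span-level hallucination labels.
print(sample_error_set())
```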