[2506.13130] ZINA: Multimodal Fine-grained Hallucination Detection and Editing


Computer Science > Computer Vision and Pattern Recognition

arXiv:2506.13130 (cs) [Submitted on 16 Jun 2025 (v1), last revised 5 Apr 2026 (this version, v2)]

Title: ZINA: Multimodal Fine-grained Hallucination Detection and Editing
Authors: Yuiga Wada, Kazuki Matsuda, Komei Sugiura, Graham Neubig

Abstract: Multimodal Large Language Models (MLLMs) often generate hallucinations, where the output deviates from the visual content. Given that these hallucinations can take diverse forms, detecting hallucinations at a fine-grained level is essential for comprehensive evaluation and analysis. To this end, we propose a novel task of multimodal fine-grained hallucination detection and editing for MLLMs. Moreover, we propose ZINA, a novel method that identifies hallucinated spans at a fine-grained level, classifies their error types into six categories, and suggests appropriate refinements. To train and evaluate models for this task, we construct VisionHall, a dataset comprising 6.9k outputs from twelve MLLMs manually annotated by 211 annotators, and 20k synthetic samples generated using a graph-based method that captures dependencies among error types. We demonstrated that ZINA outperformed existing methods, including GPT-4o and Llama-3.2, in both detection and editing tasks.
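The abstract describes a span-level task: locate each hallucinated span in an MLLM's output, label it with one of six error categories, and suggest a correction. A minimal sketch of what such an annotation could look like in practice; the field names and the `apply_edits` helper are illustrative assumptions, not the paper's actual schema, and the six category names are not given in the abstract:

```python
from dataclasses import dataclass

@dataclass
class HallucinationSpan:
    start: int        # character offset where the hallucinated span begins
    end: int          # character offset where it ends (exclusive)
    error_type: str   # one of the six error categories (placeholder label here)
    correction: str   # suggested replacement text

def apply_edits(text: str, spans: list[HallucinationSpan]) -> str:
    """Apply suggested corrections right-to-left so earlier offsets stay valid."""
    for s in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + s.correction + text[s.end:]
    return text

# Toy example: the caption says "red car" but the image shows a blue one.
caption = "A red car is parked next to a dog."
spans = [HallucinationSpan(2, 9, "attribute", "blue car")]
print(apply_edits(caption, spans))  # A blue car is parked next to a dog.
```

Editing right-to-left means each applied correction cannot shift the offsets of spans that come before it, so multiple non-overlapping edits compose cleanly.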

Originally published on April 07, 2026. Curated by AI News.

Related Articles

When Robots Have Their ChatGPT Moment, Remember These Pincers | WIRED

From sorting chicken nuggets to screwing in light bulbs, Eka’s robots are eerily lifelike. But do they have real physical smarts?

Wired - AI · 13 min ·
LLMs

87% Cost Savings & Sub-3s Latency: I built a "Warm-Cache" harness for persistent Claude agents.

The "Goldfish Problem" is expensive. I decided to fix the plumbing. Most Claude implementations leave 90% of their money on the table...

Reddit - Artificial Intelligence · 1 min ·
LLMs

What are people using for low-latency autocomplete in production? [P]

I’ve been looking into autocomplete/typeahead systems recently, especially in contexts where latency really matters (e.g. search-as-you-t...

Reddit - Machine Learning · 1 min ·
General Motors is adding Gemini to four million cars | The Verge

General Motors is planning to bring Google’s Gemini AI assistant to around four million vehicles across the US.

The Verge - AI · 4 min ·
LLMs
