[2507.18031] ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

arXiv - Machine Learning · 4 min read

Summary

ViGText introduces a novel approach to deepfake detection by integrating Vision-Language Model explanations with Graph Neural Networks, enhancing robustness and generalization against sophisticated deepfakes.

Why It Matters

As deepfake technology advances, ensuring media authenticity becomes critical. ViGText's innovative method addresses the limitations of traditional detection techniques, offering a more reliable solution to combat misinformation and uphold information integrity in digital media.

Key Takeaways

  • ViGText improves deepfake detection accuracy from 72.45% to 98.32% in generalization evaluations.
  • The model integrates visual and textual data for a more context-aware analysis of deepfakes.
  • Robustness against targeted attacks is enhanced, limiting performance degradation to less than 4%.
  • Multi-level feature extraction across spatial and frequency domains boosts detection capabilities.
  • ViGText sets a new standard for deepfake detection, crucial for maintaining media authenticity.
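The "multi-level feature extraction across spatial and frequency domains" takeaway can be illustrated with a small sketch. This is not the paper's implementation; the patch size, the choice of mean/std as spatial statistics, and the low- vs. high-frequency band split are all assumptions made for illustration:

```python
import numpy as np

def patch_features(image, patch=8):
    """Split a grayscale image into non-overlapping patches and compute
    a simple spatial + frequency feature vector per patch."""
    h, w = image.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = image[i:i + patch, j:j + patch]
            spatial = [p.mean(), p.std()]            # spatial statistics
            mag = np.abs(np.fft.fft2(p))             # frequency magnitudes
            # crude low-frequency vs high-frequency band averages
            freq = [mag[:2, :2].mean(), mag[2:, 2:].mean()]
            feats.append(spatial + freq)
    return np.array(feats)

img = np.random.rand(32, 32)       # stand-in for one image channel
f = patch_features(img)
print(f.shape)                     # (16, 4): a 4x4 patch grid, 4 features each
```

Frequency-domain statistics like these are a common way to surface generator artifacts (e.g. upsampling patterns) that are hard to see in pixel space, which is presumably why the paper combines both domains.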

Computer Science > Computer Vision and Pattern Recognition
arXiv:2507.18031 (cs)
[Submitted on 24 Jul 2025 (v1), last revised 20 Feb 2026 (this version, v2)]

Title: ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks
Authors: Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan

Abstract: The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feat...
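The pipeline described in the abstract — divide the image into patches, connect the patches into a graph, and run GNN message passing over it — can be sketched minimally as below. This is an illustration under stated assumptions, not the paper's architecture: the 4-neighbour grid adjacency, the mean-aggregation update, and all function names are invented for the example, and the text graph and its integration are omitted:

```python
import numpy as np

def grid_adjacency(rows, cols):
    """4-neighbour adjacency matrix for a rows x cols grid of patch nodes."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (0, 1)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    A[i, j] = A[j, i] = 1.0
    return A

def gnn_layer(X, A, W):
    """One message-passing step: mean-aggregate each node's neighbourhood
    (including itself), then apply a linear map and ReLU."""
    A_hat = A + np.eye(len(A))                   # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat @ X / deg) @ W, 0)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4))   # 16 patch nodes, 4 features each
A = grid_adjacency(4, 4)           # 4x4 patch grid
W = rng.standard_normal((4, 8))    # learned weights in a real model
H = gnn_layer(X, A, W)
print(H.shape)                     # (16, 8)
```

A real detector would stack several such layers, pool the node embeddings, and classify the pooled vector; ViGText additionally builds a graph over the VLLM's text explanation and fuses it with the image graph.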
