[2507.18031] ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

arXiv - Machine Learning · 4 min read

Summary

ViGText introduces a novel approach to deepfake detection by integrating Vision-Language Model explanations with Graph Neural Networks, enhancing robustness and generalization against sophisticated deepfakes.

Why It Matters

As deepfake technology advances, ensuring media authenticity becomes critical. ViGText's innovative method addresses the limitations of traditional detection techniques, offering a more reliable solution to combat misinformation and uphold information integrity in digital media.

Key Takeaways

  • ViGText improves deepfake detection accuracy from 72.45% to 98.32% in generalization evaluations.
  • The model integrates visual and textual data for a more context-aware analysis of deepfakes.
  • Robustness against targeted attacks is enhanced, limiting performance degradation to less than 4%.
  • Multi-level feature extraction across spatial and frequency domains boosts detection capabilities.
  • ViGText sets a new standard for deepfake detection, crucial for maintaining media authenticity.
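The "multi-level feature extraction across spatial and frequency domains" takeaway can be illustrated with a small sketch. This is not the paper's implementation; the patch size, the choice of mean/std as spatial statistics, and the low- vs. high-frequency band split are all assumptions made for illustration:

```python
import numpy as np

def patch_features(image, patch=8):
    """Split a grayscale image into non-overlapping patches and compute
    a simple spatial + frequency feature vector per patch."""
    h, w = image.shape
    feats = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = image[i:i + patch, j:j + patch]
            spatial = [p.mean(), p.std()]            # spatial statistics
            mag = np.abs(np.fft.fft2(p))             # frequency magnitudes
            # crude low-frequency vs high-frequency band averages
            freq = [mag[:2, :2].mean(), mag[2:, 2:].mean()]
            feats.append(spatial + freq)
    return np.array(feats)

img = np.random.rand(32, 32)       # stand-in for one image channel
f = patch_features(img)
print(f.shape)                     # (16, 4): a 4x4 patch grid, 4 features each
```

Frequency-domain statistics like these are a common way to surface generator artifacts (e.g. upsampling patterns) that are hard to see in pixel space, which is presumably why the paper combines both domains.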

Computer Science > Computer Vision and Pattern Recognition
arXiv:2507.18031 (cs)
[Submitted on 24 Jul 2025 (v1), last revised 20 Feb 2026 (this version, v2)]

Title: ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks
Authors: Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan

Abstract: The rapid rise of deepfake technology, which produces realistic but fraudulent digital content, threatens the authenticity of media. Traditional deepfake detection approaches often struggle with sophisticated, customized deepfakes, especially in terms of generalization and robustness against malicious attacks. This paper introduces ViGText, a novel approach that integrates images with Vision Large Language Model (VLLM) Text explanations within a Graph-based framework to improve deepfake detection. The novelty of ViGText lies in its integration of detailed explanations with visual data, as it provides a more context-aware analysis than captions, which often lack specificity and fail to reveal subtle inconsistencies. ViGText systematically divides images into patches, constructs image and text graphs, and integrates them for analysis using Graph Neural Networks (GNNs) to identify deepfakes. Through the use of multi-level feat...
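The pipeline described in the abstract — divide the image into patches, connect the patches into a graph, and run GNN message passing over it — can be sketched minimally as below. This is an illustration under stated assumptions, not the paper's architecture: the 4-neighbour grid adjacency, the mean-aggregation update, and all function names are invented for the example, and the text graph and its integration are omitted:

```python
import numpy as np

def grid_adjacency(rows, cols):
    """4-neighbour adjacency matrix for a rows x cols grid of patch nodes."""
    n = rows * cols
    A = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (0, 1)):      # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    A[i, j] = A[j, i] = 1.0
    return A

def gnn_layer(X, A, W):
    """One message-passing step: mean-aggregate each node's neighbourhood
    (including itself), then apply a linear map and ReLU."""
    A_hat = A + np.eye(len(A))                   # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat @ X / deg) @ W, 0)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4))   # 16 patch nodes, 4 features each
A = grid_adjacency(4, 4)           # 4x4 patch grid
W = rng.standard_normal((4, 8))    # learned weights in a real model
H = gnn_layer(X, A, W)
print(H.shape)                     # (16, 8)
```

A real detector would stack several such layers, pool the node embeddings, and classify the pooled vector; ViGText additionally builds a graph over the VLLM's text explanation and fuses it with the image graph.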
