[2603.25140] SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

arXiv - Machine Learning · March 27, 2026 · 3 min read

About this article

Abstract page for arXiv paper 2603.25140: SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.25140 (cs)

[Submitted on 26 Mar 2026]

Title: SAVe: Self-Supervised Audio-visual Deepfake Detection Exploiting Visual Artifacts and Audio-visual Misalignment

Authors: Sahibzada Adil Shahzad, Ammarah Hashmi, Junichi Yamagishi, Yusuke Yasuda, Yu Tsao, Chia-Wen Lin, Yan-Tsung Peng, Hsin-Min Wang

Abstract: Multimodal deepfakes can exhibit subtle visual artifacts and cross-modal inconsistencies, which remain challenging to detect, especially when detectors are trained primarily on curated synthetic forgeries. Such synthetic dependence can introduce dataset and generator bias, limiting scalability and robustness to unseen manipulations. We propose SAVe, a self-supervised audio-visual deepfake detection framework that learns entirely on authentic videos. SAVe generates on-the-fly, identity-preserving, region-aware self-blended pseudo-manipulations to emulate tampering artifacts, enabling the model to learn complementary visual cues across multiple facial granularities. To capture cross-modal evidence, SAVe also models lip-speech synchronization via an audio-visual alignment component that detects temporal misalignment patterns characteristic of audio-visual forgeries. Experiments on FakeAVCeleb and A...
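The two ideas named in the abstract, self-blended pseudo-manipulations and audio-visual misalignment, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `self_blend` and `estimate_av_lag`, the rectangular `region`, the brightness `gain`, and the cross-correlation lag search are all assumptions made for illustration. The paper describes a learned alignment component, and the hand-written correlation here is only a stand-in for it.

```python
import math
import random


def self_blend(image, region, gain=1.15, seed=0):
    """Blend a slightly altered copy of `image` back into itself inside `region`.

    image  : 2D list of grayscale values in [0, 1] (an authentic frame)
    region : (top, left, bottom, right); a hypothetical stand-in for a facial region
    Returns (pseudo_fake, mask). The mask marks where artifacts were introduced.
    """
    rng = random.Random(seed)
    top, left, bottom, right = region
    height, width = len(image), len(image[0])
    alpha = rng.uniform(0.5, 1.0)  # blend strength, drawn per sample
    out = [row[:] for row in image]  # copy so the authentic frame is untouched
    mask = [[0.0] * width for _ in range(height)]
    for y in range(top, bottom):
        for x in range(left, right):
            altered = min(1.0, image[y][x] * gain)  # toy "tampering" transform
            out[y][x] = (1 - alpha) * image[y][x] + alpha * altered
            mask[y][x] = alpha
    return out, mask


def _pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def estimate_av_lag(lip_motion, audio_energy, max_lag=5):
    """Return the lag (in frames) that best aligns lip_motion[t] with audio_energy[t + lag]."""
    best_lag, best_corr = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = lip_motion[: len(lip_motion) - lag], audio_energy[lag:]
        else:
            a, b = lip_motion[-lag:], audio_energy[: len(audio_energy) + lag]
        n = min(len(a), len(b))
        if n < 2:
            continue  # too little overlap to correlate
        corr = _pearson(a[:n], b[:n])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

In this sketch, a training pipeline would call `self_blend` on authentic frames to produce labelled pseudo-fakes, while a nonzero result from `estimate_av_lag` on per-frame lip-motion and audio-energy tracks would hint at the kind of temporal misalignment the abstract describes.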

Originally published on March 27, 2026. Curated by AI News.

Robotics

AI system learns to prevent warehouse robot traffic jams, boosting throughput 25%

"Inside a giant autonomous warehouse, hundreds of robots dart down aisles as they collect and distribute items to fulfill a steady stream...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

LLMs

[2603.16673] When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

Abstract page for arXiv paper 2603.16673: When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Rob...

arXiv - Machine Learning · 4 min · about 7 hours ago

Machine Learning

[2512.22854] ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning

Abstract page for arXiv paper 2512.22854: ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum ...

arXiv - Machine Learning · 4 min · about 7 hours ago

Machine Learning

[2511.14427] Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

Abstract page for arXiv paper 2511.14427: Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

arXiv - Machine Learning · 4 min · about 7 hours ago
