[2602.18739] When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models
Summary
This paper introduces the Physical-Conditioned World Model Attack (PhysCond-WMA), a white-box attack on generative world models. It perturbs their physical-condition channels, such as HDMap embeddings and 3D-box features, to distort what the model generates, which exposes a new class of security risks.
Why It Matters
Generative world models are increasingly used to synthesize sensor-conditioned driving videos for applications such as autonomous driving, so understanding their vulnerabilities matters. This research identifies attack vectors that could compromise safety and downstream performance, which underscores the need for stronger security measures in AI systems.
Key Takeaways
- PhysCond-WMA is presented as the first white-box world model attack that perturbs physical-condition channels.
- The attack manipulates physical-condition channels to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity.
- Experimental results show a 4% decrease in 3D detection performance due to attacked training videos.
- The study quantifies security vulnerabilities in generative world models, emphasizing the need for robust security checks.
- The findings suggest that current AI systems may be more susceptible to adversarial attacks than previously understood.
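The takeaway that the attack distorts semantics while maintaining visual fidelity can be read as a constrained optimization, in the spirit of the abstract's quality-preserving stage, which keeps the reverse-diffusion loss below a calibrated threshold. The following is a minimal sketch on a fully hypothetical toy setup, not the authors' formulation: `CLEAN_COND`, `READOUT`, and `QUALITY_WEIGHTS` are invented stand-ins, `quality_proxy` is a quadratic placeholder for the reverse-diffusion loss, and `tau` plays the role of the calibrated threshold.

```python
import math

# All values below are hypothetical toy stand-ins, not taken from the paper.
CLEAN_COND = [0.5, -0.2, 0.8, 0.1]      # clean physical-condition embedding
READOUT = [1.0, 0.5, -0.3, 0.2]         # linear proxy for the targeted semantic output
QUALITY_WEIGHTS = [2.0, 1.0, 1.0, 3.0]  # weights of a quadratic fidelity proxy


def semantic_score(cond):
    """Toy semantic readout s(c) = <a, c>."""
    return sum(a * c for a, c in zip(READOUT, cond))


def quality_proxy(cond):
    """Placeholder for the reverse-diffusion loss: weighted squared distance to the clean condition."""
    return sum(w * (c - c0) ** 2 for w, c, c0 in zip(QUALITY_WEIGHTS, cond, CLEAN_COND))


def constrained_attack(target, tau, lr=0.1, steps=200):
    """Drive s(c) toward `target` while keeping quality_proxy(c) <= tau."""
    delta = [0.0] * len(CLEAN_COND)
    for _ in range(steps):
        cond = [c0 + d for c0, d in zip(CLEAN_COND, delta)]
        err = semantic_score(cond) - target
        # Gradient of (s(c) - target)^2 with respect to c is 2 * err * a.
        delta = [d - lr * 2.0 * err * a for d, a in zip(delta, READOUT)]
        # The proxy is quadratic in delta, so scaling delta by sqrt(tau / q)
        # brings it back to exactly tau whenever the budget is exceeded.
        q = quality_proxy([c0 + d for c0, d in zip(CLEAN_COND, delta)])
        if q > tau:
            delta = [d * math.sqrt(tau / q) for d in delta]
    return [c0 + d for c0, d in zip(CLEAN_COND, delta)]


adv = constrained_attack(target=2.0, tau=0.5)
print(f"clean score {semantic_score(CLEAN_COND):.3f}, attacked score {semantic_score(adv):.3f}")
print(f"fidelity proxy {quality_proxy(adv):.3f} (budget 0.5)")
```

With a tight budget the semantic score moves only part of the way toward the target; raising `tau` lets it reach the target. The rescaling step is a simple way to enforce the budget, not a true Euclidean projection and not necessarily what the paper does.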
arXiv:2602.18739 (cs, Machine Learning), submitted on 21 Feb 2026
Authors: Zhixiang Guo, Siyuan Liang, Andras Balogh, Noah Lunberry, Rong-Cheng Tu, Mark Jelasity, Dacheng Tao
Abstract: Generative world models (WMs) are increasingly used to synthesize controllable, sensor-conditioned driving videos, yet their reliance on physical priors exposes novel attack surfaces. In this paper, we present Physical-Conditioned World Model Attack (PhysCond-WMA), the first white-box world model attack that perturbs physical-condition channels, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity. PhysCond-WMA is optimized in two stages: (1) a quality-preserving guidance stage that constrains reverse-diffusion loss below a calibrated threshold, and (2) a momentum-guided denoising stage that accumulates target-aligned gradients along the denoising trajectory for stable, temporally coherent semantic shifts. Extensive experimental results demonstrate that our approach remains effective while increasing FID by about 9% on average and FVD by about 3.9% on average. Under the targeted attack setting, the attack s...
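Stage (2) of the abstract accumulates target-aligned gradients along the denoising trajectory. The abstract does not state the update rule, so this minimal sketch borrows the standard momentum iterative FGSM (MI-FGSM) recipe rather than the authors' method. At each toy denoising step the gradient is L1-normalized and added to a decaying momentum buffer `g`, and the perturbation moves by `alpha` against the sign of `g` (a targeted attack), clipped to an L-infinity budget `eps`. `CLEAN`, `STEP_READOUTS`, `target`, and every hyperparameter are hypothetical.

```python
# Hypothetical toy stand-ins (not from the paper).
CLEAN = [0.5, -0.2, 0.8, 0.1]  # clean physical-condition embedding
# Per-step sensitivity of the semantic readout to the condition at each denoising step.
STEP_READOUTS = [
    [1.0, 0.4, -0.3, 0.2],
    [0.9, 0.6, -0.1, 0.3],
    [1.1, 0.3, -0.4, 0.1],
    [0.8, 0.5, -0.2, 0.4],
    [1.0, 0.2, -0.5, 0.2],
]


def sign(x):
    return (x > 0) - (x < 0)


def momentum_attack(clean, step_readouts, target, eps=0.1, alpha=0.02, mu=0.9, passes=3):
    """MI-FGSM-style targeted update, accumulating momentum across denoising steps."""
    delta = [0.0] * len(clean)
    g = [0.0] * len(clean)  # momentum buffer carried along the trajectory
    for _ in range(passes):
        for a in step_readouts:  # one entry per (toy) denoising step
            cond = [c + d for c, d in zip(clean, delta)]
            err = sum(ai * ci for ai, ci in zip(a, cond)) - target
            grad = [2.0 * err * ai for ai in a]    # gradient of (a.c - target)^2
            l1 = sum(abs(x) for x in grad) or 1.0  # L1 normalization as in MI-FGSM
            g = [mu * gi + x / l1 for gi, x in zip(g, grad)]
            # Targeted: step against the accumulated gradient, then clip to the L-inf ball.
            delta = [max(-eps, min(eps, d - alpha * sign(gi))) for d, gi in zip(delta, g)]
    return [c + d for c, d in zip(clean, delta)]


def mean_score(cond, step_readouts):
    """Average toy semantic readout over all denoising steps."""
    return sum(sum(a * c for a, c in zip(r, cond)) for r in step_readouts) / len(step_readouts)


adv = momentum_attack(CLEAN, STEP_READOUTS, target=2.0)
print(f"mean score: clean {mean_score(CLEAN, STEP_READOUTS):.3f}, "
      f"attacked {mean_score(adv, STEP_READOUTS):.3f}")
```

Because `g` is carried across steps instead of being reset, a single step whose gradient points elsewhere cannot flip the update direction on its own. That is one plausible reading of how momentum yields the "stable, temporally coherent" shifts the abstract mentions, not a claim about the paper's implementation.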