[2605.01950] TRAP: Tail-aware Ranking Attack for World-Model Planning
Computer Science > Machine Learning

arXiv:2605.01950 (cs)

[Submitted on 3 May 2026]

Title: TRAP: Tail-aware Ranking Attack for World-Model Planning

Authors: Siyuan Duan, Ke Zhang, Xizhao Luo

Abstract: World models enable long-horizon planning by internally generating and evaluating imagined trajectories, making them a promising foundation for generalist agents. However, this imagination-driven decision process also introduces new security risks. Existing backdoor attacks typically aim to manipulate local features, one-step predictions, or instantaneous policy outputs. While such objectives may suffice for weaker reactive models, they are often ineffective against world models, where the learned dynamics prior and planning process can absorb or wash out the effects of shallow perturbations. More importantly, we find that world models exhibit a distinct backdoor vulnerability rooted in the long-tailed ranking structure of imagined trajectories, where disrupting the ordering of a few decision-critical trajectories can systematically hijack planning. To exploit this vulnerability, we propose TRAP, a backdoor attack framework for world models that targets imagined trajectory ranking. TRAP combines a tail-aware ranking loss to focus optimization on decision-critical trajectories with dual gating mechanisms that stabilize optimizati...
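To make the abstract's core idea concrete, the following is a purely illustrative sketch of what a "tail-aware ranking loss" over imagined trajectories could look like. It is not the paper's actual objective: the function name, the hinge-style pairwise penalty, and the choice of the top-k trajectories as the "decision-critical tail" are all assumptions for illustration. The loss rewards a triggered model for inverting the clean ordering on exactly those pairs involving the planner's top-ranked trajectories.

```python
def tail_aware_ranking_loss(clean_scores, triggered_scores, k=3):
    """Illustrative (hypothetical) tail-aware ranking loss.

    clean_scores     : trajectory values produced without the backdoor trigger
    triggered_scores : trajectory values produced under the trigger
    k                : number of decision-critical ("tail") trajectories

    The attack only needs to flip the ordering of the few trajectories the
    planner would actually select, so the penalty is restricted to pairs
    that involve the top-k trajectories under the clean model.
    """
    n = len(clean_scores)
    # Indices of the k best trajectories under the clean model: these are
    # the decision-critical ones whose ranking the attack tries to disrupt.
    order = sorted(range(n), key=clean_scores.__getitem__)
    tail = order[-k:]

    loss = 0.0
    for i in tail:
        for j in range(n):
            if clean_scores[i] > clean_scores[j]:
                # Hinge penalty: the triggered model should rank the
                # formerly-worse trajectory j above the critical one i
                # by at least a margin of 1.
                loss += max(0.0, 1.0 - (triggered_scores[j] - triggered_scores[i]))
    return loss / max(1, k * (n - 1))
```

Under this sketch, a triggered model that fully inverts the clean ranking incurs a lower loss than one that leaves the ranking untouched, which is the direction the attacker optimizes toward.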