[2510.18060] SPACeR: Self-Play Anchoring with Centralized Reference Models
Summary
The paper introduces SPACeR, a framework that trains sim agent policies for autonomous vehicles via self-play reinforcement learning anchored to a centralized reference model, achieving fast inference while preserving human-like behavior.
Why It Matters
As autonomous vehicles become increasingly prevalent, ensuring they exhibit safe and human-like behaviors is crucial. SPACeR addresses the slow inference and poor closed-loop reactivity of current generative models by combining imitation learning with self-play reinforcement learning, offering a scalable approach to developing and testing AV policies.
Key Takeaways
- SPACeR combines centralized reference models with decentralized self-play to improve AV behavior.
- The framework achieves up to 10x faster inference and 50x smaller model size compared to traditional generative models.
- It effectively anchors policies to human driving distributions while maintaining scalability.
- SPACeR demonstrates competitive performance in the Waymo Sim Agents Challenge.
- The approach establishes a new paradigm for testing autonomous driving policies.
Computer Science > Machine Learning
arXiv:2510.18060 (cs)
[Submitted on 20 Oct 2025 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: SPACeR: Self-Play Anchoring with Centralized Reference Models
Authors: Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, Wei Zhan
Abstract: Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally expensive, slow during inference, and struggle to adapt in reactive, closed-loop scenarios. In contrast, self-play reinforcement learning (RL) scales efficiently and naturally captures multi-agent interactions, but it often relies on heuristics and reward shaping, and the resulting policies can diverge from human norms. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a centralized reference policy to guide decentralized self-play. The reference model provides l...
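The abstract describes anchoring a self-play RL policy to a pretrained reference model. A common way to realize such anchoring is to add a KL-divergence penalty toward the reference policy's action distribution on top of the usual policy-gradient term. The sketch below illustrates this general idea with numpy; the function name, the `beta` weight, and the exact loss form are illustrative assumptions, not SPACeR's actual objective.

```python
import numpy as np

def anchored_policy_loss(logits, ref_logits, actions, advantages, beta=0.1):
    """Illustrative KL-anchored self-play loss (assumed form, not the paper's exact objective).

    logits:     (batch, n_actions) learner's action logits
    ref_logits: (batch, n_actions) frozen reference model's logits
    actions:    (batch,) sampled discrete actions
    advantages: (batch,) advantage estimates from self-play rollouts
    beta:       weight of the KL anchor toward the reference policy
    """
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    pi = softmax(logits)
    ref = softmax(ref_logits)

    # Policy-gradient term: encourage actions with positive advantage.
    logp = np.log(pi[np.arange(len(actions)), actions])
    pg = -(advantages * logp).mean()

    # KL(pi || ref): penalize drifting away from the human-like reference.
    kl = (pi * (np.log(pi) - np.log(ref))).sum(axis=-1).mean()
    return pg + beta * kl
```

With `beta = 0` this reduces to plain policy gradient; a larger `beta` pulls the learner's action distribution toward the reference model, which is the mechanism by which the policy stays close to human driving distributions while self-play shapes its closed-loop interactions.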