[2510.18060] SPACeR: Self-Play Anchoring with Centralized Reference Models

arXiv - Machine Learning · 4 min read

Summary

The paper introduces SPACeR, a framework for training autonomous vehicle sim agent policies through self-play reinforcement learning anchored by a centralized reference model, achieving both computational efficiency and human-like driving behavior.

Why It Matters

As autonomous vehicles become increasingly prevalent, ensuring they exhibit safe and human-like behaviors is crucial. SPACeR addresses the limitations of current approaches — slow, computationally expensive generative imitation models on one side, and self-play RL that drifts from human norms on the other — by anchoring scalable self-play training to a pretrained imitation model, offering a solution that could significantly improve the development of AV policies.

Key Takeaways

  • SPACeR combines centralized reference models with decentralized self-play to improve AV behavior.
  • The framework achieves up to 10x faster inference and 50x smaller model size compared to traditional generative models.
  • It effectively anchors policies to human driving distributions while maintaining scalability.
  • SPACeR demonstrates competitive performance in the Waymo Sim Agents Challenge.
  • The approach establishes a new paradigm for testing autonomous driving policies.

Computer Science > Machine Learning · arXiv:2510.18060 (cs)

[Submitted on 20 Oct 2025 (v1), last revised 25 Feb 2026 (this version, v2)]

Title: SPACeR: Self-Play Anchoring with Centralized Reference Models

Authors: Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, Matthew Strong, Masayoshi Tomizuka, Yihan Hu, Wei Zhan

Abstract: Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally expensive, slow during inference, and struggle to adapt in reactive, closed-loop scenarios. In contrast, self-play reinforcement learning (RL) scales efficiently and naturally captures multi-agent interactions, but it often relies on heuristics and reward shaping, and the resulting policies can diverge from human norms. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a centralized reference policy to guide decentralized self-play. The reference model provides l...
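The abstract describes guiding decentralized self-play with a centralized reference policy; the abstract is truncated here, so the exact mechanism is unknown, but a common way to anchor an RL policy to a reference model is a KL-divergence penalty on the objective. The sketch below is a hypothetical illustration of that general idea (all function names and the `beta` weight are assumptions, not from the paper):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete action distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def anchored_objective(task_reward, policy_probs, reference_probs, beta=0.1):
    """Toy KL-anchored objective: maximize task reward while penalizing
    divergence from the centralized reference model's action distribution.

        J = reward - beta * KL(pi || pi_ref)
    """
    return task_reward - beta * kl_divergence(policy_probs, reference_probs)

# Toy example with 3 discrete actions: a policy that drifts away from the
# human-like reference distribution pays a larger KL penalty.
pi_ref = [0.7, 0.2, 0.1]     # reference (imitation-learned) distribution
pi_close = [0.65, 0.25, 0.1] # policy still close to human norms
pi_far = [0.1, 0.2, 0.7]     # policy that has diverged

print(anchored_objective(1.0, pi_close, pi_ref))  # small penalty
print(anchored_objective(1.0, pi_far, pi_ref))    # large penalty
```

Under an objective of this shape, self-play can optimize task reward freely while the KL term keeps the learned policy near the human driving distribution captured by the reference model.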
