[2509.20648] Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Summary
This paper presents CERMIC, a framework that improves multi-agent exploration in reinforcement learning by filtering noisy surprise signals and calibrating each agent's intrinsic curiosity with context inferred from peer behavior.
Why It Matters
The research addresses a central challenge in multi-agent reinforcement learning: exploring effectively when rewards are sparse. Existing curiosity mechanisms tend to confuse environmental stochasticity with meaningful novelty and to treat all unexpected observations equally. By calibrating intrinsic motivation against peer behavior, this work targets decentralized, communication-free settings.
Key Takeaways
- CERMIC enhances exploration by filtering noisy surprise signals.
- The framework allows agents to dynamically calibrate curiosity based on multi-agent context.
- Empirical results show significant performance improvements over state-of-the-art algorithms in sparse-reward scenarios.
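The need to filter surprise signals can be illustrated with a toy sketch. A curiosity bonus based on raw prediction error cannot by itself separate irreducible noise from learnable novelty. The code below is not CERMIC's algorithm. The function name `average_late_surprise`, the running-mean predictor, and the two toy transitions are all assumptions made for illustration.

```python
import random


def average_late_surprise(sample_next_state, steps=2000, lr=0.05, tail=500):
    """Train a running-mean predictor and return its mean squared
    prediction error ("surprise") over the last `tail` steps.

    Hypothetical helper for illustration only, not part of CERMIC.
    """
    prediction = 0.0
    surprises = []
    for _ in range(steps):
        target = sample_next_state()
        error = target - prediction
        surprises.append(error * error)   # raw curiosity signal
        prediction += lr * error          # simple online update
    return sum(surprises[-tail:]) / tail


rng = random.Random(0)  # fixed seed so the run is repeatable

# Learnable transition: the next state is always the same value.
deterministic_surprise = average_late_surprise(lambda: 1.0)

# Stochastic transition: the next state is pure Gaussian noise.
noisy_surprise = average_late_surprise(lambda: rng.gauss(0.0, 1.0))

print("deterministic:", deterministic_surprise)
print("noisy:", noisy_surprise)
```

In this toy setting the surprise on the deterministic transition decays toward zero as the predictor learns, while the surprise on the noisy transition stays near the noise variance however long training runs. An agent rewarded by the raw signal would therefore keep revisiting the noisy transition, and this is the failure mode that a filtering or calibration step is meant to suppress.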
Computer Science > Machine Learning
arXiv:2509.20648 (cs)
[Submitted on 25 Sep 2025 (v1), last revised 22 Feb 2026 (this version, v3)]
Title: Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Authors: Yiyuan Pan, Zhe Liu, Hesheng Wang
Abstract: Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, the novelty of peer behavior, which encodes latent task dynamics, is often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors by observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC gen...