[2602.17062] Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Summary
This paper presents Successive Sub-value Q-learning (S2Q), a novel approach in multi-agent reinforcement learning (MARL) that retains suboptimal actions to adapt to shifting optima, enhancing performance and exploration.
Why It Matters
As multi-agent systems become increasingly prevalent, the ability to adapt to changing environments is crucial. S2Q addresses the limitations of existing MARL methods by enabling agents to retain and explore multiple high-value actions, leading to improved adaptability and performance in dynamic scenarios.
Key Takeaways
- S2Q learns multiple sub-value functions to retain high-value actions.
- The method encourages persistent exploration, improving adaptability.
- Experiments show S2Q outperforms existing MARL algorithms.
- Retaining suboptimal actions allows for better adjustment to shifting optima.
- The approach is relevant for complex multi-agent environments.
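The core mechanism above, a softmax behavior policy defined over several sub-value estimates so that alternative high-value actions keep nonzero sampling probability, can be sketched as a toy example. This is an illustrative sketch only, not the paper's implementation: the per-action aggregation by `max`, the `temperature` parameter, and all function names are assumptions.

```python
import numpy as np

def softmax_behavior_policy(sub_q_values, temperature=1.0):
    """Illustrative sketch: aggregate several sub-value estimates per action
    (here by taking the maximum) and sample from a softmax, so suboptimal
    but high-value actions retain nonzero probability."""
    # sub_q_values: array of shape (num_sub_values, num_actions)
    combined = np.max(sub_q_values, axis=0)   # best estimate per action
    logits = combined / temperature
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Example: two sub-value functions over three actions.
# Action 1 is slightly worse than action 0 but stays explorable.
q = np.array([[1.0, 0.5, 0.2],
              [0.3, 0.9, 0.1]])
p = softmax_behavior_policy(q, temperature=0.5)
```

With a moderate temperature, the near-optimal action keeps substantial probability mass, which is what lets the policy track an optimum that shifts to a previously suboptimal action during training.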
arXiv:2602.17062 [cs.AI] — Computer Science > Artificial Intelligence
Submitted on 19 Feb 2026
Title: Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Authors: Yonghyeon Jo, Sunwoo Lee, Seungyul Han
Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often converging to suboptimal policies. To address this limitation, we propose Successive Sub-value Q-learning (S2Q), which learns multiple sub-value functions to retain alternative high-value actions. Incorporating these sub-value functions into a Softmax-based behavior policy, S2Q encourages persistent exploration and enables $Q^{\text{tot}}$ to adjust quickly to the changing optima. Experiments on challenging MARL benchmarks confirm that S2Q consistently outperforms various MARL algorithms, demonstrating improved adaptability and overall performance. Our code is available at this https URL.
DOI: https://doi.org/10.48550/arXiv.2602.17062