[2602.17062] Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv - AI 3 min read Article

Summary

This paper presents Successive Sub-value Q-learning (S2Q), a novel approach in multi-agent reinforcement learning (MARL) that retains suboptimal actions to adapt to shifting optima, enhancing performance and exploration.

Why It Matters

As multi-agent systems become increasingly prevalent, the ability to adapt to changing environments is crucial. S2Q addresses the limitations of existing MARL methods by enabling agents to retain and explore multiple high-value actions, leading to improved adaptability and performance in dynamic scenarios.

Key Takeaways

  • S2Q learns multiple sub-value functions to retain high-value actions.
  • The method encourages persistent exploration, improving adaptability.
  • Experiments show S2Q outperforms existing MARL algorithms.
  • Retaining suboptimal actions allows for better adjustment to shifting optima.
  • The approach is relevant for complex multi-agent environments.
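The core idea of keeping several high-value candidates alive in a Softmax behavior policy can be illustrated with a minimal sketch. This is not the authors' implementation: the array shapes, the `temperature` parameter, and the choice to combine sub-value functions with a per-action max are all illustrative assumptions.

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Numerically stable softmax over action values.
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def behavior_policy(sub_q_values, temperature=1.0):
    """Sample an action from a Softmax over action values combined
    across multiple sub-value functions, so alternative high-value
    actions retain probability mass instead of collapsing to a
    single greedy choice.

    sub_q_values: array of shape (num_sub_values, num_actions);
    a hypothetical stand-in for the paper's sub-value functions.
    """
    # Keep the best estimate per action across all sub-value functions
    # (an assumption; the paper's exact combination rule may differ).
    combined = sub_q_values.max(axis=0)
    probs = softmax(combined, temperature)
    action = np.random.choice(len(probs), p=probs)
    return action, probs
```

Because every action keeps nonzero probability, an action that was previously second-best can still be sampled, letting the policy follow the optimum if the underlying value function later shifts in its favor.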

Computer Science > Artificial Intelligence
arXiv:2602.17062 (cs.AI)
[Submitted on 19 Feb 2026]

Title: Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
Authors: Yonghyeon Jo, Sunwoo Lee, Seungyul Han

Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often converging to suboptimal policies. To address this limitation, we propose Successive Sub-value Q-learning (S2Q), which learns multiple sub-value functions to retain alternative high-value actions. Incorporating these sub-value functions into a Softmax-based behavior policy, S2Q encourages persistent exploration and enables $Q^{\text{tot}}$ to adjust quickly to the changing optima. Experiments on challenging MARL benchmarks confirm that S2Q consistently outperforms various MARL algorithms, demonstrating improved adaptability and overall performance. Our code is available at this https URL.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.17062 [cs.AI] (or arXiv:2602.17062v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.17062
