[2602.19805] Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
Summary
The paper presents Decision MetaMamba (DMM), a Mamba-based architecture for offline reinforcement learning (RL). DMM places a dense layer-based sequence mixer ahead of Mamba so that key steps in an RL sequence are not lost to selective scanning and residual gating.
Why It Matters
Offline RL learns a policy from a fixed dataset, without further interaction with the environment, so a model that drops key steps from a trajectory has no way to recover them. DMM targets that failure mode in Mamba-based models. The authors report state-of-the-art results with a compact parameter footprint, which they present as evidence of strong potential for real-world applications.
Key Takeaways
- Decision MetaMamba (DMM) targets the information loss that Mamba's selective scanning and residual gating can cause in offline RL.
- DMM replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information.
- The sequence mixing considers all channels simultaneously and is performed before Mamba.
- The authors report state-of-the-art performance across diverse RL tasks.
- The model keeps a compact parameter footprint, which the authors present as an advantage for real-world use.
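To make the idea of a dense layer-based sequence mixer more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `dense_sequence_mixer`, the causal zero padding, the window size, and the weight layout are all assumptions chosen for illustration. The only property it tries to show is the one stated in the abstract, namely that a single dense layer sees every channel of a short local window of tokens at once.

```python
import random


def dense_sequence_mixer(x, weight, bias, window):
    """Mix a short causal window of tokens with one dense layer.

    Hypothetical illustration only, not the paper's code.
    x:      list of seq_len tokens, each a list of `channels` floats
    weight: (window * channels) rows by out_channels columns
    bias:   list of out_channels floats
    """
    channels = len(x[0])
    # Left-pad with zero tokens so position t only sees tokens t-window+1 .. t.
    padded = [[0.0] * channels for _ in range(window - 1)] + x
    out = []
    for t in range(len(x)):
        # Flatten time and channel axes of the local window into one vector.
        flat = [v for token in padded[t:t + window] for v in token]
        # One dense layer over the flattened window: all channels mixed together.
        out.append([
            sum(f * weight[i][j] for i, f in enumerate(flat)) + bias[j]
            for j in range(len(bias))
        ])
    return out


random.seed(0)
seq_len, channels, window = 6, 4, 3
x = [[random.gauss(0, 1) for _ in range(channels)] for _ in range(seq_len)]
weight = [[random.gauss(0, 0.1) for _ in range(channels)]
          for _ in range(window * channels)]
bias = [0.0] * channels

mixed = dense_sequence_mixer(x, weight, bias, window)
print(len(mixed), len(mixed[0]))
```

In this sketch the output keeps the input's sequence length and channel count, so it could feed a downstream sequence model unchanged. How DMM actually sizes its window, handles positions, and connects the mixer to Mamba is described in the paper itself, not here.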
arXiv:2602.19805 (cs)
[Submitted on 23 Feb 2026]
Title: Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
Authors: Wall Kim, Chaeyoung Song, Hanul Kim
Abstract: Mamba-based models have drawn much attention in offline RL. However, their selective mechanism is often detrimental when key steps in RL sequences are omitted. To address this issue, we propose a simple yet effective structure, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By performing sequence mixing that considers all channels simultaneously before Mamba, DMM prevents information loss due to selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Furthermore, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.19805 [cs.LG] (or arXiv:2602.19805v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.19805