[2509.14585] Online reinforcement learning via sparse Gaussian mixture model Q-functions
Summary
This paper presents an online reinforcement learning framework built on sparse Gaussian mixture model Q-functions (S-GMM-QFs), which encourages exploration via streaming data and matches the performance of dense deep RL methods with far fewer parameters.
Why It Matters
The proposed framework addresses overfitting and parameter efficiency in reinforcement learning. By learning from streaming data and regulating model complexity through sparsification, it matches much larger dense networks while remaining compact and interpretable, which could benefit RL applications in resource-constrained settings.
Key Takeaways
- Introduces a novel online policy-iteration framework for reinforcement learning.
- Utilizes sparse Gaussian mixture model Q-functions to enhance exploration.
- Achieves performance comparable to dense deep RL methods with fewer parameters.
- Regulates model complexity to prevent overfitting while maintaining expressiveness.
- Demonstrates strong generalization in low-parameter-count scenarios.
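The takeaways above center on representing the Q-function as a Gaussian mixture over state-action pairs. The paper's exact parametrization is not reproduced here; the following is a minimal sketch of the idea, where the diagonal covariances, feature layout, and single-component example are illustrative assumptions:

```python
import numpy as np

def gmm_q_value(state, action, weights, means, variances):
    """Q(s, a) modeled as a weighted sum of Gaussian kernels over the
    concatenated state-action vector (diagonal covariances assumed)."""
    sa = np.concatenate([state, action])
    q = 0.0
    for w, mu, var in zip(weights, means, variances):
        diff = sa - mu
        # Log-density of a diagonal-covariance Gaussian, for numerical safety
        log_density = (-0.5 * np.sum(diff**2 / var)
                       - 0.5 * np.sum(np.log(2.0 * np.pi * var)))
        q += w * np.exp(log_density)
    return q

# One component centered at the queried state-action pair: the kernel
# evaluates at its peak density, scaled by the mixture weight.
q = gmm_q_value(np.zeros(2), np.zeros(1),
                weights=[2.0],
                means=[np.zeros(3)],
                variances=[np.ones(3)])
```

Sparsity then corresponds to driving many of the mixture weights to zero, so the number of active Gaussian components, and hence the parameter count, stays small.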
Computer Science > Machine Learning
arXiv:2509.14585 (cs)
[Submitted on 18 Sep 2025 (v1), last revised 13 Feb 2026 (this version, v2)]
Title: Online reinforcement learning via sparse Gaussian mixture model Q-functions
Authors: Minh Vu, Konstantinos Slavakis
Abstract: This paper introduces a structured and interpretable online policy-iteration framework for reinforcement learning (RL), built around the novel class of sparse Gaussian mixture model Q-functions (S-GMM-QFs). Extending earlier work that trained GMM-QFs offline, the proposed framework develops an online scheme that leverages streaming data to encourage exploration. Model complexity is regulated through sparsification by Hadamard overparametrization, which mitigates overfitting while preserving expressiveness. The parameter space of S-GMM-QFs is naturally endowed with a Riemannian manifold structure, allowing for principled parameter updates via online gradient descent on a smooth objective. Numerical tests show that S-GMM-QFs match the performance of dense deep RL (DeepRL) methods on standard benchmarks while using significantly fewer parameters, and maintain strong performance even in low-parameter-count regimes where sparsified DeepRL methods fail to generalize.
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
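The abstract's "sparsification by Hadamard overparametrization" refers to a general technique in which each weight w_i is replaced by a product u_i · v_i and plain gradient descent is run on the factors; mild weight decay on u and v then acts like an L1 penalty on w, shrinking unneeded entries to zero while the objective stays smooth. A minimal sketch on a toy least-squares objective (the target vector, step size, and decay strength are illustrative choices, not values from the paper):

```python
import numpy as np

def hadamard_gd(target, steps=5000, lr=0.01, decay=1e-3):
    """Minimize 0.5 * ||w - target||^2 with w = u * v (Hadamard product).
    Weight decay on the factors u, v induces sparsity in w."""
    u = np.full_like(target, 0.5)
    v = np.full_like(target, 0.5)
    for _ in range(steps):
        grad_w = u * v - target          # d(loss)/dw evaluated at w = u * v
        # Chain rule: d(loss)/du = grad_w * v, d(loss)/dv = grad_w * u
        u, v = (u - lr * (grad_w * v + decay * u),
                v - lr * (grad_w * u + decay * v))
    return u * v

# The zero entry of the target is shrunk toward exactly zero, while the
# informative entries are fit closely.
w = hadamard_gd(np.array([1.0, 0.0, 0.25]))
```

With both factors initialized positive, w = u ⊙ v stays nonnegative; a common variant for sign-indefinite weights is the difference of squares, w = u ⊙ u − v ⊙ v. Either way, the smoothness of the factored objective is what permits the principled online gradient updates the abstract describes.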