[2602.21436] Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback
Summary
This paper presents an efficient uncoupled learning algorithm for bilinear saddle-point problems, achieving last-iterate convergence with a rate of \(\tilde{O}(T^{-1/4})\) under bandit feedback conditions.
Why It Matters
The research addresses a critical aspect of machine learning dynamics, particularly in multi-agent systems where players receive limited feedback. The proposed algorithm enhances the understanding of convergence in complex environments, which is vital for developing robust learning systems in various applications, including game theory and optimization.
Key Takeaways
- Introduces an uncoupled learning algorithm for bilinear saddle-point problems.
- Achieves last-iterate convergence at a rate of \(\tilde{O}(T^{-1/4})\).
- Utilizes bandit feedback, making it applicable in scenarios with limited information.
- Combines experimental design techniques with the Follow-The-Regularized-Leader framework.
- Computationally efficient, requiring only a linear optimization oracle.
Statistics > Machine Learning arXiv:2602.21436 (stat) [Submitted on 24 Feb 2026] Title:Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback Authors:Arnab Maiti, Claire Jie Zhang, Kevin Jamieson, Jamie Heather Morgenstern, Ioannis Panageas, Lillian J. Ratliff View a PDF of the paper titled Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback, by Arnab Maiti and 5 other authors View PDF HTML (experimental) Abstract:In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the cla...