[2510.04407] Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
Summary
This paper presents IREG-PRM+, a new scale-invariant and parameter-free variant of predictive regret matching+ (PRM+) that bridges the gap between theoretical and practical convergence rates in zero-sum games.
Why It Matters
The study addresses a long-standing disconnect in zero-sum game solving: first-order methods with the optimal T^{-1} theoretical rate are rarely the best performers in practice, while the regret-matching algorithms that dominate practice only guarantee the slower T^{-1/2} rate. By introducing IREG-PRM+, the authors obtain optimal theoretical guarantees without sacrificing empirical performance, which matters for equilibrium computation in large games and for related work in machine learning and AI.
Key Takeaways
- IREG-PRM+ achieves T^{-1/2} best-iterate and optimal T^{-1} average-iterate convergence in zero-sum games (both notions are spelled out after this list).
- The new algorithm closes the gap between theory and practical application.
- It performs on par with, or better than, PRM+ on benchmark games.
- The paper extends its analysis to variational inequality problems.
- The findings could influence future research in adaptive learning rates.
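To make the rate claims concrete, here is one standard way to state the two convergence notions for a two-player zero-sum matrix game with payoff matrix A; these are generic textbook definitions supplied for context, not notation reproduced from the paper.

```latex
% Duality (saddle-point) gap of a strategy pair (x, y):
\[
  \mathrm{gap}(x, y) \;=\; \max_{y'} \, x^{\top} A y' \;-\; \min_{x'} \, x'^{\top} A y
\]
% Average-iterate convergence at the optimal T^{-1} rate
% (measured at the time-averaged iterates):
\[
  \mathrm{gap}\!\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} x_t,\;
                     \tfrac{1}{T}\textstyle\sum_{t=1}^{T} y_t\Big) \;=\; O(1/T)
\]
% Best-iterate convergence at the T^{-1/2} rate
% (some iterate along the trajectory has small gap):
\[
  \min_{1 \le t \le T} \, \mathrm{gap}(x_t, y_t) \;=\; O\big(1/\sqrt{T}\big)
\]
```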
Computer Science > Computer Science and Game Theory
arXiv:2510.04407 (cs)
[Submitted on 6 Oct 2025 (v1), last revised 17 Feb 2026 (this version, v3)]
Title: Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
Authors: Brian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm
Abstract: A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. Although a convergence rate of $T^{-1}$ has long been established, the most effective paradigm in practice is counterfactual regret minimization (CFR), which is based on regret matching and its modern variants. In particular, the state of the art across most benchmarks is predictive regret matching$^+$ (PRM$^+$). Yet, such algorithms can exhibit slower $T^{-1/2}$ convergence even in self-play. In this paper, we close the gap between theory and practice. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ (i.e., optimal) average-iterate convergence guarantees, while also being on par or even better relative to PRM$^+$ on benchmark games. From a technical standpoint, we dr...
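For readers unfamiliar with the baseline, below is a minimal sketch of the standard predictive regret matching+ (PRM+) update that IREG-PRM+ builds on, using the last instantaneous regret as the prediction. The scale-invariant, parameter-free changes that define IREG-PRM+ are not reproduced here, and all function names are illustrative rather than taken from the paper.

```python
import numpy as np

def prm_plus_strategy(R, m):
    """Play proportional to the thresholded predicted regrets [R + m]^+.

    Falls back to the uniform strategy when every predicted regret is
    non-positive (the usual regret-matching convention).
    """
    q = np.maximum(R + m, 0.0)
    total = q.sum()
    return q / total if total > 0 else np.full(len(q), 1.0 / len(q))

def prm_plus_step(R, x, loss):
    """One PRM+ update for a single player facing the loss vector `loss`.

    The instantaneous regret of action a is <x, loss> - loss[a]; the
    accumulated regret vector is clipped at zero after every step (the
    '+' in RM+), and the next strategy uses the last instantaneous
    regret as the prediction, the standard choice in predictive RM+.
    """
    r = float(x @ loss) - loss           # instantaneous regret vector
    R_next = np.maximum(R + r, 0.0)      # thresholded regret accumulation
    x_next = prm_plus_strategy(R_next, r)
    return R_next, x_next

# Tiny self-play example on matching pennies (loss matrix for player 1),
# using alternating updates as is common for regret-matching methods.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
Rx, Ry = np.zeros(2), np.zeros(2)
x, y = np.full(2, 0.5), np.full(2, 0.5)
for _ in range(1000):
    Rx, x = prm_plus_step(Rx, x, A @ y)        # player 1 minimizes x^T A y
    Ry, y = prm_plus_step(Ry, y, -(A.T @ x))   # player 2 maximizes it
```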