[2502.07397] Linear Bandits beyond Inner Product Spaces, the case of Bandit Optimal Transport
Summary
This paper studies linear bandits beyond traditional inner product spaces, using Optimal Transport as the motivating example. It proposes a refinement of the classical OFUL algorithm that preserves efficient learning without relying on an inner product structure.
Why It Matters
Understanding linear bandits in the context of optimal transport expands the applicability of online learning algorithms. This research could enhance decision-making processes in various fields, including recommendation systems and adaptive clinical trials, by providing more flexible frameworks for learning.
Key Takeaways
- The study challenges the reliance on inner product structures in linear bandits.
- A new algorithm is proposed that embeds action sets into Hilbertian subspaces.
- The algorithm achieves regret bounds similar to those of the classical OFUL algorithm.
- The research highlights the importance of optimal transport in online learning.
- Results indicate that the algorithm's regret can vary based on the cost function's regularity.
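The OFUL baseline the paper refines combines a least-squares estimate of the unknown parameter with an optimistic (upper-confidence) action choice. The following is an illustrative sketch of that classical scheme only, not the paper's Hilbertian-embedding variant; the action set, noise level, and the fixed confidence radius `beta` are assumptions made for the example.

```python
import numpy as np

def oful_step(actions, V, b, beta):
    """One round of an OFUL-style linear bandit (illustrative sketch).

    actions : (K, d) array of candidate actions
    V, b    : running regularised Gram matrix and response vector
    beta    : confidence-ellipsoid radius (assumed fixed here)
    """
    theta_hat = np.linalg.solve(V, b)        # ridge least-squares estimate
    V_inv = np.linalg.inv(V)
    # Optimistic value = estimated reward + exploration bonus x^T V^{-1} x
    ucb = actions @ theta_hat + beta * np.sqrt(
        np.einsum("kd,de,ke->k", actions, V_inv, actions))
    return int(np.argmax(ucb))

# Toy run on a random linear-reward instance (dimensions are arbitrary).
rng = np.random.default_rng(0)
d, K, T = 3, 10, 200
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
actions = rng.normal(size=(K, d))
V, b = np.eye(d), np.zeros(d)          # lambda = 1 ridge regularisation
best = actions @ theta_star
regret = 0.0
for t in range(T):
    k = oful_step(actions, V, b, beta=1.0)
    x = actions[k]
    reward = x @ theta_star + 0.1 * rng.normal()
    V += np.outer(x, x)                # rank-one Gram update
    b += reward * x
    regret += best.max() - best[k]
```

In the paper's setting the "actions" are transport plans rather than vectors in an inner product space, which is exactly why this vanilla recipe no longer applies directly.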
Statistics > Machine Learning
arXiv:2502.07397 (stat)
[Submitted on 11 Feb 2025 (v1), last revised 17 Feb 2026 (this version, v2)]
Title: Linear Bandits beyond Inner Product Spaces, the case of Bandit Optimal Transport
Authors: Lorenzo Croissant (CREST, FAIRPLAY, ENSAE Paris)
Abstract: Linear bandits have long been a central topic in online learning, with applications ranging from recommendation systems to adaptive clinical trials. Their general learnability has been established when the objective is to minimise the inner product between a cost parameter and the decision variable. While this is highly general, this reliance on an inner product structure belies the name of \emph{linear} bandits, and fails to account for problems such as Optimal Transport. Using the Kantorovich formulation of Optimal Transport as an example, we show that an inner product structure is \emph{not} necessary to achieve efficient learning in linear bandits. We propose a refinement of the classical OFUL algorithm that operates by embedding the action set into a Hilbertian subspace, where confidence sets can be built via least-squares estimation. Actions are then constrained to this subspace by penalising optimism. The analysis is completed by leveraging convergence results from penalised (entropic) transport to the Kantorovich problem...
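The "penalised (entropic) transport" the analysis leverages can be illustrated with the standard Sinkhorn iteration. This is a generic sketch of entropic optimal transport, not the paper's algorithm; the toy marginals and cost matrix below are assumptions for the example, and as the regularisation `eps` shrinks the plan approaches the unregularised Kantorovich optimum.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=500):
    """Entropic-regularised OT plan between marginals a and b for cost C."""
    K = np.exp(-C / eps)        # Gibbs kernel of the penalised problem
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)       # scale to match column marginals
        u = a / (K @ v)         # scale to match row marginals
    return u[:, None] * K * v[None, :]

# Toy example: uniform marginals over 5 points on a line, quadratic cost.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.full(5, 0.2)
plan = sinkhorn(a, b, C)
cost = np.sum(plan * C)         # entropic estimate of the transport cost
```

The convergence of this penalised solution to the Kantorovich one (as `eps` decreases) is the kind of result the paper's regret analysis builds on.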