[2602.12903] Contextual Online Bilateral Trade
Summary
This paper studies contextual online bilateral trade, where the sellers' and buyers' valuations are linear functions of a context vector observed at each round. It presents algorithms that achieve low regret for gain from trade and for profit maximization, under two different feedback models.
Why It Matters
Contextual online bilateral trade models repeated two-sided markets in which a platform must set prices without knowing the agents' valuations in advance. Low-regret pricing algorithms for this setting inform the design of dynamic trading platforms, and the results connect online learning with mechanism design in game theory and economics.
Key Takeaways
- The study introduces algorithms with low regret against a strong benchmark: the best omniscient dynamic pricing strategy.
- Two feedback models are analyzed: one-bit (only whether the trade happened) and two-bit (each agent's accept/reject decision observed separately), with the richer two-bit feedback allowing stronger regret guarantees.
- In the two-bit model, the algorithms maintain per-step budget balance, meaning the learner never incurs a negative profit on any round.
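To make the budget-balance takeaway concrete, here is a minimal sketch (names and prices are illustrative, not from the paper): if the learner always posts a seller price no larger than the buyer price, then the per-round profit is non-negative whether or not the trade occurs.

```python
# Hedged sketch: per-step budget balance in posted-price bilateral trade.
# p_seller / p_buyer and the helper name are illustrative assumptions.

def step_profit(p_seller: float, p_buyer: float,
                seller_accepts: bool, buyer_accepts: bool) -> float:
    """Profit for one round: the buyer pays p_buyer, the seller receives
    p_seller, but only if both sides accept; otherwise no trade, zero profit."""
    # Budget balance: never promise the seller more than the buyer is charged.
    assert p_seller <= p_buyer
    trade = seller_accepts and buyer_accepts
    return (p_buyer - p_seller) if trade else 0.0

print(step_profit(0.25, 0.5, True, True))   # trade happens: profit 0.25 >= 0
print(step_profit(0.25, 0.5, True, False))  # buyer rejects: no trade, profit 0.0
```

Since `p_seller <= p_buyer` is enforced on every round, the profit is either the non-negative spread or zero, which is exactly the "no negative profits" guarantee.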
Computer Science > Computer Science and Game Theory
arXiv:2602.12903 (cs)
[Submitted on 13 Feb 2026]

Title: Contextual Online Bilateral Trade
Authors: Romain Cosson, Federico Fusco, Anupam Gupta, Stefano Leonardi, Renato Paes Leme, Matteo Russo

Abstract: We study repeated bilateral trade when the valuations of the sellers and the buyers are contextual. More precisely, the agents' valuations are given by the inner product of a context vector with two unknown $d$-dimensional vectors -- one for the buyers and one for the sellers. At each time step $t$, the learner receives a context and posts two prices, one for the seller and one for the buyer, and the trade happens if both agents accept their price. We study two objectives for this problem, gain from trade and profit, proving no-regret with respect to a surprisingly strong benchmark: the best omniscient dynamic strategy. In the natural scenario where the learner observes \emph{separately} whether the agents accept their price -- the so-called \emph{two-bit} feedback -- we design algorithms that achieve $O(d\log d)$ regret for gain from trade, and $O(d \log\log T + d\log d)$ regret for profit maximization. Both results are tight, up to the $\log(d)$ factor, and implement per-step budget balance, meaning that the learner never incurs negative profit. In the less informative \emph{one-bit} feedback m...
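The interaction protocol in the abstract can be sketched in a few lines. The simulation below is a hedged illustration, not the paper's algorithm: the hidden vectors `theta_s`, `theta_b` and the posted prices are placeholder assumptions, chosen only to show how the contextual valuations, the two feedback models, and the two objectives (gain from trade and profit) relate in a single round.

```python
import numpy as np

# One round of the contextual bilateral-trade protocol (illustrative sketch).
rng = np.random.default_rng(0)
d = 4
theta_s = rng.uniform(0, 1, d) / d   # seller's unknown d-dimensional vector (assumed)
theta_b = rng.uniform(0, 1, d) / d   # buyer's unknown d-dimensional vector (assumed)

x = rng.uniform(0, 1, d)             # context revealed to the learner at time t
s_val = float(theta_s @ x)           # seller valuation: inner product <theta_s, x>
b_val = float(theta_b @ x)           # buyer valuation:  inner product <theta_b, x>

p_s, p_b = 0.2, 0.4                  # posted prices; p_s <= p_b keeps budget balance

seller_accepts = s_val <= p_s        # seller sells iff paid at least her valuation
buyer_accepts = b_val >= p_b         # buyer buys iff charged at most his valuation

two_bit = (seller_accepts, buyer_accepts)   # two-bit feedback: each bit observed
one_bit = seller_accepts and buyer_accepts  # one-bit feedback: only trade/no-trade

gain_from_trade = (b_val - s_val) if one_bit else 0.0  # objective 1
profit = (p_b - p_s) if one_bit else 0.0               # objective 2, never negative
```

The two-bit model reveals strictly more than the one-bit model (the trade indicator is the AND of the two bits), which is why the abstract's stronger guarantees are stated for two-bit feedback.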