[2602.18114] Non-Stationary Online Resource Allocation: Learning from a Single Sample
Summary
The paper presents an approach to online resource allocation under non-stationary demand that needs only minimal offline data. It introduces a type-dependent, quantile-based meta-policy that decouples reward-distribution estimation, optimization of target service probabilities, and real-time acceptance decisions.
Why It Matters
This research addresses resource allocation in dynamic environments, a core problem in machine learning and operations research. By requiring only one historical sample per period while still achieving sublinear regret guarantees, it opens avenues for more data-efficient algorithms in real-world scenarios.
Key Takeaways
- Introduces a meta-policy for resource allocation with minimal offline data.
- Distinguishes between reward-observed and type-only samples for decision-making.
- Achieves sublinear regret bounds under specific conditions, despite arbitrary non-stationarity in arrivals.
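The real-time decision component described above can be illustrated with a small sketch. This is a hypothetical simplification, not the paper's algorithm: `make_threshold_policy` and `empirical_quantile` are names invented here, and we assume a single resource with a fixed per-type acceptance threshold set at the quantile of the estimated reward distribution corresponding to a target service probability.

```python
def empirical_quantile(samples, q):
    """Return the q-th empirical quantile of a list of reward samples."""
    s = sorted(samples)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[idx]

def make_threshold_policy(samples_by_type, service_prob):
    """Illustrative threshold policy (assumed form, not the paper's):
    for each type t, accept a query iff its realized reward is at least
    the (1 - service_prob[t])-quantile of that type's estimated reward
    distribution, and capacity remains."""
    thresholds = {
        t: empirical_quantile(samples, 1.0 - service_prob[t])
        for t, samples in samples_by_type.items()
    }
    def policy(query_type, reward, remaining_capacity):
        # Accept only when capacity remains and the reward clears the
        # type-dependent quantile threshold.
        return remaining_capacity > 0 and reward >= thresholds[query_type]
    return policy
```

For example, with ten historical reward samples for a type and a target service probability of 0.3, the policy accepts roughly the top 30% of rewards for that type while capacity lasts.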
Paper Details
Computer Science > Machine Learning · arXiv:2602.18114 (cs) · Submitted on 20 Feb 2026
Authors: Yiding Feng, Jiashuo Jiang, Yige Wang
Abstract: We study online resource allocation under non-stationary demand with a minimum offline data requirement. In this problem, a decision-maker must allocate multiple types of resources to sequentially arriving queries over a finite horizon. Each query belongs to a finite set of types with fixed resource consumption and a stochastic reward drawn from an unknown, type-specific distribution. Critically, the environment exhibits arbitrary non-stationarity -- arrival distributions may shift unpredictably -- while the algorithm requires only one historical sample per period to operate effectively. We distinguish two settings based on sample informativeness: (i) reward-observed samples containing both query type and reward realization, and (ii) the more challenging type-only samples revealing only query type information. We propose a novel type-dependent quantile-based meta-policy that decouples the problem into modular components: reward distribution estimation, optimization of target service probabilities via fluid relaxation, and real-time decisions through dynamic acceptance thresholds. For reward-observed ...
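The fluid relaxation mentioned in the abstract chooses target service probabilities against a capacity constraint. The sketch below is an illustrative toy, not the paper's formulation: it assumes a single resource and known expected rewards, in which case maximizing expected reward subject to expected consumption is a fractional knapsack solvable greedily by reward per unit of consumption. The function name `fluid_relaxation` and all inputs are assumptions made here.

```python
def fluid_relaxation(exp_reward, arrivals, consumption, capacity):
    """Toy single-resource fluid relaxation: pick service probabilities
    p[t] in [0, 1] maximizing sum_t arrivals[t] * p[t] * exp_reward[t]
    subject to sum_t arrivals[t] * p[t] * consumption[t] <= capacity.
    Solved greedily in decreasing reward-per-unit-consumption order."""
    p = {t: 0.0 for t in exp_reward}
    budget = capacity
    order = sorted(exp_reward,
                   key=lambda t: exp_reward[t] / consumption[t],
                   reverse=True)
    for t in order:
        demand = arrivals[t] * consumption[t]  # expected total consumption
        take = min(1.0, budget / demand) if demand > 0 else 1.0
        p[t] = take
        budget -= take * demand
        if budget <= 0:
            break
    return p
```

With two types where type "a" has the higher per-unit reward, the greedy pass serves "a" fully and then fills the remaining capacity with a fractional service probability for "b".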