[2602.18114] Non-Stationary Online Resource Allocation: Learning from a Single Sample
Summary
The paper presents an approach to online resource allocation under non-stationary demand that needs only minimal offline data. It introduces a type-dependent, quantile-based meta-policy that decouples reward-distribution estimation, optimization of target service probabilities, and real-time acceptance decisions.
Why It Matters
This research addresses resource allocation in dynamic environments, a core problem in machine learning and operations research. By requiring only one historical sample per period while still achieving sublinear regret guarantees, it opens avenues for more data-efficient algorithms in real-world scenarios.
Key Takeaways
- Introduces a meta-policy for resource allocation with minimal offline data.
- Distinguishes between reward-observed and type-only samples for decision-making.
- Achieves sublinear regret bounds under specific conditions, despite arbitrary non-stationarity in arrivals.
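The real-time decision component described above can be illustrated with a small sketch. This is a hypothetical simplification, not the paper's algorithm: `make_threshold_policy` and `empirical_quantile` are names invented here, and we assume a single resource with a fixed per-type acceptance threshold set at the quantile of the estimated reward distribution corresponding to a target service probability.

```python
def empirical_quantile(samples, q):
    """Return the q-th empirical quantile of a list of reward samples."""
    s = sorted(samples)
    idx = min(int(q * len(s)), len(s) - 1)
    return s[idx]

def make_threshold_policy(samples_by_type, service_prob):
    """Illustrative threshold policy (assumed form, not the paper's):
    for each type t, accept a query iff its realized reward is at least
    the (1 - service_prob[t])-quantile of that type's estimated reward
    distribution, and capacity remains."""
    thresholds = {
        t: empirical_quantile(samples, 1.0 - service_prob[t])
        for t, samples in samples_by_type.items()
    }
    def policy(query_type, reward, remaining_capacity):
        # Accept only when capacity remains and the reward clears the
        # type-dependent quantile threshold.
        return remaining_capacity > 0 and reward >= thresholds[query_type]
    return policy
```

For example, with ten historical reward samples for a type and a target service probability of 0.3, the policy accepts roughly the top 30% of rewards for that type while capacity lasts.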
Paper Details
Computer Science > Machine Learning · arXiv:2602.18114 (cs) · Submitted on 20 Feb 2026
Authors: Yiding Feng, Jiashuo Jiang, Yige Wang
Abstract: We study online resource allocation under non-stationary demand with a minimum offline data requirement. In this problem, a decision-maker must allocate multiple types of resources to sequentially arriving queries over a finite horizon. Each query belongs to a finite set of types with fixed resource consumption and a stochastic reward drawn from an unknown, type-specific distribution. Critically, the environment exhibits arbitrary non-stationarity -- arrival distributions may shift unpredictably -- while the algorithm requires only one historical sample per period to operate effectively. We distinguish two settings based on sample informativeness: (i) reward-observed samples containing both query type and reward realization, and (ii) the more challenging type-only samples revealing only query type information. We propose a novel type-dependent quantile-based meta-policy that decouples the problem into modular components: reward distribution estimation, optimization of target service probabilities via fluid relaxation, and real-time decisions through dynamic acceptance thresholds. For reward-observed ...
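The fluid relaxation mentioned in the abstract chooses target service probabilities against a capacity constraint. The sketch below is an illustrative toy, not the paper's formulation: it assumes a single resource and known expected rewards, in which case maximizing expected reward subject to expected consumption is a fractional knapsack solvable greedily by reward per unit of consumption. The function name `fluid_relaxation` and all inputs are assumptions made here.

```python
def fluid_relaxation(exp_reward, arrivals, consumption, capacity):
    """Toy single-resource fluid relaxation: pick service probabilities
    p[t] in [0, 1] maximizing sum_t arrivals[t] * p[t] * exp_reward[t]
    subject to sum_t arrivals[t] * p[t] * consumption[t] <= capacity.
    Solved greedily in decreasing reward-per-unit-consumption order."""
    p = {t: 0.0 for t in exp_reward}
    budget = capacity
    order = sorted(exp_reward,
                   key=lambda t: exp_reward[t] / consumption[t],
                   reverse=True)
    for t in order:
        demand = arrivals[t] * consumption[t]  # expected total consumption
        take = min(1.0, budget / demand) if demand > 0 else 1.0
        p[t] = take
        budget -= take * demand
        if budget <= 0:
            break
    return p
```

With two types where type "a" has the higher per-unit reward, the greedy pass serves "a" fully and then fills the remaining capacity with a fractional service probability for "b".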