[2602.18114] Non-Stationary Online Resource Allocation: Learning from a Single Sample

[2602.18114] Non-Stationary Online Resource Allocation: Learning from a Single Sample

arXiv - Machine Learning 4 min read Article

Summary

The paper presents a novel approach to online resource allocation under non-stationary demand, requiring minimal offline data. It introduces a type-dependent quantile-based meta-policy to optimize resource allocation effectively.

Why It Matters

This research addresses the challenges of resource allocation in dynamic environments, which is crucial for applications in machine learning and operations research. By minimizing data requirements while achieving significant performance guarantees, it opens avenues for more efficient algorithms in real-world scenarios.

Key Takeaways

  • Introduces a meta-policy for resource allocation with minimal offline data.
  • Distinguishes between reward-observed and type-only samples for decision-making.
  • Achieves sublinear regret bounds under specific conditions, enhancing existing frameworks.

Computer Science > Machine Learning arXiv:2602.18114 (cs) [Submitted on 20 Feb 2026] Title:Non-Stationary Online Resource Allocation: Learning from a Single Sample Authors:Yiding Feng, Jiashuo Jiang, Yige Wang View a PDF of the paper titled Non-Stationary Online Resource Allocation: Learning from a Single Sample, by Yiding Feng and 2 other authors View PDF HTML (experimental) Abstract:We study online resource allocation under non-stationary demand with a minimum offline data requirement. In this problem, a decision-maker must allocate multiple types of resources to sequentially arriving queries over a finite horizon. Each query belongs to a finite set of types with fixed resource consumption and a stochastic reward drawn from an unknown, type-specific distribution. Critically, the environment exhibits arbitrary non-stationarity -- arrival distributions may shift unpredictably-while the algorithm requires only one historical sample per period to operate effectively. We distinguish two settings based on sample informativeness: (i) reward-observed samples containing both query type and reward realization, and (ii) the more challenging type-only samples revealing only query type information. We propose a novel type-dependent quantile-based meta-policy that decouples the problem into modular components: reward distribution estimation, optimization of target service probabilities via fluid relaxation, and real-time decisions through dynamic acceptance thresholds. For reward-obse...

Related Articles

Llms

One of The Worst AI's I've Ever Seen

I'm using Gemini just for they gave us a student-free-pro pack. It can't see the images I sent, most of the time it just rewrites the mes...

Reddit - Artificial Intelligence · 1 min ·
Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone 👋 I've set up a self-hosted API gateway using New-API to manage and distribute Claude Opus 4.6 access across multiple users....

Reddit - Artificial Intelligence · 1 min ·
Llms

The open-source AI system that beat Claude Sonnet on a $500 GPU just shipped a coding assistant

A week or two ago, an open-source project called ATLAS made the rounds for scoring 74.6% on LiveCodeBench with a frozen 9B model on a sin...

Reddit - Artificial Intelligence · 1 min ·
Google quietly releases an offline-first AI dictation app on iOS | TechCrunch
Machine Learning

Google quietly releases an offline-first AI dictation app on iOS | TechCrunch

Google's new offline-first dictation app uses Gemma AI models to take on the apps like Wispr Flow.

TechCrunch - AI · 4 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime