[2603.01309] PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure
Computer Science > Machine Learning

arXiv:2603.01309 (cs)
[Submitted on 1 Mar 2026]

Title: PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure
Authors: Joshua Steier

Abstract: When data is scarce or mistakes are costly, average-case metrics fall short. What a practitioner needs is a guarantee: with probability at least $1-\delta$, the learned policy is $\varepsilon$-close to optimal after $N$ episodes. This is the PAC promise, and between 2018 and 2025 the RL theory community made striking progress on when such promises can be kept. We survey that progress. Our organizing tool is the Coverage-Structure-Objective (CSO) framework, proposed here, which decomposes nearly every PAC sample complexity result into three factors: coverage (how data were obtained), structure (intrinsic MDP or function-class complexity), and objective (what the learner must deliver). CSO is not a theorem but an interpretive template that identifies bottlenecks and makes cross-setting comparison immediate. The technical core covers tight tabular baselines and the uniform-PAC bridge to regret; structural complexity measures (Bellman rank, witness rank, Bellman-Eluder dimension) governing learnability with function approximation; results for linear, kernel/NTK, and low-rank models; reward-free exploration as upf...
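To make the $(\varepsilon, \delta)$ template concrete, here is a minimal sketch (not from the paper) of the simplest PAC-style sample-size calculation: how many i.i.d. return samples suffice, by Hoeffding's inequality, to estimate a single policy's value to within $\varepsilon$ with probability at least $1-\delta$. Full RL results replace this with coverage- and structure-dependent terms, but the shape of the guarantee is the same.

```python
import math

def pac_sample_size(epsilon: float, delta: float, value_range: float = 1.0) -> int:
    """Hoeffding bound: number of i.i.d. return samples N so that the
    empirical mean of a [0, value_range]-bounded return is within
    epsilon of the true value with probability at least 1 - delta.

    Solves  2 * exp(-2 * N * epsilon^2 / value_range^2) <= delta.
    """
    n = (value_range ** 2 / (2.0 * epsilon ** 2)) * math.log(2.0 / delta)
    return math.ceil(n)

# Example: estimate value to within 0.1 with 95% confidence.
print(pac_sample_size(0.1, 0.05))  # -> 185
```

Note the characteristic scaling: $N = O(\varepsilon^{-2}\log(1/\delta))$, the baseline against which the survey's coverage and structure factors act as multiplicative overhead.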