[2602.23974] Pessimistic Auxiliary Policy for Offline Reinforcement Learning
Computer Science > Artificial Intelligence
arXiv:2602.23974 (cs) [Submitted on 27 Feb 2026]

Title: Pessimistic Auxiliary Policy for Offline Reinforcement Learning
Authors: Fan Zhang, Baoru Huang, Xin Zhang

Abstract: Offline reinforcement learning aims to learn an agent from pre-collected datasets, avoiding unsafe and inefficient real-time interaction. However, inevitable access to out-of-distribution actions during learning introduces approximation errors, causing error accumulation and considerable overestimation. In this paper, we construct a new pessimistic auxiliary policy for sampling reliable actions. Specifically, we develop the pessimistic auxiliary strategy by maximizing the lower confidence bound of the Q-function. This strategy exhibits relatively high value and low uncertainty in the vicinity of the learned policy, preventing the learned policy from sampling high-value actions with potentially large errors during learning. The smaller approximation error introduced by actions sampled from the pessimistic auxiliary strategy in turn alleviates error accumulation. Extensive experiments on offline reinforcement learning benchmarks show that the pessimistic auxiliary strategy can effectively improve the efficacy of other offline RL approaches.

Subjects: Artificial Intelligence
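The abstract's core mechanism, selecting actions that maximize a lower confidence bound (LCB) of the Q-function rather than its raw estimate, can be illustrated with a minimal sketch. The ensemble-based uncertainty estimate, the discrete candidate-action set, and the `beta` penalty weight below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def lcb_auxiliary_action(q_ensemble, candidate_actions, beta=1.0):
    """Return the candidate action maximizing mean(Q) - beta * std(Q)
    over an ensemble of Q estimates (a common LCB proxy; assumption)."""
    # Q estimates from each ensemble member: shape (ensemble, num_actions)
    q_values = np.stack([q(candidate_actions) for q in q_ensemble])
    # Lower confidence bound: penalize high epistemic disagreement
    lcb = q_values.mean(axis=0) - beta * q_values.std(axis=0)
    return candidate_actions[int(np.argmax(lcb))]

# Toy ensemble: the mean value grows with the action, but so does
# the members' disagreement (a stand-in for out-of-distribution error)
q_ensemble = [lambda a, e=e: a + e * a**2 for e in (-1.0, 0.0, 1.0)]
actions = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

greedy = actions[int(np.argmax(np.mean([q(actions) for q in q_ensemble], axis=0)))]
pessimistic = lcb_auxiliary_action(q_ensemble, actions, beta=1.0)
print(greedy, pessimistic)  # greedy picks 1.0; the LCB picks 0.5
```

The greedy choice chases the highest mean Q estimate (1.0), exactly the kind of high-value, high-uncertainty action the paper argues causes overestimation, while the pessimistic auxiliary choice retreats to 0.5, where value is slightly lower but the ensemble agrees.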