[2603.21357] AgentHER: Hindsight Experience Replay for LLM Agent

[2603.21357] AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

arXiv - AI March 24, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.21357: AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

Computer Science > Artificial Intelligence arXiv:2603.21357 (cs) [Submitted on 22 Mar 2026] Title:AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling Authors:Liang Ding View a PDF of the paper titled AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling, by Liang Ding View PDF HTML (experimental) Abstract:LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER, a framework that recovers this lost training signal by adapting the Hindsight Experience Replay (HER; Andrychowicz et al., 2017) principle to natural-language agent trajectories for offline data augmentation. The key insight is simple: a trajectory that fails goal A is often a correct demonstration for some achievable alternative goal B. AgentHER realises this idea through a four-stage pipeline -- failure classification, outcome extraction, LLM-guided prompt relabeling with confidence gating, and data packaging -- that converts discarded failures into high-quality SFT, DPO, and ShareGPT training data, with both zero-cost rule-based and LLM-judge implementations. On WebArena (Zhou et al., 2024) and ToolBench (Qin et al., 2024), AgentHER improves over success-only SFT by +7.1-11.7 pp across four model families (GPT-4o...

Originally published on March 24, 2026. Curated by AI News.

Llms

Nvidia goes all-in on AI agents while Anthropic pulls the plug

TLDR: Nvidia is partnering with 17 major companies to build a platform specifically for enterprise AI agents, basically trying to become ...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Llms

Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage | TechCrunch

It’s about to become more expensive for Claude Code subscribers to use Anthropic’s coding assistant with OpenClaw and other third-party t...

TechCrunch - AI · 4 min · about 1 hour ago

Llms

I am seeing Claude everywhere

Every single Instagram reel or TikTok I scroll i see people mentioning Claude and glazing it like it’s some kind of master tool that’s be...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

[2603.21357] AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling

About this article

Related Articles

Nvidia goes all-in on AI agents while Anthropic pulls the plug

Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage | TechCrunch

I am seeing Claude everywhere

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

No comments

Stay updated with AI News