[2508.11143] Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward


Computer Science > Robotics
arXiv:2508.11143 (cs)
[Submitted on 15 Aug 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
Authors: Jiarui Yang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang

Abstract: Existing reinforcement learning (RL) methods struggle with long-horizon robotic manipulation tasks, particularly those involving sparse rewards. While action chunking is a promising paradigm for robotic manipulation, using RL to directly learn continuous action chunks in a stable and data-efficient manner remains a critical challenge. This paper introduces AC3 (Actor-Critic for Continuous Chunks), a novel RL framework that learns to generate high-dimensional, continuous action sequences. To make this learning process stable and data-efficient, AC3 incorporates targeted stabilization mechanisms for both the actor and the critic. First, to ensure reliable policy improvement, the actor is trained with an asymmetric update rule, learning exclusively from successful trajectories. Second, to enable effective value learning despite sparse rewards, the critic's update is stabilized using intra-chunk $n$-step returns and furt...
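The abstract mentions stabilizing the critic with intra-chunk $n$-step returns: rewards collected inside one action chunk are accumulated with discounting before bootstrapping from the value estimate at the chunk boundary, $G = \sum_{k=0}^{n-1} \gamma^k r_k + \gamma^n V(s_{t+n})$. A minimal sketch of that computation is below; all names (`chunk_rewards`, `bootstrap_value`, `gamma`) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of an intra-chunk n-step return, as described in the abstract.
# Hypothetical helper -- the paper's actual code may differ.

def intra_chunk_n_step_return(chunk_rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step return over one action chunk:
    G = sum_k gamma^k * r_k + gamma^n * V(s_{t+n})."""
    g = bootstrap_value
    # Fold backwards so each reward is discounted by its offset in the chunk.
    for r in reversed(chunk_rewards):
        g = r + gamma * g
    return g

# Example: a sparse-reward chunk where only the final step pays off,
# bootstrapped from a critic estimate of 0.5 at the chunk boundary.
ret = intra_chunk_n_step_return([0.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9)
```

With sparse rewards, folding the whole chunk into one target like this propagates the terminal reward across all intra-chunk steps in a single critic update, rather than relying on one-step bootstrapping through many zero-reward transitions.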

Originally published on March 02, 2026. Curated by AI News.

Related Articles

[2601.07855] RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution
Machine Learning · arXiv - AI · 3 min
[2502.00262] INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation
LLMs · arXiv - AI · 4 min
[2508.00500] ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety
LLMs · arXiv - AI · 4 min
[2603.26660] Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning
Robotics · arXiv - AI · 4 min

