[2603.02680] LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization
Computer Science > Artificial Intelligence
arXiv:2603.02680 (cs) [Submitted on 3 Mar 2026]

Title: LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization
Authors: Yang Zhao, Zihao Li, Zhiyu Jiang, Dandan Ma, Ganchao Liu, Wenzhe Zhao

Abstract: While Large Language Models (LLMs) form the cornerstone of sequential decision-making agent development, they have inherent limitations in high-frequency decision tasks. Existing research focuses mainly on discrete embodied decision scenarios with low-frequency state updates and significant semantic differences in the state space (e.g., household planning). These methods perform poorly on high-frequency decision-making tasks, because the high-precision numerical state information in such tasks is updated frequently with only minimal fluctuations, and because they exhibit policy misalignment between the learned sub-tasks and the composite task. To address these issues, this paper proposes Normalized Action Reward guided Consistency Policy Optimization (NAR-CP). 1) Our method first acquires predefined dense rewards from environmental feedback on candidate actions via reward functions, then completes reward shaping through normalization, and theoretically verifie...
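The reward-shaping step described in the abstract, normalizing dense rewards collected over a set of candidate actions, can be sketched as follows. The function name and the min-max normalization scheme are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def normalize_candidate_rewards(rewards, eps=1e-8):
    """Min-max normalize dense rewards over a set of candidate actions.

    `rewards` holds one raw environment-feedback reward per candidate
    action; the min-max scheme here is an illustrative assumption.
    """
    r = np.asarray(rewards, dtype=float)
    lo, hi = r.min(), r.max()
    if hi - lo < eps:
        # All candidates scored equally: assign a uniform shaped reward.
        return np.full_like(r, 0.5)
    return (r - lo) / (hi - lo)

# Raw per-candidate rewards from environmental feedback (made-up values):
raw = [0.12, 0.35, -0.08, 0.35]
shaped = normalize_candidate_rewards(raw)
print(shaped)  # best candidates map to 1.0, worst to 0.0
```

Rescaling each candidate set to a common [0, 1] range is one way to keep shaped rewards comparable across states whose raw numerical feedback differs only by small fluctuations, which is the regime the abstract highlights for high-frequency tasks.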