[2511.11828] Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
Computer Science > Machine Learning

arXiv:2511.11828 (cs)

[Submitted on 14 Nov 2025 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
Authors: Wenwen Si, Sooyong Jang, Insup Lee, Osbert Bastani

Abstract: While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We propose a novel strategy that combines multiple LLMs with varying cost/accuracy tradeoffs in an agentic manner: models and tools are run in sequence, as determined by an orchestration model, to minimize cost subject to a user-specified level of reliability; this constraint is formalized using conformal prediction to provide guarantees. To solve this problem, we propose Conformal Constrained Policy Optimization (CCPO), a training paradigm that integrates constrained policy optimization with off-policy reinforcement learning and recent advances in online conformal prediction. CCPO jointly optimizes a cost-aware policy (score function) and an adaptive threshold. Across two multi-hop question answering benchmarks, CCPO achieves up to a 30% cost reduction compared to other cost-aware baselines and LLM-guided methods without compromising reliability.
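To make the conformal machinery the abstract refers to concrete, here is a minimal sketch of the two standard ingredients it builds on: a split-conformal threshold computed from calibration nonconformity scores, and an online (adaptive) threshold update in the style of adaptive conformal inference. This is a generic illustration under stated assumptions, not the paper's actual CCPO training procedure; the function names and the choice of step size `gamma` are hypothetical.

```python
import numpy as np

def split_conformal_threshold(cal_scores, alpha):
    """Split conformal prediction: return the (1 - alpha) empirical
    quantile of calibration nonconformity scores, with the standard
    finite-sample correction (n + 1 in the numerator)."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    # method="higher" gives the conservative (upper) order statistic.
    return np.quantile(cal_scores, min(q, 1.0), method="higher")

def online_threshold_update(tau, err, alpha, gamma=0.01):
    """One step of an adaptive-conformal-inference-style update:
    err = 1 if the current prediction missed (score exceeded tau),
    else 0. Raising tau after a miss and lowering it after a hit
    drives the long-run miss rate toward alpha."""
    return tau + gamma * (err - alpha)
```

In a CCPO-like setting, the score would come from the learned cost-aware score function, and the adaptive threshold would be updated jointly with the policy; the update rule above is only the conformal-calibration skeleton.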