[2603.03538] Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
About this article
Abstract page for arXiv paper 2603.03538: Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
Computer Science > Machine Learning arXiv:2603.03538 (cs) [Submitted on 3 Mar 2026] Title:Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs Authors:Maria-Florina Balcan, Avrim Blum, Kiriaki Fragkia, Zhiyuan Li, Dravyansh Sharma View a PDF of the paper titled Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs, by Maria-Florina Balcan and 4 other authors View PDF HTML (experimental) Abstract:Large language models with chain-of-thought generation have demonstrated great potential for producing complex mathematical proofs. However, their reasoning can often go astray, leading to increasing interest in formal and learned verifiers. A major challenge in learning verifiers, especially when their output will be used by the prover, is that this feedback loop may produce substantial distribution shift. Motivated by this challenge, we propose an online learning framework for learning chain-of-thought verifiers that, given a problem and a sequence of reasoning steps, check the correctness of the solution. Highlighting the asymmetric role of soundness (failure in catching errors in a proof) and completeness (flagging correct proofs as wrong) mistakes of the verifier, we introduce novel extensions of the Littlestone dimension which tightly characterize the mistake bounds for learning a verifier in the realizable setting. We provide optimal algorithms for finding the Pareto-frontier (the smallest total number of...