[2512.16917] Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
Computer Science > Artificial Intelligence

arXiv:2512.16917 (cs)

[Submitted on 18 Dec 2025 (v1), last revised 25 Mar 2026 (this version, v3)]

Title: Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Authors: Qihao Liu, Luoxin Ye, Wufei Ma, Yu-Cheng Chou, Alan Yuille

Abstract: Large language models (LLMs) with explicit reasoning capabilities excel at mathematical reasoning yet still commit process errors, such as incorrect calculations, brittle logic, and superficially plausible but invalid steps. In this paper, we introduce Generative Adversarial Reasoner, an on-policy joint training framework designed to enhance reasoning by co-evolving an LLM reasoner and an LLM-based discriminator through adversarial reinforcement learning. A compute-efficient review schedule partitions each reasoning chain into logically complete slices of comparable length, and the discriminator evaluates each slice's soundness with concise, structured justifications. Learning couples complementary signals: the LLM reasoner is rewarded for logically consistent steps that yield correct answers, while the discriminator earns rewards for correctly detecting errors or distinguishing traces in the reasoning process. This produces dense, well-calibrated, on-policy step-level r...
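The review schedule and coupled reward signals described in the abstract can be sketched in miniature. This is an illustrative assumption of how the pieces might fit together, not the paper's implementation: `partition_chain`, `adversarial_rewards`, the greedy length-based slicing, and the binary reward shapes are all hypothetical names and simplifications introduced here.

```python
# Hypothetical sketch of the ideas in the abstract:
#  1. partition a reasoning chain into slices of comparable length
#     (standing in for "logically complete slices"),
#  2. have a discriminator judge each slice, and
#  3. couple the two reward signals: the reasoner is rewarded for slices
#     judged sound when the final answer is correct, while the discriminator
#     is rewarded for verdicts that match ground truth.
# All names and reward shapes are assumptions for illustration only.

from typing import Callable, List, Tuple


def partition_chain(steps: List[str], target_len: int) -> List[List[str]]:
    """Greedily group consecutive steps into slices whose total character
    length is at least `target_len`, never splitting an individual step."""
    slices: List[List[str]] = []
    current: List[str] = []
    size = 0
    for step in steps:
        current.append(step)
        size += len(step)
        if size >= target_len:
            slices.append(current)
            current, size = [], 0
    if current:  # flush any trailing partial slice
        slices.append(current)
    return slices


def adversarial_rewards(
    slices: List[List[str]],
    discriminator: Callable[[List[str]], bool],  # True = slice judged sound
    answer_correct: bool,
    slice_truly_sound: List[bool],  # oracle labels, for illustration only
) -> Tuple[List[float], List[float]]:
    """Return per-slice rewards for the reasoner and the discriminator."""
    reasoner_r: List[float] = []
    discrim_r: List[float] = []
    for sl, truth in zip(slices, slice_truly_sound):
        verdict = discriminator(sl)
        # Reasoner: rewarded when the slice is judged sound AND the answer is right.
        reasoner_r.append(1.0 if (verdict and answer_correct) else 0.0)
        # Discriminator: rewarded when its verdict matches the ground truth.
        discrim_r.append(1.0 if verdict == truth else 0.0)
    return reasoner_r, discrim_r
```

In the real framework both models would be LLMs updated by reinforcement learning on these dense step-level signals; the stub above only shows how slicing yields per-slice rewards rather than a single trajectory-level reward.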