[2603.20899] Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach
Computer Science > Computation and Language
arXiv:2603.20899 (cs)
[Submitted on 21 Mar 2026]

Title: Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach
Authors: Hongyu Cao, Kunpeng Liu, Dongjie Wang, Yanjie Fu

Abstract: Large language models exhibit strong reasoning capabilities, yet they often rely on shortcuts such as surface pattern matching and answer memorization rather than genuine logical inference. We propose Shortcut-Aware Reasoning Training (SART), a gradient-aware framework that detects and mitigates shortcut-promoting training samples via a ShortcutScore and gradient surgery. Our method identifies shortcut signals through gradient misalignment with validation objectives and answer-token concentration, and modifies the training dynamics accordingly. Experiments on controlled reasoning benchmarks show that SART improves accuracy by 16.5% and robustness by 40.2% over the strongest baseline, significantly improving generalization under distribution shift. Code is available at: this https URL.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.20899 [cs.CL] (or arXiv:2603.20899v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.20899 (arXiv-issued DOI via DataCite, pending registration)
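To make the two signals in the abstract concrete, here is a minimal sketch of how a ShortcutScore and a gradient-surgery step could be computed. This is not the authors' released implementation: the function names (shortcut_score, surgery), the mixing weight alpha, the use of loss mass on answer tokens as the concentration proxy, and the PCGrad-style conflict projection are all assumptions layered on the abstract's description.

```python
# Hypothetical sketch of SART's two signals, assuming:
# - "gradient misalignment" = 1 - cosine similarity between a per-sample
#   training gradient and a validation-set gradient, and
# - "answer-token concentration" = the share of the sample's loss mass
#   that falls on answer tokens (a memorization proxy).
# The projection in `surgery` is borrowed from PCGrad; the paper may differ.
import torch
import torch.nn.functional as F


def flat_grad(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])


def shortcut_score(sample_grad, val_grad, token_losses, answer_mask, alpha=0.5):
    """Higher score = more shortcut-like sample.

    sample_grad:  flattened gradient of this sample's training loss
    val_grad:     flattened gradient of the validation objective
    token_losses: per-token loss for the sample, shape (seq_len,)
    answer_mask:  bool mask over tokens marking the answer span
    alpha:        assumed mixing weight between the two signals
    """
    misalignment = 1.0 - F.cosine_similarity(sample_grad, val_grad, dim=0)
    concentration = token_losses[answer_mask].sum() / token_losses.sum().clamp_min(1e-8)
    return alpha * misalignment + (1.0 - alpha) * concentration


def surgery(sample_grad, val_grad):
    """PCGrad-style projection: if the sample gradient points against the
    validation gradient, remove the conflicting component before the update."""
    dot = torch.dot(sample_grad, val_grad)
    if dot < 0:
        sample_grad = sample_grad - (dot / val_grad.norm().pow(2).clamp_min(1e-12)) * val_grad
    return sample_grad
```

In such a setup, high-scoring samples could be down-weighted or have their gradients projected before the optimizer step; which of the two interventions SART applies, and how the threshold is chosen, is specified in the paper rather than recoverable from the abstract.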