[2604.01754] LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches
Computer Science > Computation and Language
arXiv:2604.01754 (cs) [Submitted on 2 Apr 2026]

Title: LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches
Authors: Linyang He, Qiyao Yu, Hanze Dong, Baohao Liao, Xinxing Xu, Micah Goldblum, Jiang Bian, Nima Mesgarani

Abstract: Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly integrated into scientific workflows, rigorous evaluation of their mathematical capabilities becomes a practical necessity. Existing benchmarks are limited by synthetic settings and data contamination. We present LiveMathematicianBench, a dynamic multiple-choice benchmark for research-level mathematical reasoning built from recent arXiv papers published after model training cutoffs. By grounding evaluation in newly published theorems, it provides a realistic testbed beyond memorized patterns. The benchmark introduces a thirteen-category logical taxonomy of theorem types (e.g., implication, equivalence, existence, uniqueness), enabling fine-grained evaluation across reasoning forms. It employs a proof-sketch-guided distractor pipeline that uses high-level...
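As an illustration only (not the authors' pipeline), fine-grained evaluation over a theorem-type taxonomy amounts to grouping multiple-choice results by category and scoring each group separately. The sketch below uses hypothetical records and only the four category names the abstract mentions explicitly; the full benchmark defines thirteen.

```python
from collections import defaultdict

# Hypothetical evaluation records: each item carries its taxonomy
# category, the correct choice, and a model's predicted choice.
records = [
    {"category": "implication", "answer": "B", "prediction": "B"},
    {"category": "equivalence", "answer": "A", "prediction": "C"},
    {"category": "existence",   "answer": "D", "prediction": "D"},
    {"category": "uniqueness",  "answer": "A", "prediction": "A"},
]

def per_category_accuracy(records):
    """Group multiple-choice results by theorem type and score each group."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["category"]] += 1
        correct[r["category"]] += int(r["prediction"] == r["answer"])
    return {c: correct[c] / totals[c] for c in totals}

print(per_category_accuracy(records))
```

Reporting accuracy per reasoning form, rather than a single aggregate score, is what lets a benchmark like this distinguish, say, strength on existence claims from weakness on equivalences.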