[2604.01754] LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches

arXiv - Machine Learning April 03, 2026 4 min read

Abstract page for arXiv paper 2604.01754: LiveMathematicianBench: A Live Benchmark for Mathematician-Level Reasoning with Proof Sketches

Computer Science > Computation and Language, arXiv:2604.01754 (cs). Submitted on 2 Apr 2026.

Authors: Linyang He, Qiyao Yu, Hanze Dong, Baohao Liao, Xinxing Xu, Micah Goldblum, Jiang Bian, Nima Mesgarani

Abstract: Mathematical reasoning is a hallmark of human intelligence, and whether large language models (LLMs) can meaningfully perform it remains a central question in artificial intelligence and cognitive science. As LLMs are increasingly integrated into scientific workflows, rigorous evaluation of their mathematical capabilities becomes a practical necessity. Existing benchmarks are limited by synthetic settings and data contamination. We present LiveMathematicianBench, a dynamic multiple-choice benchmark for research-level mathematical reasoning built from recent arXiv papers published after model training cutoffs. By grounding evaluation in newly published theorems, it provides a realistic testbed beyond memorized patterns. The benchmark introduces a thirteen-category logical taxonomy of theorem types (e.g., implication, equivalence, existence, uniqueness), enabling fine-grained evaluation across reasoning forms. It employs a proof-sketch-guided distractor pipeline that uses high-level...
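The abstract describes two evaluation ideas: keeping only theorems from papers published after a model's training cutoff, and scoring accuracy separately for each logical theorem type. The sketch below illustrates both under stated assumptions. It is not the authors' pipeline. The `Item` record, its fields, the functions `contamination_free` and `per_category_accuracy`, the dates, and the exact-match scoring on option labels are all hypothetical. Only the category names "implication", "existence", and "uniqueness" come from the abstract's examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """Hypothetical multiple-choice item derived from one published theorem."""
    published: date   # assumed: arXiv publication date of the source paper
    category: str     # logical theorem type, e.g. "implication", "existence"
    answer: str       # assumed: label of the correct option, e.g. "B"


def contamination_free(items, training_cutoff):
    """Keep only items whose source paper appeared strictly after the cutoff."""
    return [it for it in items if it.published > training_cutoff]


def per_category_accuracy(items, predictions):
    """Accuracy per theorem category; predictions[i] is the option chosen for items[i]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for it, pred in zip(items, predictions):
        total[it.category] += 1
        correct[it.category] += int(pred == it.answer)
    return {cat: correct[cat] / total[cat] for cat in total}


# Illustrative usage with made-up items and an assumed cutoff date.
items = [
    Item(date(2026, 3, 1), "implication", "A"),
    Item(date(2026, 3, 5), "existence", "C"),
    Item(date(2025, 1, 10), "uniqueness", "B"),  # predates the cutoff, so it is dropped
]
fresh = contamination_free(items, training_cutoff=date(2025, 12, 31))
print(per_category_accuracy(fresh, ["A", "D"]))
# {'implication': 1.0, 'existence': 0.0}
```

Reporting accuracy per category, rather than as a single pooled score, is what allows the fine-grained comparison across reasoning forms that the abstract mentions. A single average could hide weak performance on rarer theorem types.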

Originally published on April 03, 2026. Curated by AI News.