[2603.02775] From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench

arXiv - Machine Learning March 04, 2026 4 min read

About this article

arXiv:2603.02775 (Computer Science: Computation and Language). Submitted on 3 Mar 2026.

Authors: Weikang Shi, Houxing Ren, Junting Pan, Aojun Zhou, Ke Wang, Zimu Lu, Yunqiao Yang, Yuxuan Hu, Linda Wei, Mingjie Zhan, Hongsheng Li

Abstract: Large Language Models (LLMs) show significant potential in AI mathematical tutoring, yet current evaluations often rely on simplistic metrics or narrow pedagogical scenarios, failing to assess comprehensive, multi-turn teaching effectiveness. In this paper, we introduce KMP-Bench, a comprehensive K-8 Mathematical Pedagogical Benchmark designed to assess LLMs from two complementary perspectives. The first module, KMP-Dialogue, evaluates holistic pedagogical capabilities against six core principles (e.g., Challenge, Explanation, Feedback), leveraging a novel multi-turn dialogue dataset constructed by weaving together diverse pedagogical components. The second module, KMP-Skills, provides a granular assessment of foundational tutoring abilities, including multi-turn problem-solving, error detection and correction, and problem generation. Our evaluations on KMP-Bench reveal a key disparity: while leading LLMs excel at tasks with verifiable solutions, they struggle with the...

Originally published on March 04, 2026. Curated by AI News.
