[2603.25633] Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
Computer Science > Artificial Intelligence
arXiv:2603.25633 (cs)
[Submitted on 26 Mar 2026]

Title: Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
Authors: Liang Zhang, Yu Fu, Xinyi Jin

Abstract: Large Language Models (LLMs) are increasingly used in math education, not only as problem solvers but also as assessors of learners' reasoning. However, it remains unclear whether stronger mathematical problem-solving ability is associated with stronger step-level assessment performance. This study examines that relationship using the GSM8K and MATH subsets of PROCESSBENCH, a human-annotated benchmark for identifying the earliest erroneous step in mathematical reasoning. We evaluate two LLM-based math tutor agent settings, instantiated with GPT-4 and GPT-5, on two independent tasks over the same math problems: solving the original problem and assessing a benchmark-provided solution by predicting the earliest erroneous step. Results show a consistent within-model pattern: assessment accuracy is substantially higher on items the same model solved correctly than on items it solved incorrectly, with statistically significant associations across both models and datasets. At the same time, assessment remains more difficult than di...
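The within-model association the abstract describes (solve correctness vs. step-level assessment accuracy) is the kind of relationship typically tested with a 2x2 contingency table and a Pearson chi-square statistic. A minimal sketch follows; the counts are entirely hypothetical and are not taken from the paper, which does not report its exact test or numbers here.

```python
# Hypothetical counts (NOT from the paper), for illustration only.
# Rows: problem solved correctly / incorrectly by the model.
# Cols: earliest erroneous step identified correctly / missed.
table = [[120, 40],
         [45, 95]]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected[i][j] = row_total[i] * col_total[j] / n.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

# Conditional assessment accuracy, the quantity the abstract compares.
acc_solved = table[0][0] / row_totals[0]    # accuracy given solved correctly
acc_unsolved = table[1][0] / row_totals[1]  # accuracy given solved incorrectly

print(f"chi2 = {chi2:.2f}")
print(f"assessment accuracy | solved = {acc_solved:.2f}")
print(f"assessment accuracy | unsolved = {acc_unsolved:.2f}")
```

With one degree of freedom, a chi-square statistic this large would indicate a significant association; the paper's actual statistics and cell counts would of course replace these placeholder values.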