[2603.03512] Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

[2603.03512] Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2603.03512: Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

Computer Science > Computers and Society arXiv:2603.03512 (cs) [Submitted on 3 Mar 2026] Title:Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks Authors:Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn View a PDF of the paper titled Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks, by Danielle S. Fox and 3 other authors View PDF Abstract:Teachers face increasing demands on their time, particularly in adapting mathematics curricula to meet individual student needs while maintaining cognitive rigor. This study evaluates whether AI tools can accurately classify the cognitive demand of mathematical tasks, which is important for creating or adapting tasks that support student learning. We tested eleven AI tools: six general-purpose (ChatGPT, Claude, DeepSeek, Gemini, Grok, Perplexity) and five education-specific (Brisk, Coteach AI, Khanmigo, Magic School, this http URL), on their ability to categorize mathematics tasks across four levels of cognitive demand using a research-based framework. The goal was to approximate the performance teachers will achieve with straightforward prompts. On average, AI tools accurately classified cognitive demand in only 63% of cases. Education-specific tools were not more accurate than general-purpose tools, and no tool exceeded 83% accuracy. All tools struggled with tasks at the extremes of cognitive demand (Memorization and Doing Mathemat...

Originally published on March 05, 2026. Curated by AI News.

Related Articles

Llms

What's your "When Language Model AI can do X, I'll be impressed"?

I have two at the top of my mind: When it can read musical notes. I will be mildly impressed when I can paste in a picture of musical not...

Reddit - Artificial Intelligence · 1 min ·
Google’s Gemini AI can answer your questions with 3D models and simulations
Llms

Google’s Gemini AI can answer your questions with 3D models and simulations

Google's latest upgrade for Gemini will allow the chatbot to generate interactive 3D models and simulations in response to your questions...

The Verge - AI · 4 min ·
Moody’s Integrates AI Agents With Anthropic’s Claude
Llms

Moody’s Integrates AI Agents With Anthropic’s Claude

AI Tools & Products · 4 min ·
AI on the couch: Anthropic gives Claude 20 hours of psychiatry
Llms

AI on the couch: Anthropic gives Claude 20 hours of psychiatry

AI Tools & Products · 6 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime