[2603.03512] Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks
Computer Science > Computers and Society
arXiv:2603.03512 (cs)
[Submitted on 3 Mar 2026]

Title: Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks
Authors: Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

Abstract: Teachers face increasing demands on their time, particularly in adapting mathematics curricula to meet individual student needs while maintaining cognitive rigor. This study evaluates whether AI tools can accurately classify the cognitive demand of mathematical tasks, an ability that matters for creating or adapting tasks that support student learning. We tested eleven AI tools, six general-purpose (ChatGPT, Claude, DeepSeek, Gemini, Grok, Perplexity) and five education-specific (Brisk, Coteach AI, Khanmigo, Magic School, this http URL), on their ability to categorize mathematics tasks across four levels of cognitive demand using a research-based framework. The goal was to approximate the performance teachers would achieve with straightforward prompts. On average, the AI tools classified cognitive demand correctly in only 63% of cases. Education-specific tools were no more accurate than general-purpose tools, and no tool exceeded 83% accuracy. All tools struggled with tasks at the extremes of cognitive demand (Memorization and Doing Mathematics) ...
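For concreteness, the sketch below shows one way the headline accuracy figures (63% average, 83% maximum) could be computed: compare each tool's level assignments against expert labels and take the fraction of matches. This is not code from the paper; it assumes the four levels are those of the Smith and Stein cognitive-demand framework (the abstract names only the two extremes), and all labels and tool outputs in the example are hypothetical.

    # Minimal sketch: per-tool classification accuracy against expert labels.
    # Level names assume the Smith & Stein framework; the abstract confirms
    # only the extremes (Memorization, Doing Mathematics). Data is made up.

    LEVELS = [
        "Memorization",
        "Procedures without Connections",
        "Procedures with Connections",
        "Doing Mathematics",
    ]

    def accuracy(predicted, expert):
        """Fraction of tasks where the tool's level matches the expert label."""
        assert len(predicted) == len(expert)
        hits = sum(p == e for p, e in zip(predicted, expert))
        return hits / len(expert)

    # Hypothetical example: one tool's classifications vs. expert coding.
    expert_labels = ["Memorization", "Doing Mathematics",
                     "Procedures with Connections"]
    tool_labels = ["Procedures without Connections", "Doing Mathematics",
                   "Procedures with Connections"]
    print(f"accuracy = {accuracy(tool_labels, expert_labels):.2f}")  # 0.67

Averaging this score over the tasks for each of the eleven tools, then across tools, would yield the kind of summary statistic the abstract reports.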