[2602.18806] Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models
Summary
The paper presents a metacognitive framework for Large Language Models (LLMs) that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, grounding reasoning in psychological principles and yielding improved self-correction and error diagnosis.
Why It Matters
This research addresses the limitations of LLMs in self-monitoring and error correction, proposing a structured approach that could lead to more reliable AI systems. By grounding AI reasoning in cognitive theory, it opens pathways for developing transparent and robust AI applications.
Key Takeaways
- Introduces a metacognitive framework for LLMs based on cognitive theory.
- Demonstrates a threefold increase in successful self-correction and substantially improved error diagnosis across reasoning and diagnostic benchmarks (GSM8K, CRUXEval, MBPP, AIME, CorrectBench, and TruthfulQA).
- Achieves an 84% aggregate preference in blinded human evaluations (580 query pairs) for trustworthiness and metacognitive self-awareness over standard and Chain-of-Thought baselines.
- Utilizes a dual-process MetaController for adaptive effort allocation.
- Highlights the importance of psychological principles in AI development.
Computer Science > Computation and Language
arXiv:2602.18806 (cs)
[Submitted on 21 Feb 2026]
Authors: Abraham Paul Elenjical, Vivek Hruday Kavuri, Vasudeva Varma
Abstract: Large Language Models (LLMs) demonstrate strong reasoning performance, yet their ability to reliably monitor, diagnose, and correct their own errors remains limited. We introduce a psychologically grounded metacognitive framework that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, and study its integration within a lightweight dual-process MetaController for adaptive effort allocation. Across diverse reasoning and diagnostic benchmarks (GSM8K, CRUXEval, MBPP, AIME, CorrectBench, and TruthfulQA) using Llama-3 and Qwen-3 (8B), explicit regulatory structuring substantially improves error diagnosis and yields a threefold increase in successful self-correction. Blinded human evaluations over 580 query pairs show an 84% aggregate preference for trustworthiness and metacognitive self-awareness over standard and Chain-of-Thought baselines. Grounding LLM reasoning in established cognitive theory offers a principled path toward more transparent and diagnostically robust AI systems.
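The abstract describes two moving parts: a Plan/Monitor/Evaluate regulatory cycle expressed as structured prompts, and a dual-process MetaController that decides how much reasoning effort a query deserves. A minimal Python sketch of how such a pipeline could be wired is shown below; the function names, routing heuristic, and stopping criterion are illustrative assumptions, not the paper's actual implementation or API.

```python
# Illustrative sketch of a Plan -> Monitor -> Evaluate prompting loop with a
# dual-process controller. call_llm is a stub standing in for a real model
# client; all prompts and heuristics here are hypothetical, not the paper's.

def call_llm(prompt: str) -> str:
    """Stub LLM call; replace with a real chat-completion client."""
    return f"[model response to: {prompt[:40]}...]"

def meta_controller(query: str) -> str:
    """Dual-process routing: a cheap direct answer for simple queries,
    the full regulatory cycle for harder ones (length is a toy heuristic)."""
    if len(query.split()) < 8:                 # fast, low-effort path
        return call_llm(f"Answer directly: {query}")
    return regulatory_cycle(query)             # deliberate, structured path

def regulatory_cycle(query: str, max_rounds: int = 2) -> str:
    # Planning: outline an approach before answering.
    plan = call_llm(f"Plan: outline the steps needed to solve: {query}")
    answer = call_llm(f"Execute this plan step by step.\nPlan: {plan}\nQuery: {query}")
    for _ in range(max_rounds):
        # Monitoring: ask the model to diagnose errors in its own answer.
        critique = call_llm(f"Monitor: check this answer for errors.\nAnswer: {answer}")
        # Evaluation: accept the answer or revise it using the diagnosis.
        if "no errors" in critique.lower():
            break
        answer = call_llm(
            f"Revise the answer using this diagnosis: {critique}\nAnswer: {answer}"
        )
    return answer
```

With a real model client in place of the stub, the controller cheaply answers trivial queries while routing complex ones through the self-correcting loop, which is the adaptive-effort behavior the abstract attributes to the MetaController.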