[2603.03332] Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
Computer Science > Computation and Language
arXiv:2603.03332 (cs)
[Submitted on 11 Feb 2026]

Title: Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
Authors: Ashwath Vaithinathan Aravindan, Mayank Kejriwal

Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of five CoT perturbation types: \textit{MathError, UnitConversion, Sycophancy, SkippedSteps,} and \textit{ExtraSteps}. We evaluate 13 models spanning three orders of magnitude in parameter count (3B to 1.5T\footnote{Assumed parameter count of closed models}), testing their ability to complete mathematical reasoning tasks despite perturbations injected at different points in the reasoning chain. Our key findings reveal heterogeneous vulnerability patterns: MathError perturbations produce the most severe degradation in small models (50-60\% accuracy loss) but show strong scaling benefits; UnitConversion remains challenging across all scales (20-30\% loss even for largest models); ExtraSteps incur minimal acc...
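To make the taxonomy concrete, the perturbation-injection setup the abstract describes could be sketched as follows. This is an illustrative sketch only, not the authors' code: the function name `perturb_chain`, the example reasoning chain, and the specific corruption rules (e.g. offsetting the first number for MathError, a wrong "1 km = 100 m" conversion for UnitConversion) are all assumptions for demonstration.

```python
# Illustrative sketch (NOT the paper's implementation): injecting one of the
# five perturbation types from the taxonomy into a chain-of-thought step.
import random
import re

PERTURBATIONS = ["MathError", "UnitConversion", "Sycophancy",
                 "SkippedSteps", "ExtraSteps"]

def perturb_chain(steps, kind, index, rng=None):
    """Return a copy of the reasoning chain with `kind` injected at `index`."""
    rng = rng or random.Random(0)
    out = list(steps)
    if kind == "MathError":
        # Corrupt the first number in the chosen step by a nonzero offset.
        out[index] = re.sub(r"\d+",
                            lambda m: str(int(m.group()) + rng.randint(1, 9)),
                            out[index], count=1)
    elif kind == "SkippedSteps":
        # Drop an intermediate step entirely.
        del out[index]
    elif kind == "ExtraSteps":
        # Insert a redundant (but not incorrect) step.
        out.insert(index, "Restating the previous step for clarity.")
    elif kind == "Sycophancy":
        # Append social pressure toward a stated value.
        out[index] += " (The user insists this value is correct.)"
    elif kind == "UnitConversion":
        # Append a deliberately wrong unit conversion.
        out[index] += " Converting: 1 km = 100 m."
    return out

chain = ["Speed is 60 km/h.",
         "Time is 2 h, so distance = 60 * 2 = 120 km.",
         "Answer: 120 km."]
print(perturb_chain(chain, "MathError", 1))
```

In the paper's setting, the perturbed chain would then be handed back to the model mid-reasoning to test whether it recovers the correct final answer.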