[2601.21214] Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models


Computer Science > Computation and Language
arXiv:2601.21214 (cs)
[Submitted on 29 Jan 2026 (v1), last revised 1 May 2026 (this version, v2)]

Title: Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models
Authors: Zhaoyi Li, Jiatong Li, Gangwei Jiang, Linqi Song, Defu Lian, Ying Wei

Abstract: Chain-of-thought (CoT) reasoning has become the standard paradigm for enabling Large Language Models (LLMs) to solve complex problems. However, recent studies reveal a sharp performance drop in reasoning hop generalization scenarios, where the required number of reasoning steps exceeds training distributions while the underlying algorithm remains unchanged. The internal mechanisms driving this failure remain poorly understood. In this work, we conduct a systematic study on tasks from multiple domains, and find that errors concentrate at token positions of a few critical error types, rather than being uniformly distributed. Closer inspection reveals that these token-level erroneous predictions stem from internal competition mechanisms: certain attention heads, termed erroneous processing heads (ep heads), tip the balance by amplifying incorrect reasoning trajectories while suppressing correct ones. Notably, removing individual ep heads during inference...
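The intervention the abstract describes, removing an individual attention head during inference, can be sketched as zeroing that head's contribution before the output projection. The toy dimensions, weights, and the `mha` helper below are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 16, 4   # model width, number of attention heads (toy values)
Dh = D // H    # per-head width
T = 5          # sequence length

# Random projection weights for a single toy attention layer.
Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
x = rng.standard_normal((T, D))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mha(x, ablate_head=None):
    """Multi-head self-attention; optionally zero out one head's output."""
    q = (x @ Wq).reshape(T, H, Dh).transpose(1, 0, 2)  # (H, T, Dh)
    k = (x @ Wk).reshape(T, H, Dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(T, H, Dh).transpose(1, 0, 2)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(Dh))
    out = att @ v                                      # (H, T, Dh)
    if ablate_head is not None:
        # The ablation: this head no longer contributes to the layer output.
        out[ablate_head] = 0.0
    return out.transpose(1, 0, 2).reshape(T, D) @ Wo

y_full = mha(x)
y_ablated = mha(x, ablate_head=2)  # differs from y_full: head 2 contributed
```

In a real LLM the same effect is usually achieved with a forward hook on the attention module rather than by editing the weights, so the intervention can be toggled per prompt.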

Originally published on May 04, 2026. Curated by AI News.

