[2509.24156] Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Computer Science > Artificial Intelligence
arXiv:2509.24156 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Authors: Yuhui Wang, Changjiang Li, Guangke Chen, Jiacheng Liang, Ting Wang

Abstract: Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning d...
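The abstract describes the experimental design only at a high level, so the following Python sketch is a hypothetical illustration of what such a perturbation probe could look like, not the authors' actual protocol. The function names (query_model, build_prompts, attribute), the prompt wording, and the string-matching heuristic are all illustrative assumptions.

# Hypothetical sketch (not from the paper): probe the two answer mechanisms
# the abstract describes by perturbing a QA item in two ways -- a misleading
# cue injected into the reasoning prompt, and a corrupted "authoritative"
# answer planted in the context -- then classifying which signal the model's
# final answer follows.

def query_model(prompt: str) -> str:
    """Stand-in for a real LRM client; replace with an actual API call.
    Here it echoes the planted value so the demo runs end-to-end."""
    return "Final answer: 418"

def build_prompts(question: str, wrong: str) -> dict[str, str]:
    base = f"Question: {question}\nThink step by step, then give a final answer."
    return {
        # Perturbs the reasoning path: a hint meant to derail the CoT.
        "misleading_cue": base + f"\nHint: note that the key quantity is {wrong}.",
        # Perturbs the retrieval target: plants a wrong 'known' answer.
        "corrupted_answer": base + f"\nA standard reference lists the answer as {wrong}.",
    }

def attribute(condition: str, final: str, true_ans: str, wrong: str) -> str:
    """Heuristic attribution of the final answer to reasoning vs. retrieval."""
    if condition == "misleading_cue":
        if wrong in final:
            return "reasoning-dominant (answer tracked the corrupted CoT)"
        if true_ans in final:
            return "retrieval-dominant (memory overrode the derailed CoT)"
    elif condition == "corrupted_answer":
        if wrong in final:
            return "retrieval-dominant (answer tracked the planted value)"
        if true_ans in final:
            return "reasoning-dominant (CoT overrode the corrupted context)"
    return "inconclusive"

if __name__ == "__main__":
    prompts = build_prompts("What is 17 * 24?", wrong="418")
    for cond, prompt in prompts.items():
        answer = query_model(prompt)
        print(cond, "->", attribute(cond, answer, true_ans="408", wrong="418"))

Crossing the two perturbations (each alone, then both together, as the abstract's "and/or" suggests) would let one separate the conditions under which the final answer tracks the reasoning trace from those where it tracks a memorized value.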

Originally published on March 03, 2026. Curated by AI News.

Related Articles

[2603.14841] Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling (Machine Learning)
[2603.17839] How do LLMs Compute Verbal Confidence (LLMs)
[2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models (LLMs)
[2603.09085] Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting (LLMs)