[2509.24156] Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Computer Science > Artificial Intelligence
arXiv:2509.24156 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Authors: Yuhui Wang, Changjiang Li, Guangke Chen, Jiacheng Liang, Ting Wang

Abstract: Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning d...
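The abstract describes the experimental design only at a high level, so the following Python sketch is a hypothetical illustration of what such a perturbation probe could look like, not the authors' actual protocol. The function names (query_model, build_prompts, attribute), the prompt wording, and the string-matching heuristic are all illustrative assumptions.

# Hypothetical sketch (not from the paper): probe the two answer mechanisms
# the abstract describes by perturbing a QA item in two ways -- a misleading
# cue injected into the reasoning prompt, and a corrupted "authoritative"
# answer planted in the context -- then classifying which signal the model's
# final answer follows.

def query_model(prompt: str) -> str:
    """Stand-in for a real LRM client; replace with an actual API call.
    Here it echoes the planted value so the demo runs end-to-end."""
    return "Final answer: 418"

def build_prompts(question: str, wrong: str) -> dict[str, str]:
    base = f"Question: {question}\nThink step by step, then give a final answer."
    return {
        # Perturbs the reasoning path: a hint meant to derail the CoT.
        "misleading_cue": base + f"\nHint: note that the key quantity is {wrong}.",
        # Perturbs the retrieval target: plants a wrong 'known' answer.
        "corrupted_answer": base + f"\nA standard reference lists the answer as {wrong}.",
    }

def attribute(condition: str, final: str, true_ans: str, wrong: str) -> str:
    """Heuristic attribution of the final answer to reasoning vs. retrieval."""
    if condition == "misleading_cue":
        if wrong in final:
            return "reasoning-dominant (answer tracked the corrupted CoT)"
        if true_ans in final:
            return "retrieval-dominant (memory overrode the derailed CoT)"
    elif condition == "corrupted_answer":
        if wrong in final:
            return "retrieval-dominant (answer tracked the planted value)"
        if true_ans in final:
            return "reasoning-dominant (CoT overrode the corrupted context)"
    return "inconclusive"

if __name__ == "__main__":
    prompts = build_prompts("What is 17 * 24?", wrong="418")
    for cond, prompt in prompts.items():
        answer = query_model(prompt)
        print(cond, "->", attribute(cond, answer, true_ans="408", wrong="418"))

Crossing the two perturbations (each alone, then both together, as the abstract's "and/or" suggests) would let one separate the conditions under which the final answer tracks the reasoning trace from those where it tracks a memorized value.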

Originally published on March 03, 2026. Curated by AI News.

Related Articles

[2603.14841] Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling (Machine Learning)
[2603.17839] How do LLMs Compute Verbal Confidence (LLMs)
[2603.15970] 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models (LLMs)
[2603.09085] Not All News Is Equal: Topic- and Event-Conditional Sentiment from Finetuned LLMs for Aluminum Price Forecasting (LLMs)