[2602.24060] Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
Computer Science > Computation and Language
arXiv:2602.24060 (cs) [Submitted on 27 Feb 2026]

Title: Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
Authors: Donghao Huang, Zhaoxia Wang

Abstract: Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a comprehensive evaluation of 504 configurations across seven model families (including adaptive, conditional, and reinforcement learning-based reasoning architectures) on sentiment analysis datasets of varying granularity (binary, five-class, and 27-class emotion). Our findings reveal that reasoning effectiveness is strongly task-dependent, challenging prevailing assumptions: (1) reasoning shows task-complexity dependence: binary classification loses up to 19.9 F1 percentage points (pp), while 27-class emotion recognition gains up to 16.0 pp; (2) distilled reasoning variants underperform base models by 3 to 18 pp on simpler tasks, though few-shot prompting enables partial recovery; (3) few-shot learning improves over zero-shot in most cases regardless of model type, with gains varying by architecture and task complexity; (4) Pareto frontier analysis shows base models dominate efficiency-performanc...
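The abstract reports effects as differences in F1 percentage points and compares configurations on an efficiency-performance Pareto frontier. Below is a minimal Python sketch of both computations. It rests on several assumptions: F1 is given as a fraction in [0, 1], the efficiency axis is a lower-is-better cost (for example, tokens generated per example), and the names `Config`, `delta_pp`, `dominates`, and `pareto_frontier` are hypothetical illustrations, not the authors' code. The toy values in the usage block are made up and are not results from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """Hypothetical evaluated configuration (e.g. model + prompting mode)."""
    name: str
    cost: float  # assumed efficiency axis: lower is better (e.g. tokens generated)
    f1: float    # performance axis: F1 as a fraction in [0, 1], higher is better


def delta_pp(f1_variant: float, f1_baseline: float) -> float:
    """F1 difference in percentage points; negative means the variant is worse."""
    return 100.0 * (f1_variant - f1_baseline)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.cost <= b.cost and a.f1 >= b.f1
    strictly_better = a.cost < b.cost or a.f1 > b.f1
    return no_worse and strictly_better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the non-dominated configurations, ordered by increasing cost."""
    frontier = [c for c in configs if not any(dominates(o, c) for o in configs)]
    return sorted(frontier, key=lambda c: (c.cost, -c.f1))


if __name__ == "__main__":
    # Toy points for illustration only; not results from the paper.
    toy = [
        Config("base-zero-shot", cost=50.0, f1=0.80),
        Config("base-few-shot", cost=120.0, f1=0.85),
        Config("reasoning-zero-shot", cost=900.0, f1=0.82),
        Config("reasoning-few-shot", cost=1000.0, f1=0.84),
    ]
    for c in pareto_frontier(toy):
        print(c.name, c.cost, c.f1)
    print(round(delta_pp(0.82, 0.80), 1))
```

With these toy points, both reasoning configurations are dominated by `base-few-shot` (lower cost and higher F1), so only the two base configurations remain on the frontier. The all-pairs dominance check is quadratic, which is simple and adequate for a few hundred configurations; a sort-and-sweep over cost would scale better for much larger sets.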