[2602.04755] When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Computer Science > Computation and Language
arXiv:2602.04755 (cs)
[Submitted on 4 Feb 2026 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Authors: Xinyu Zhou, Chang Jin, Carsten Eickhoff, Zhijiang Guo, Seyed Ali Bahrainian

Abstract: Large language models (LLMs) rarely admit uncertainty, often producing fluent but misleading answers rather than abstaining (i.e., refusing to answer). This weakness is especially evident in temporal question answering, where models frequently ignore time-sensitive evidence and conflate facts across different time periods. In this paper, we present the first empirical study of training LLMs to abstain while reasoning about temporal QA. Existing approaches such as calibration can be unreliable in capturing uncertainty in complex reasoning. We instead frame abstention as a teachable skill and introduce a pipeline that couples Chain-of-Thought (CoT) supervision with Reinforcement Learning (RL) guided by abstention-aware rewards. Our goal is to systematically analyze how different information types and training techniques affect temporal reasoning with abstention behavior in LLMs. Through extensive experiments studying various methods, we find that RL yields strong empirical gains on ...
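The abstract does not spell out the reward design, but an "abstention-aware reward" can be sketched as follows. This is a minimal illustrative sketch under assumed values (not the paper's actual reward): a correct answer earns full credit, abstaining earns a smaller positive credit, and a confident wrong answer is penalized, so the RL policy learns to prefer "I don't know" over a fluent but wrong guess. The `ABSTAIN_TOKEN` marker and all reward magnitudes are hypothetical.

```python
# Hypothetical abstention-aware reward for RL fine-tuning.
# Magnitudes are illustrative assumptions, not from the paper:
# correct answer > abstention > wrong answer.

ABSTAIN_TOKEN = "[ABSTAIN]"  # assumed marker the model emits to refuse


def abstention_aware_reward(prediction: str, gold: str, answerable: bool) -> float:
    """Score one model response against the gold answer."""
    abstained = prediction.strip() == ABSTAIN_TOKEN
    if not answerable:
        # Unanswerable question: abstaining is the only correct move.
        return 1.0 if abstained else -1.0
    if abstained:
        # Answerable but the model abstained: small positive reward,
        # better than answering wrongly but worse than answering correctly.
        return 0.2
    # Answerable and answered: exact-match correctness check.
    return 1.0 if prediction.strip() == gold.strip() else -1.0


# Wrong answer scores below abstention, which scores below a correct answer.
print(abstention_aware_reward("1999", "2001", answerable=True))       # -1.0
print(abstention_aware_reward("[ABSTAIN]", "2001", answerable=True))  # 0.2
print(abstention_aware_reward("2001", "2001", answerable=True))       # 1.0
```

With this ordering of rewards, the optimal policy answers only when it is likely to be correct and abstains otherwise, which matches the abstention behavior the paper aims to train.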