[2603.03330] Certainty robustness: Evaluating LLM stability under self-challenging prompts
About this article
Abstract page for arXiv paper 2603.03330: Certainty robustness: Evaluating LLM stability under self-challenging prompts
Computer Science > Computation and Language arXiv:2603.03330 (cs) [Submitted on 10 Feb 2026] Title:Certainty robustness: Evaluating LLM stability under self-challenging prompts Authors:Mohammadreza Saadat, Steve Nemzer View a PDF of the paper titled Certainty robustness: Evaluating LLM stability under self-challenging prompts, by Mohammadreza Saadat and Steve Nemzer View PDF HTML (experimental) Abstract:Large language models (LLMs) often present answers with high apparent confidence despite lacking an explicit mechanism for reasoning about certainty or truth. While existing benchmarks primarily evaluate single-turn accuracy, truthfulness or confidence calibration, they do not capture how models behave when their responses are challenged in interactive settings. We introduce the Certainty Robustness Benchmark, a two-turn evaluation framework that measures how LLMs balance stability and adaptability under self-challenging prompts such as uncertainty ("Are you sure?") and explicit contradiction ("You are wrong!"), alongside numeric confidence elicitation. Using 200 reasoning and mathematics questions from LiveBench, we evaluate four state-of-the-art LLMs and distinguish between justified self-corrections and unjustified answer changes. Our results reveal substantial differences in interactive reliability that are not explained by baseline accuracy alone: some models abandon correct answers under conversational pressure, while others demonstrate strong resistance to challenge ...