[2603.01239] Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence
Computer Science > Computation and Language

arXiv:2603.01239 (cs)

[Submitted on 1 Mar 2026]

Title: Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence

Authors: Harshavardhan

Abstract: We introduce Self-Anchoring Calibration Drift (SACD), a hypothesized tendency for large language models (LLMs) to show systematic changes in expressed confidence when building iteratively on their own prior outputs across multi-turn conversations. We report an empirical study comparing three frontier models -- Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.2 -- across 150 questions spanning factual, technical, and open-ended domains, using three conditions: single-turn baseline (A), multi-turn self-anchoring (B), and independent repetition control (C). Results reveal a complex, model-heterogeneous pattern that partially diverges from the pre-registered hypotheses. Claude Sonnet 4.6 exhibited a significant decrease in confidence under self-anchoring (mean CDS = -0.032, t(14) = -2.43, p = .029, d = -0.627), while also showing significant calibration-error drift (F(4,56) = 22.77, p < .001, eta^2 = .791). GPT-5.2 showed the opposite pattern in open-ended domains (mean CDS = +0.026), with significant ECE escalation by Turn 5. Gemini 3.1 Pro showed no significant ...
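The excerpt does not define how CDS or ECE are computed. As a minimal sketch, assuming CDS is the mean per-turn change in a model's expressed confidence across a conversation and ECE is the standard binned expected calibration error, the two metrics might be implemented as follows (function names and binning choices are illustrative, not taken from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: partition predictions into equal-width
    confidence bins and average the |mean confidence - accuracy| gap,
    weighted by the fraction of samples in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, bins[1:-1])  # indices 0 .. n_bins-1
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

def confidence_drift_score(confidences_by_turn):
    """Hypothetical CDS: mean per-turn change in expressed confidence
    over a multi-turn conversation (positive = rising confidence)."""
    c = np.asarray(confidences_by_turn, dtype=float)
    return np.diff(c).mean()

# Example: expressed confidences and correctness for one batch of answers.
conf = [0.9, 0.8, 0.95, 0.6, 0.7]
hit = [1, 1, 0, 1, 0]
print(f"ECE: {expected_calibration_error(conf, hit):.3f}")

# Mean expressed confidence at each turn of one 5-turn conversation.
print(f"CDS: {confidence_drift_score([0.82, 0.80, 0.79, 0.76, 0.74]):+.3f}")
```

Under these assumptions, a negative CDS (as reported for Claude Sonnet 4.6) would indicate confidence falling across turns, while rising ECE by Turn 5 (as reported for GPT-5.2) would indicate growing miscalibration even where expressed confidence climbs.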