[2603.03824] In-Context Environments Induce Evaluation-Awareness in

[2603.03824] In-Context Environments Induce Evaluation-Awareness in Language Models

arXiv - Machine Learning March 05, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.03824: In-Context Environments Induce Evaluation-Awareness in Language Models

Computer Science > Artificial Intelligence arXiv:2603.03824 (cs) [Submitted on 4 Mar 2026] Title:In-Context Environments Induce Evaluation-Awareness in Language Models Authors:Maheep Chaudhary View a PDF of the paper titled In-Context Environments Induce Evaluation-Awareness in Language Models, by Maheep Chaudhary View PDF HTML (experimental) Abstract:Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent \textit{evaluation awareness}. This raises concerns that models could strategically underperform, or \textit{sandbag}, to avoid triggering capability-limiting interventions such as unlearning or shutdown. Prior work demonstrates sandbagging under hand-crafted prompts, but this underestimates the true vulnerability ceiling. We introduce a black-box adversarial optimization framework treating the in-context prompt as an optimizable environment, and develop two approaches to characterize sandbagging: (1) measuring whether models expressing intent to underperform can actually execute it across different task structures, and (2) causally isolating whether underperformance is driven by genuine evaluation-aware reasoning or shallow prompt-following. Evaluating Claude-3.5-Haiku, GPT-4o-mini, and Llama-3.3-70B across four benchmarks (Arithmetic, GSM8K, MMLU, and HumanEval), optimized prompts induce up to 94 percentage point (pp) degradation on arithmetic (GPT-4o-mini: ...

Originally published on March 05, 2026. Curated by AI News.

Llms

The Rationing: AI companies are using the "subsidize, addict, extract" playbook — and developers are the product

Anthropic just ran the classic platform playbook on developers: offer generous limits to build dependency, then tighten the screws once t...

Reddit - Artificial Intelligence · 1 min · 21 minutes ago

Llms

CLI for Google AI Search (gai.google) — run AI-powered code/tech searches headlessly from your terminal

Google AI (gai.google) gives Gemini-powered answers for technical queries — think AI-enhanced search with code understanding. I built a C...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

Llms

Why are we blindly trusting AI companies with our data?

Lately I’ve been seeing a story floating around that really made me pause. Apparently, there were claims that the US government asked Ant...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

Llms

De-aged casts, ChatGPT-generated programs: How AI is changing Korean TV

Artificial intelligence is transforming every corner of industry, and television is no exception. Major networks in Korea have recently a...

AI Tools & Products · 4 min · about 6 hours ago

[2603.03824] In-Context Environments Induce Evaluation-Awareness in Language Models

About this article

Related Articles

The Rationing: AI companies are using the "subsidize, addict, extract" playbook — and developers are the product

CLI for Google AI Search (gai.google) — run AI-powered code/tech searches headlessly from your terminal

Why are we blindly trusting AI companies with our data?

De-aged casts, ChatGPT-generated programs: How AI is changing Korean TV

No comments

Stay updated with AI News