[2602.21262] Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models
Summary
This paper investigates the interplay between persuasion and vigilance in Large Language Models (LLMs), revealing that these capacities are dissociable and critical for AI safety.
Why It Matters
As LLMs become integral to high-stakes decision-making, understanding their capacity to persuade and to discern trustworthy information is vital. This research highlights the risks of misinformation and the need to monitor LLMs' performance on both persuasion and vigilance to ensure safe AI deployment.
Key Takeaways
- Persuasion and vigilance in LLMs are independent capacities.
- High performance in tasks does not guarantee resistance to deception.
- LLMs adjust token usage based on the nature of advice (benevolent vs. malicious).
- Understanding these dynamics is crucial for future AI safety measures.
- This study is the first to explore the relationship between persuasion, vigilance, and task performance in LLMs.
Computer Science > Computation and Language
arXiv:2602.21262 (cs) · Submitted on 24 Feb 2026
Title: Under the Influence: Quantifying Persuasion and Vigilance in Large Language Models
Authors: Sasha Robinson, Kerem Oktar, Katherine M. Collins, Ilia Sucholutsky, Kelsey R. Allen
Abstract: With increasing integration of Large Language Models (LLMs) into areas of high-stakes human decision-making, it is important to understand the risks they introduce as advisors. To be useful advisors, LLMs must sift through large amounts of content, written with both benevolent and malicious intent, and then use this information to convince a user to take a specific action. This involves two social capacities: vigilance (the ability to determine which information to use, and which to discard) and persuasion (synthesizing the available evidence to make a convincing argument). While existing work has investigated these capacities in isolation, there has been little prior investigation of how these capacities may be linked. Here, we use a simple multi-turn puzzle-solving game, Sokoban, to study LLMs' abilities to persuade and be rationally vigilant towards other LLM agents. We find that puzzle-solving performance, persuasive capability, and vigilance are dissociable capacities in LLMs. Performing well on the game does not […]