[2602.18464] How Well Can LLM Agents Simulate End-User Security and Privacy Attitudes and Behaviors?

arXiv - AI · 4 min read

Summary

This paper investigates the effectiveness of large language model (LLM) agents in simulating user attitudes and behaviors towards security and privacy threats, using a new benchmark called SP-ABCBench.

Why It Matters

Understanding how well LLMs can mimic human security and privacy attitudes is crucial for developing safer AI systems. This research highlights the limitations of current models and emphasizes the need for improvement, which is vital for enhancing user trust and product security.

Key Takeaways

  • LLM agents currently show limited alignment with human attitudes towards security and privacy, scoring between 50 and 64 on average.
  • Some configurations of LLMs can achieve high alignment scores, particularly when applying bounded rationality in decision-making.
  • The SP-ABCBench benchmark is introduced to facilitate reproducible evaluations of LLM performance in simulating user behavior.
  • Newer models do not consistently outperform older ones, indicating a need for better training and prompting strategies.
  • Improving LLM simulations can help forecast security and privacy risks in products before deployment.

Computer Science > Computers and Society — arXiv:2602.18464 (cs)

[Submitted on 6 Feb 2026]

Title: How Well Can LLM Agents Simulate End-User Security and Privacy Attitudes and Behaviors?

Authors: Yuxuan Li, Leyang Li, Hao-Ping (Hank) Lee, Sauvik Das

Abstract: A growing body of research assumes that large language model (LLM) agents can serve as proxies for how people form attitudes toward and behave in response to security and privacy (S&P) threats. If correct, these simulations could offer a scalable way to forecast S&P risks in products prior to deployment. We interrogate this assumption using SP-ABCBench, a new benchmark of 30 tests derived from validated S&P human-subject studies, which measures alignment between simulations and human-subjects studies on a 0-100 ascending scale, where higher scores indicate better alignment across three dimensions: Attitude, Behavior, and Coherence. Evaluating twelve LLMs, four persona construction strategies, and two prompting methods, we found that there remains substantial room for improvement: all models score between 50 and 64 on average. Newer, bigger, and smarter models do not reliably do better and sometimes do worse. Some simulation configurations, however, do yield high alignment: e.g., with scores above 95 for some behavior tests when agents are prompte...
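To make the scoring concrete, here is a minimal sketch of how per-test alignment scores on a 0-100 scale might be aggregated into the per-model averages the abstract reports. The data layout (model → dimension → list of test scores) and all names are assumptions for illustration; the paper does not publish this exact format.

```python
# Hypothetical aggregation of SP-ABCBench-style alignment scores.
# Schema (model -> dimension -> per-test scores, each 0-100) is assumed
# for illustration only; values below are made up, not from the paper.
from statistics import mean

results = {
    "model-a": {"attitude": [62, 55, 48], "behavior": [70, 96, 58], "coherence": [51, 60, 44]},
    "model-b": {"attitude": [50, 47, 52], "behavior": [66, 61, 59], "coherence": [55, 49, 53]},
}

def overall_alignment(per_dim):
    """Mean score within each dimension, then averaged across the three dimensions."""
    return mean(mean(scores) for scores in per_dim.values())

for model, per_dim in results.items():
    print(f"{model}: {overall_alignment(per_dim):.1f}")
```

Note that averaging per dimension first (rather than pooling all tests) keeps Attitude, Behavior, and Coherence equally weighted even if they contain different numbers of tests; a single strong behavior test (like the >95 scores mentioned above) lifts only its own dimension.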

Related Articles

Llms

I am seeing Claude everywhere

Every single Instagram reel or TikTok I scroll i see people mentioning Claude and glazing it like it’s some kind of master tool that’s be...

Reddit - Artificial Intelligence · 1 min ·
Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min ·
Llms

Hackers Are Posting the Claude Code Leak With Bonus Malware | WIRED

Plus: The FBI says a recent hack of its wiretap tools poses a national security risk, attackers stole Cisco source code as part of an ong...

Wired - AI · 9 min ·
Llms

People anxious about deviating from what AI tells them to do?

My friend came over yesterday to dye her hair. She had asked ChatGPT for the 'correct' way to do it. Chat told her to dye the ends first,...

Reddit - Artificial Intelligence · 1 min ·