[2510.02356] Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

[2510.02356] Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

arXiv - AI 4 min read Article

Summary

This article presents EAPrivacy, a benchmark for evaluating the physical-world privacy awareness of large language models (LLMs), revealing significant shortcomings in current models' handling of privacy in dynamic environments.

Why It Matters

As LLMs are increasingly integrated into real-world applications, understanding their privacy awareness is crucial. This research highlights the limitations of existing models in balancing task execution with privacy considerations, emphasizing the need for improved alignment in AI systems.

Key Takeaways

  • EAPrivacy benchmark assesses LLMs' physical-world privacy awareness.
  • Top models like Gemini 2.5 Pro show only 59% accuracy in dynamic scenarios.
  • Models often prioritize task completion over privacy, with up to 86% of cases disregarding privacy requests.
  • Significant misalignment exists between LLMs and social norms regarding privacy.
  • The study calls for enhanced alignment strategies for LLMs in real-world applications.

Computer Science > Cryptography and Security arXiv:2510.02356 (cs) [Submitted on 27 Sep 2025 (v1), last revised 15 Feb 2026 (this version, v3)] Title:Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark Authors:Xinjie Shen, Mufei Li, Pan Li View a PDF of the paper titled Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark, by Xinjie Shen and 2 other authors View PDF HTML (experimental) Abstract:The deployment of Large Language Models (LLMs) in embodied agents creates an urgent need to measure their privacy awareness in the physical world. Existing evaluation methods, however, are confined to natural language based scenarios. To bridge this gap, we introduce EAPrivacy, a comprehensive evaluation benchmark designed to quantify the physical-world privacy awareness of LLM-powered agents. EAPrivacy utilizes procedurally generated scenarios across four tiers to test an agent's ability to handle sensitive objects, adapt to changing environments, balance task execution with privacy constraints, and resolve conflicts with social norms. Our measurements reveal a critical deficit in current models. The top-performing model, Gemini 2.5 Pro, achieved only 59\% accuracy in scenarios involving changing physical environments. Furthermore, when a task was accompanied by a privacy request, models prioritized completion over the constraint in up to 86\% of cases. In high-stakes situations pitting privacy against...

Related Articles

Llms

Claude Opus 4.6 API at 40% below Anthropic pricing – try free before you pay anything

Hey everyone I've set up a self-hosted API gateway using [New-API](QuantumNous/new-ap) to manage and distribute Claude Opus 4.6 access ac...

Reddit - Artificial Intelligence · 1 min ·
Hackers Are Posting the Claude Code Leak With Bonus Malware | WIRED
Llms

Hackers Are Posting the Claude Code Leak With Bonus Malware | WIRED

Plus: The FBI says a recent hack of its wiretap tools poses a national security risk, attackers stole Cisco source code as part of an ong...

Wired - AI · 9 min ·
Llms

People anxious about deviating from what AI tells them to do?

My friend came over yesterday to dye her hair. She had asked ChatGPT for the 'correct' way to do it. Chat told her to dye the ends first,...

Reddit - Artificial Intelligence · 1 min ·
Llms

ChatGPT on trial: A landmark test of AI liability in the practice of law

AI Tools & Products ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime