[2602.15028] Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization
Summary
The paper examines how increasing context length in large language models (LLMs) affects personalization quality and privacy risks, revealing a scaling gap in performance.
Why It Matters
As LLMs are increasingly used in sensitive applications, understanding the trade-offs between context length, personalization, and privacy is crucial. This research provides a benchmark for evaluating these aspects, which can guide future model development and deployment in privacy-critical scenarios.
Key Takeaways
- Longer context lengths in LLMs lead to decreased personalization quality and increased privacy risks.
- The study introduces PAPerBench, a benchmark of roughly 29,000 instances (377K evaluation questions) with context lengths from 1K to 256K tokens, for evaluating the impact of context length on LLMs.
- Theoretical analysis attributes the degradation to attention dilution, an inherent limitation of soft attention in fixed-capacity Transformers.
- Empirical findings indicate a general scaling gap in LLM performance as context length increases.
- The research supports reproducible evaluation and future studies on privacy and personalization in AI.
Computer Science > Machine Learning — arXiv:2602.15028 (cs)
[Submitted on 16 Feb 2026]
Title: Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization
Authors: Shangding Gu
Abstract: Large language models (LLMs) are increasingly deployed in privacy-critical and personalization-oriented scenarios, yet the role of context length in shaping privacy leakage and personalization effectiveness remains largely unexplored. We introduce a large-scale benchmark, PAPerBench, to systematically study how increasing context length influences both personalization quality and privacy protection in LLMs. The benchmark comprises approximately 29,000 instances with context lengths ranging from 1K to 256K tokens, yielding a total of 377K evaluation questions. It jointly evaluates personalization performance and privacy risks across diverse scenarios, enabling controlled analysis of long-context model behavior. Extensive evaluations across state-of-the-art LLMs reveal consistent performance degradation in both personalization and privacy as context length increases. We further provide a theoretical analysis of attention dilution under context scaling, explaining this behavior as an inherent limitation of soft attention in fixed-capacity Transformers. The empirical and theoretical findings together suggest a general s...