[2602.17003] Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
Summary
The paper introduces Persona2Web, a benchmark for evaluating personalized web agents on the real open web. Agents must use the user's history, rather than explicit instructions, to resolve ambiguous queries.
Why It Matters
As web agents become integral to user interactions, their ability to personalize actions based on user history becomes crucial, because users rarely spell out every detail of their intent. The authors argue that current agents lack this capability, and the benchmark provides a way to measure it.
Key Takeaways
- The authors present Persona2Web as the first benchmark for evaluating personalized web agents on the real open web.
- It emphasizes resolving ambiguity in user queries through historical context.
- The framework includes user histories, ambiguous queries, and a reasoning-aware evaluation.
- Experiments across agent architectures, backbone models, history access schemes, and ambiguity levels reveal key challenges in personalized web agent behavior.
- The research supports reproducibility with publicly available code and datasets.
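To make the idea of resolving an ambiguous query from user history concrete, here is a minimal Python sketch. It is not the benchmark's data format or the authors' method: the `HistoryEvent` and `AmbiguousQuery` schemas, the `infer_preference` helper, and the most-frequent-choice heuristic are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HistoryEvent:
    """One past user action (hypothetical schema, not the benchmark's)."""
    timestamp: str  # ISO date string, e.g. "2025-03-14"
    category: str   # kind of decision, e.g. "coffee"
    choice: str     # what the user picked, e.g. "oat latte"


@dataclass
class AmbiguousQuery:
    """A query that leaves one detail unstated (hypothetical schema)."""
    text: str
    missing_slot: str  # the history category needed to fill the gap


def infer_preference(history, query):
    """Return the user's most frequent past choice for the missing slot.

    Returns None when the history holds no evidence, so a caller can
    fall back to asking the user instead of guessing.
    """
    choices = [e.choice for e in history if e.category == query.missing_slot]
    if not choices:
        return None
    return Counter(choices).most_common(1)[0][0]


# Toy history: the preference is never stated, only implied by past choices.
history = [
    HistoryEvent("2025-01-05", "coffee", "oat latte"),
    HistoryEvent("2025-02-11", "coffee", "espresso"),
    HistoryEvent("2025-03-14", "coffee", "oat latte"),
    HistoryEvent("2025-03-20", "seat", "window"),
]
query = AmbiguousQuery("Order my usual coffee", missing_slot="coffee")
resolved = infer_preference(history, query)
```

A frequency count is the simplest possible stand-in for preference inference. The agents evaluated in the paper are LLM-based, and presumably reason over much longer and noisier histories than this toy list.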
Computer Science > Computation and Language
arXiv:2602.17003 (cs)
[Submitted on 19 Feb 2026]
Title: Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
Authors: Serin Kim, Sangam Lee, Dongha Lee
Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent architectures, backbone models, history access schemes, and queries with varying ambiguity levels, revealing key challenges in personalized web agent behavior. For reprod...