[2602.17003] Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

arXiv - AI

Summary

The paper introduces Persona2Web, a benchmark that evaluates whether web agents can resolve ambiguous queries by reasoning over a user's history rather than relying on explicit instructions.

Why It Matters

Users rarely specify every detail of their intent, so a practical web agent has to infer preferences and context from what it already knows about the user. According to the abstract, current agents lack this personalization capability, and Persona2Web offers a way to measure it.

Key Takeaways

  • Persona2Web is presented as the first benchmark for evaluating personalized web agents on the real open web.
  • It emphasizes resolving ambiguity in user queries through historical context.
  • The framework includes user histories, ambiguous queries, and a reasoning-aware evaluation.
  • Experiments across agent architectures, backbone models, history access schemes, and query ambiguity levels reveal key challenges in personalized agent behavior.
  • The research supports reproducibility with publicly available code and datasets.
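
To make the idea of resolving ambiguity from user history concrete, here is a minimal Python sketch. It is a toy illustration only, not the benchmark's data format or the authors' method: the `HistoryEvent` schema, the function names, and the frequency-count heuristic are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class HistoryEvent:
    """One past user action. Hypothetical schema, not the benchmark's."""
    category: str   # e.g. "restaurant"
    attribute: str  # e.g. "cuisine"
    value: str      # e.g. "thai"


def infer_preference(history, category, attribute):
    """Return the most frequent past value for (category, attribute), or None."""
    counts = Counter(
        event.value
        for event in history
        if event.category == category and event.attribute == attribute
    )
    if not counts:
        return None  # history says nothing about this attribute
    return counts.most_common(1)[0][0]


def resolve_query(query, history, category, missing_attributes):
    """Fill the unspecified attributes of an ambiguous query from history."""
    resolved = {"query": query, "category": category}
    for attribute in missing_attributes:
        resolved[attribute] = infer_preference(history, category, attribute)
    return resolved


# Assumed example data: the user never states a cuisine or price range.
history = [
    HistoryEvent("restaurant", "cuisine", "thai"),
    HistoryEvent("restaurant", "cuisine", "thai"),
    HistoryEvent("restaurant", "cuisine", "italian"),
    HistoryEvent("restaurant", "price", "mid-range"),
]
resolved = resolve_query(
    "book a table for Friday",
    history,
    "restaurant",
    ["cuisine", "price", "party_size"],
)
print(resolved)
```

In this sketch, attributes that the history cannot answer (here `party_size`) come back as `None`, which marks the places where an agent would still need another source of information.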

Computer Science > Computation and Language
arXiv:2602.17003 (cs)
[Submitted on 19 Feb 2026]

Title: Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

Authors: Serin Kim, Sangam Lee, Dongha Lee

Abstract: Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent architectures, backbone models, history access schemes, and queries with varying ambiguity levels, revealing key challenges in personalized web agent behavior. For reprod...
