[2602.16173] Learning Personalized Agents from Human Feedback

arXiv - Machine Learning 4 min read Article

Summary

The paper presents a framework, Personalized Agents from Human Feedback (PAHF), which enables AI agents to adapt to individual user preferences through continual learning from live interactions.

Why It Matters

As AI agents become increasingly integrated into daily life, their ability to adapt to the unique and evolving preferences of users is crucial. This research addresses the limitations of existing models that rely on static datasets, offering a solution that enhances user experience and personalization.

Key Takeaways

  • PAHF allows AI agents to learn from live interactions, improving personalization.
  • The framework includes a three-step loop for preference clarification, action grounding, and feedback integration.
  • Empirical results show PAHF outperforms traditional models in personalization accuracy and adaptability.

Computer Science > Artificial Intelligence
arXiv:2602.16173 (cs)
[Submitted on 18 Feb 2026]

Title: Learning Personalized Agents from Human Feedback

Authors: Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Yuanshun Yao, Shaoliang Nie, Mingyang Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, Saghar Hosseini

Abstract: Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding user profiles in external memory. However, these approaches struggle with new users and with preferences that change over time. We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization in which agents learn online from live interaction using explicit per-user memory. PAHF operationalizes a three-step loop: (1) seeking pre-action clarification to resolve ambiguity, (2) grounding actions in preferences retrieved from memory, and (3) integrating post-action feedback to update memory when preferences drift. To evaluate this capability, we develop a four-phase protocol and two benchmarks in embodied manipulation and online shopping. These benchmarks quantify an agent's ability to learn initial preferences from s...
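The three-step loop the abstract describes can be sketched as a minimal control flow. This is an illustrative reconstruction, not the paper's implementation: the `UserMemory` class, the `pahf_step` function, and the callback names (`ask_user`, `act`, `get_feedback`) are all hypothetical stand-ins for clarification, preference-grounded action, and feedback integration.

```python
from dataclasses import dataclass, field

@dataclass
class UserMemory:
    """Explicit per-user preference store (illustrative, not the paper's design)."""
    preferences: dict = field(default_factory=dict)

    def retrieve(self, task):
        # Look up any preference previously stored for this task.
        return self.preferences.get(task)

    def update(self, task, preference):
        # Integrate new information; overwrites when preferences drift.
        self.preferences[task] = preference

def pahf_step(memory, task, ask_user, act, get_feedback):
    """One iteration of the clarify -> ground -> integrate loop."""
    pref = memory.retrieve(task)
    if pref is None:                      # (1) pre-action clarification
        pref = ask_user(task)             #     resolve ambiguity before acting
        memory.update(task, pref)
    result = act(task, pref)              # (2) action grounded in retrieved preference
    feedback = get_feedback(result)       # (3) post-action feedback
    if feedback is not None:              #     update memory on preference drift
        memory.update(task, feedback)
    return result
```

Under this sketch, a new user triggers the clarification branch on first contact, while a returning user's action is grounded directly in memory, and any corrective feedback rewrites the stored preference for the next interaction.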
