[2604.08362] Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

[2604.08362] Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2604.08362: Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

Computer Science > Computation and Language arXiv:2604.08362 (cs) [Submitted on 9 Apr 2026] Title:Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Authors:Jiawei Chen, Ruoxi Xu, Boxi Cao, Ruotong Pan, Yunfei Zhang, Yifei Hu, Yong Du, Tingting Gao, Yaojie Lu, Yingfei Sun, Xianpei Han, Le Sun, Xiangyu Wu, Hongyu Lin View a PDF of the paper titled Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces, by Jiawei Chen and 13 other authors View PDF HTML (experimental) Abstract:The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior. To bridge this gap, we introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework. Based on this benchmark, we first provide empirical evidence that previous datasets with isolated scenarios suffer from tunnel vision, whereas real-world decision-making relies on long-term, cross-scenario causal chains. Extensive evaluations of state-of-the-art LLMs reveal that current models struggle to acc...

Originally published on April 13, 2026. Curated by AI News.

Related Articles

Llms

We built something ChatGPT doesn't do — AI that delivers results, not answers

Most AI gives you text. We built cards. Here's what I mean. When you ask LookMood Agent to find you a job, you don't get advice on where ...

Reddit - Artificial Intelligence · 1 min ·
Llms

I am not an "anti" like this guy, but still an interesting video of person interacting with chat 4o

(Posting Here because removed by Chatgpt Complaints moderators because the model here is 4o, and refuse to believe there were any safety ...

Reddit - Artificial Intelligence · 1 min ·
Llms

We built a way for two people's AI context to talk to each other (without sharing their conversations)

We've been thinking about how we use AI in our relationships. Big part of it is about other people. Talking about them, figuring out what...

Reddit - Artificial Intelligence · 1 min ·
No flattery please, Claude: I’m British | Brief letters
Llms

No flattery please, Claude: I’m British | Brief letters

AI Tools & Products · 2 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime