[2603.29112] GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification
Computer Science > Artificial Intelligence
arXiv:2603.29112 (cs)
[Submitted on 31 Mar 2026]

Title: GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification
Authors: Iordanis Fostiropoulos, Muhammad Rafay Azhar, Abdalaziz Sawwan, Boyu Fang, Yuchen Liu, Jiayi Liu, Hanchao Yu, Qi Guo, Jianyu Wang, Fei Liu, Xiangjun Fan

Abstract: We introduce GISTBench, a benchmark for evaluating Large Language Models' (LLMs) ability to understand users from their interaction histories in recommendation systems. Unlike traditional RecSys benchmarks that focus on item prediction accuracy, our benchmark evaluates how well LLMs can extract and verify user interests from engagement data. We propose two novel metric families: Interest Groundedness (IG), decomposed into precision and recall components to separately penalize hallucinated interest categories and reward coverage, and Interest Specificity (IS), which assesses the distinctiveness of verified LLM-predicted user profiles. We release a synthetic dataset constructed from real user interactions on a global short-form video platform. Our dataset contains both implicit and explicit engagement signals and rich textual descriptions. We validate our dataset's fidelity against user surveys, and evaluate eight open-weight LLMs spanning 7...
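The IG decomposition described above can be illustrated with a minimal set-based sketch. This is not the paper's exact metric definition (which is not given on this page); it only shows the precision/recall split in which precision penalizes hallucinated interest categories and recall rewards coverage. The category names are hypothetical.

```python
def interest_groundedness(predicted: set, verified: set) -> dict:
    """Set-based precision/recall over interest categories (illustrative sketch).

    Precision drops when the model predicts categories that were never
    verified (hallucinations); recall drops when verified categories are
    missed (poor coverage).
    """
    if not predicted and not verified:
        # Degenerate case: nothing predicted, nothing to cover.
        return {"precision": 1.0, "recall": 1.0}
    true_positives = len(predicted & verified)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(verified) if verified else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical example: "crypto" is a hallucinated interest,
# "fitness" is a verified interest the model missed.
scores = interest_groundedness(
    predicted={"cooking", "travel", "crypto"},
    verified={"cooking", "travel", "fitness"},
)
print(scores)  # precision and recall are both 2/3 here
```

A single F-style aggregate would hide which failure mode (hallucination vs. missed coverage) dominates, which is presumably why the abstract stresses reporting the two components separately.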