[2602.16990] Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
Summary
The paper introduces Conv-FinRe, a benchmark for evaluating financial recommendation systems that emphasizes utility-grounded decision-making over mere behavioral imitation.
Why It Matters
This research addresses a limitation of traditional recommendation benchmarks in finance: they often treat observed user behavior as if it were optimal decision-making. Conv-FinRe instead separates what users actually chose from what would serve their long-term investment goals and risk preferences. This gives a sounder basis for assessing financial advisory models, which is crucial for improving AI-driven financial recommendations.
Key Takeaways
- Conv-FinRe offers a new benchmark for evaluating financial recommendations based on utility rather than just user behavior.
- The benchmark incorporates real market data and human decision-making processes to enhance model evaluation.
- Models that rank stocks by rational, utility-based analysis may diverge from what users actually chose, so matching user behavior is not the same as making good decisions.
- The dataset and codebase are publicly available, promoting transparency and further research.
- Understanding user risk preferences is essential for developing effective financial advisory systems.
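The gap between utility-grounded and behavior-based evaluation noted in the takeaways above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the benchmark's actual method: the mean-variance utility `mu - 0.5 * risk_aversion * sigma**2` is a textbook stand-in for "normative utility grounded in risk preferences", Kendall's tau is one possible agreement measure, and the tickers, returns, volatilities, and rankings are invented.

```python
from itertools import combinations

def mean_variance_utility(mu, sigma, risk_aversion):
    # Assumed stand-in utility: expected return penalized by variance.
    return mu - 0.5 * risk_aversion * sigma ** 2

def rank_by_utility(stocks, risk_aversion):
    # stocks maps ticker -> (expected return, volatility); returns tickers best-first.
    return sorted(
        stocks,
        key=lambda t: mean_variance_utility(*stocks[t], risk_aversion),
        reverse=True,
    )

def kendall_tau(rank_a, rank_b):
    # Pairwise agreement between two rankings of the same items, in [-1, 1].
    pos_a = {t: i for i, t in enumerate(rank_a)}
    pos_b = {t: i for i, t in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical candidates: ticker -> (expected return, volatility).
stocks = {
    "AAA": (0.10, 0.30),
    "BBB": (0.06, 0.10),
    "CCC": (0.08, 0.20),
    "DDD": (0.12, 0.45),
}

# Normative reference for a hypothetical risk-averse investor.
utility_ref = rank_by_utility(stocks, risk_aversion=4.0)

# Descriptive reference: what this hypothetical user actually preferred
# (here, chasing the most volatile, highest-return names).
behavior_ref = ["DDD", "AAA", "CCC", "BBB"]

# A hypothetical model output to be diagnosed against both views.
model_ranking = ["BBB", "AAA", "CCC", "DDD"]

tau_utility = kendall_tau(model_ranking, utility_ref)
tau_behavior = kendall_tau(model_ranking, behavior_ref)
print(f"agreement with utility view:  {tau_utility:+.2f}")
print(f"agreement with behavior view: {tau_behavior:+.2f}")
```

In this toy setup the model ranking agrees with the utility view and disagrees with the behavior view. A benchmark that scored only behavior matching would penalize it even though it fits the investor's stated risk profile better.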
Computer Science > Artificial Intelligence
arXiv:2602.16990 (cs)
[Submitted on 19 Feb 2026]
Title: Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
Authors: Yan Wang, Yi Han, Lingfei Qian, Yueru He, Xueqing Peng, Dongji Feng, Zhuohan Xie, Vincent Jim Zhang, Rosie Guo, Fengran Mo, Jimin Huang, Yankai Chen, Xue Liu, Jian-Yun Nie
Abstract: Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy or short-sighted under market volatility and may conflict with a user's long-term goals. Treating what users chose as the sole ground truth, therefore, conflates behavioral imitation with decision quality. We introduce Conv-FinRe, a conversational and longitudinal benchmark for stock recommendation that evaluates LLMs beyond behavior matching. Given an onboarding interview, step-wise market context, and advisory dialogues, models must generate rankings over a fixed investment horizon. Crucially, Conv-FinRe provides multi-view references that distinguish descriptive behavior from normative utility grounded in investor-specific risk preferences, enabling diagnosis of whether an LLM follows rational analysis, mimics user noise, or is driven by market momentum. W...