[2505.21396] Augmenting Research Ideation with Data: An Empirical Investigation in Social Science
Computer Science > Computation and Language
arXiv:2505.21396 (cs)
[Submitted on 27 May 2025 (v1), last revised 28 Feb 2026 (this version, v2)]

Title: Augmenting Research Ideation with Data: An Empirical Investigation in Social Science
Authors: Xiao Liu, Xinyi Dong, Xinyang Gao, Yansong Feng, Xun Pang

Abstract: Recent advancements in large language models (LLMs) demonstrate strong potential for generating novel research ideas, yet such ideas often struggle with feasibility and effectiveness. In this paper, we investigate whether augmenting LLMs with relevant data during the ideation process can improve idea quality. Our framework integrates data at two stages: (1) incorporating metadata during idea generation to guide models toward more feasible concepts, and (2) introducing an automated preliminary validation step during idea selection to assess the empirical plausibility of hypotheses within ideas. We evaluate our approach in the social science domain, with a specific focus on climate negotiation topics. Expert evaluation shows that metadata improves the feasibility of generated ideas by 20%, while automated validation improves the overall quality of selected ideas by 7%. Beyond assessing the quality of LLM-generated ideas, we conduct a human study to examine whether these ideas, augmented with relat...