[2603.03322] Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery
Computer Science > Computation and Language
arXiv:2603.03322 (cs)
[Submitted on 10 Feb 2026]

Title: Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery
Authors: Chaoqun Yang, Xinyu Lin, Shulin Li, Wenjie Wang, Ruihan Guo, Fuli Feng, Tat-Seng Chua

Abstract: Recent advances in Large Language Model (LLM) agents have demonstrated remarkable potential for automatic knowledge discovery. However, rigorously evaluating an AI system's capacity for knowledge discovery remains a critical challenge. Existing benchmarks rely predominantly on static datasets, leading to inevitable data contamination: models have likely seen the evaluation knowledge during training. Furthermore, the rapid release cycles of modern LLMs quickly render static benchmarks outdated, so they fail to assess the ability to discover truly new knowledge. To address these limitations, we propose DBench-Bio, a dynamic and fully automated benchmark designed to evaluate an AI system's ability to discover biological knowledge. DBench-Bio employs a three-stage pipeline: (1) data acquisition of rigorous, authoritative paper abstracts; (2) QA extraction, using LLMs to synthesize scientific hypothesis questions and corresponding discovery answers; and (3) QA filtering to ensure quality based on r...
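The three-stage pipeline in the abstract can be sketched as a simple composition of stages. This is purely illustrative: the function names, data shapes, and quality criteria below are assumptions, not the paper's actual implementation, and the LLM synthesis step is represented by a caller-supplied callable.

```python
# Hypothetical sketch of the DBench-Bio three-stage pipeline described in the
# abstract. All names and data shapes here are illustrative assumptions.

def acquire_abstracts(source):
    """Stage 1: data acquisition -- collect rigorous, authoritative
    paper abstracts (here approximated by a peer-review flag)."""
    return [a for a in source if a.get("peer_reviewed")]

def extract_qa(abstracts, synthesize):
    """Stage 2: QA extraction -- an LLM (passed in as `synthesize`) turns
    each abstract into a hypothesis question and a discovery answer."""
    return [synthesize(a["text"]) for a in abstracts]

def filter_qa(qa_pairs, passes_quality):
    """Stage 3: QA filtering -- keep only pairs meeting the quality
    criteria (the paper's actual criteria are truncated in this excerpt)."""
    return [qa for qa in qa_pairs if passes_quality(qa)]

def build_benchmark(source, synthesize, passes_quality):
    """Run the three stages end to end to produce benchmark QA pairs."""
    abstracts = acquire_abstracts(source)
    qa_pairs = extract_qa(abstracts, synthesize)
    return filter_qa(qa_pairs, passes_quality)
```

Because the first stage draws on newly published abstracts, rerunning the same pipeline later yields a fresh evaluation set, which is what makes the benchmark dynamic and resistant to training-data contamination.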