[2508.01055] FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

arXiv - AI 4 min read Article

Summary

FGBench is a dataset and benchmark of 625K molecular property reasoning problems annotated at the functional group level. It is intended to support more interpretable, structure-aware large language models (LLMs) for chemistry tasks.

Why It Matters

This research addresses a gap in existing datasets by focusing on functional group-level information, which can improve the interpretability and performance of LLMs in molecular design and drug discovery. By highlighting the limitations of current models, it sets the stage for future advancements in AI applications within chemistry.

Key Takeaways

  • FGBench comprises 625K molecular property reasoning problems with functional group data.
  • The dataset includes regression and classification tasks across 245 functional groups.
  • Current LLMs show limitations in FG-level property reasoning, indicating a need for improvement.
  • The methodology can serve as a framework for future dataset generation in molecular property reasoning.
  • FGBench aims to enhance the understanding of structure-property relationships in molecules.
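To make the mix of regression and classification tasks concrete, here is a minimal Python sketch of how answers to such problems might be scored. It is an illustration only: the `FGProblem` record, its field names, the `score` helper, and the toy values are all hypothetical and are not taken from the FGBench schema or its evaluation code. Accuracy and root-mean-square error are assumed here as common metric choices, not as the metrics the paper uses.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Union


@dataclass(frozen=True)
class FGProblem:
    """Hypothetical record for one FG-level problem (not the FGBench schema)."""
    smiles: str                  # molecule written as a SMILES string
    functional_group: str        # functional group the question is about
    task: str                    # "classification" or "regression"
    target: Union[bool, float]   # ground-truth label or property value


def score(problems, predictions):
    """Return accuracy over classification items and RMSE over regression items."""
    pairs = list(zip(problems, predictions))
    cls = [(p.target, y) for p, y in pairs if p.task == "classification"]
    reg = [(p.target, y) for p, y in pairs if p.task == "regression"]
    accuracy = sum(t == y for t, y in cls) / len(cls) if cls else None
    rmse = sqrt(sum((t - y) ** 2 for t, y in reg) / len(reg)) if reg else None
    return {"accuracy": accuracy, "rmse": rmse}


# Toy placeholder values with no chemical meaning, invented for this sketch.
problems = [
    FGProblem("CCO", "hydroxyl", "classification", True),
    FGProblem("CC(=O)O", "carboxyl", "classification", False),
    FGProblem("c1ccccc1O", "hydroxyl", "regression", 1.0),
    FGProblem("CCN", "amine", "regression", -1.0),
]
predictions = [True, True, 0.0, 0.0]

result = score(problems, predictions)
print(result)
```

Keeping the task type on each record lets a single pass over the model's answers produce both metrics, and a task type with no items yields `None` rather than a division by zero.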

Computer Science > Machine Learning
arXiv:2508.01055 (cs)
[Submitted on 1 Aug 2025 (v1), last revised 16 Feb 2026 (this version, v4)]

Title: FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

Authors: Xuan Liu, Siru Ouyang, Xianrui Zhong, Jiawei Han, Huimin Zhao

Abstract: Large language models (LLMs) have gained significant attention in chemistry. However, most existing datasets center on molecular-level property prediction and overlook the role of fine-grained functional group (FG) information. Incorporating FG-level data can provide valuable prior knowledge that links molecular structures with textual descriptions, which can be used to build more interpretable, structure-aware LLMs for reasoning on molecule-related tasks. Moreover, LLMs can learn from such fine-grained information to uncover hidden relationships between specific functional groups and molecular properties, thereby advancing molecular design and drug discovery. Here, we introduce FGBench, a dataset comprising 625K molecular property reasoning problems with functional group information. Functional groups are precisely annotated and localized within the molecule, which ensures the dataset's interoperability thereby facilitating further mult...
