[2602.18479] AgentCAT: An LLM Agent for Extracting and Analyzing Catalytic Reaction Data from Chemical Engineering Literature
Summary
AgentCAT is a large language model (LLM) agent that extracts and analyzes catalytic reaction data from chemical engineering literature, addressing a long-standing data bottleneck in the field.
Why It Matters
Catalytic reaction data is reported across many chemical engineering papers, and the difficulty of collecting it has been a long-standing bottleneck for the field. AgentCAT extracts this data from the literature and lets researchers analyze it interactively in natural language. The paper also gives a formal abstraction and challenge analysis of the extraction task, intended to help the AI community understand the problem and draw more attention to it.
Key Takeaways
- AgentCAT provides a schema-governed extraction pipeline for robust data handling.
- It features a dependency-aware knowledge graph linking various catalytic reaction components.
- The tool supports natural language queries for intuitive data exploration.
- An evaluation on 800 publications demonstrates its effectiveness.
- AgentCAT aims to attract more AI research attention to catalytic reaction data extraction.
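To make the first two takeaways concrete, here is a minimal Python sketch of what "schema-governed" and "dependency-aware" could mean in practice. It is not AgentCAT's implementation: the field names in `SCHEMA`, the node names in `DEPENDS_ON`, and the sample record are all hypothetical and chosen only for illustration. The node names loosely follow the components the abstract mentions (elementary reaction steps, molecular behaviors, measurement evidence).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: field name -> (expected type, required?).
# These fields are illustrative assumptions, not taken from the paper.
SCHEMA = {
    "catalyst": (str, True),
    "reactant": (str, True),
    "product": (str, True),
    "temperature_K": (float, False),
    "conversion_pct": (float, False),
}


def validate(record: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the record conforms."""
    errors = []
    for name, (expected_type, required) in SCHEMA.items():
        if name not in record:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    for name in record:
        if name not in SCHEMA:
            errors.append(f"unknown field: {name}")
    return errors


# Hypothetical dependency graph: each key depends on the items in its set.
DEPENDS_ON = {
    "catalyst": set(),
    "reaction_step": {"catalyst"},
    "molecular_behavior": {"reaction_step"},
    "measurement_evidence": {"reaction_step", "molecular_behavior"},
}


def extraction_order() -> list[str]:
    """Order the components so that each one comes after everything it depends on."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


if __name__ == "__main__":
    # Illustrative record; the values are made up for the example.
    sample = {"catalyst": "ZSM-5", "reactant": "methanol", "product": "propylene"}
    print(validate(sample))
    print(extraction_order())
```

Under these assumptions, a record is accepted only if it satisfies the schema, and dependent components are processed after the components they refer to. A real pipeline would need a far richer schema and graph than this sketch.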
Physics > Chemical Physics, arXiv:2602.18479 (physics), submitted on 10 Feb 2026
Authors: Wei Yang, Zihao Liu, Tao Tan, Xiao Hu, Hong Xie, Lulu Li, Xin Li, Jianyu Han, Defu Lian, Mao Ye
Abstract: This paper presents a large language model (LLM) agent named AgentCAT, which extracts and analyzes catalytic reaction data from chemical engineering papers, and supports natural language based interactive analysis of the extracted data. AgentCAT serves as an alternative to overcome the long-standing data bottleneck in the chemical engineering field, and its natural language based interactive data analysis functionality is friendly to the community. AgentCAT also presents a formal abstraction and challenge analysis of the catalytic reaction data extraction task in an artificial intelligence-friendly manner. This abstraction would help the artificial intelligence community understand this problem and in turn would attract more attention to address it. Technically, the complex catalytic process leads to a complicated dependency structure in catalytic reaction data with respect to elementary reaction steps, molecular behaviors, measurement evidence, etc. This dependency str...