[2604.06264] ToxReason: A Benchmark for Mechanistic Chemical Toxicity

[2604.06264] ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

arXiv - AI April 09, 2026 3 min read

About this article

Abstract page for arXiv paper 2604.06264: ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

Quantitative Biology > Quantitative Methods arXiv:2604.06264 (q-bio) [Submitted on 7 Apr 2026] Title:ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway Authors:Jueon Park, Wonjune Jang, Chanhwi Kim, Yein Park, Jaewoo Kang View a PDF of the paper titled ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway, by Jueon Park and 4 other authors View PDF HTML (experimental) Abstract:Recent advances in large language models (LLMs) have enabled molecular reasoning for property prediction. However, toxicity arises from complex biological mechanisms beyond chemical structure, necessitating mechanistic reasoning for reliable prediction. Despite its importance, current benchmarks fail to systematically evaluate this capability. LLMs can generate fluent but biologically unfaithful explanations, making it difficult to assess whether predicted toxicities are grounded invalid mechanisms. To bridge this gap, we introduce ToxReason, a benchmark grounded in the Adverse Outcome Pathway (AOP) that evaluates organ-level toxicity reasoning across multiple organs. ToxReason integrates experimental drug-target interaction evidence with toxicity labels, requiring models to infer both toxic outcomes and their underlying mechanisms from Molecular Initiating Event (MIE) to Adverse Outcome (AO). Using ToxReason, we evaluate toxicity prediction performance and reasoning quality across diverse LLMs. We find that strong p...

Originally published on April 09, 2026. Curated by AI News.

Llms

We gave 45 psychological questionnaires to 50 LLMs. What we found was not “personality.”

What is the “personality” of an LLM? What actually differentiates models psychometrically? Since LLMs entered public use, researchers hav...

Reddit - Artificial Intelligence · 1 min · 39 minutes ago

Llms

How to Disable Google's Gemini in Chrome | WIRED

Chrome users were caught off guard by a 4-GB Google AI model baked into Chrome, sparking privacy concerns. The good news: You can easily ...

Wired - AI · 6 min · 39 minutes ago

Llms

OpenAI introduces new 'Trusted Contact' safeguard for cases of possible self-harm | TechCrunch

The company is expanding its efforts to protect ChatGPT users in cases where conversations may turn to self-harm.

TechCrunch - AI · 5 min · about 1 hour ago

Llms

Mira Murati’s deposition pulled back the curtain on Sam Altman’s ouster | The Verge

Thanks to Musk v. Altman, the public is getting a concrete look at details of Sam Altman’s ouster from OpenAI, much of it centered on for...

The Verge - AI · 11 min · about 3 hours ago

[2604.06264] ToxReason: A Benchmark for Mechanistic Chemical Toxicity Reasoning via Adverse Outcome Pathway

About this article

Related Articles

We gave 45 psychological questionnaires to 50 LLMs. What we found was not “personality.”

How to Disable Google's Gemini in Chrome | WIRED

OpenAI introduces new 'Trusted Contact' safeguard for cases of possible self-harm | TechCrunch

Mira Murati’s deposition pulled back the curtain on Sam Altman’s ouster | The Verge

No comments

Stay updated with AI News