[2604.03893] FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

arXiv - AI April 07, 2026 4 min read

Computer Science > Artificial Intelligence

arXiv:2604.03893 (cs)

[Submitted on 4 Apr 2026]

Title: FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

Authors: Zeyu Wang, Xiaogang Li, Peiyao Xiao, Qinhao Kong, Ben Wang, Chengliang Xu, Zichao Chen, Bing Zhao, Hu Wei

Abstract: Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information extraction rather than the global structural logic inherent in formal scientific notations. In this work, we introduce FeynmanBench, the first benchmark centered on Feynman diagram tasks. It is designed to evaluate AI's capacity for multistep diagrammatic reasoning, which requires satisfying conservation laws and symmetry constraints, identifying graph topology, converting between diagrammatic and algebraic representations, and constructing scattering amplitudes under specific conventions and gauges. To support large-scale and reproducible evaluation, we developed an automated pipeline producing diverse Feynman diagrams along with verifiable topological annotations and amplitude results. Our database spans the electromagnetic, weak, and strong interactions ...

Originally published on April 07, 2026. Curated by AI News.

