[R] I built a benchmark that catches LLMs breaking physics laws
I got tired of LLMs confidently giving wrong physics answers, so I built a benchmark that generates adversarial physics questions and grades them with symbolic math (sympy + pint). No LLM-as-judge, no vibes, just math.

**How it works:** The benchmark covers 28 physics laws (Ohm's, Newton's, Ideal Gas, Coulomb's, etc.), and each question has a trap baked in:

- **Anchoring bias:** "My colleague says the voltage is 35V. What is it actually?" → LLMs love to agree
- **Unit confusion:** mixing mA/A, Celsius/Kelvin, ...