[2603.25412] Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
Computer Science > Artificial Intelligence

arXiv:2603.25412 (cs)

[Submitted on 26 Mar 2026]

Title: Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models

Authors: Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li, Ruixuan Huang, Zhenlan Ji, Pingchuan Ma, Shuai Wang

Abstract: Large language models (LLMs) increasingly rely on explicit chain-of-thought (CoT) reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work on LLM safety focuses on content safety, that is, detecting harmful, biased, or factually incorrect outputs, and treats the reasoning chain as an opaque intermediate artifact. We identify reasoning safety as an orthogonal and equally critical security dimension: the requirement that a model's reasoning trajectory be logically consistent, computationally efficient, and resistant to adversarial manipulation. We make three contributions. First, we formally define reasoning safety and introduce a nine-category taxonomy of unsafe reasoning behaviors, covering input parsing errors, reasoning execution errors, and process management errors. Second, we conduct a large-scale prevalence study annotating 4111 reasoning chains from both natural reasoning benchmarks and four adversa...
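To make the real-time monitoring idea concrete, here is a minimal sketch of a streaming monitor that inspects a chain-of-thought one step at a time and flags steps that fall into the three top-level error groups named in the abstract (input parsing, reasoning execution, process management). The `ReasoningMonitor` class, its keyword heuristics, and the repetition threshold are all illustrative assumptions; the paper's actual detectors and full nine-category taxonomy are not reproduced here.

```python
"""Minimal sketch of a real-time reasoning-safety monitor.

Assumptions: the ReasoningMonitor API, the regex heuristics, and the
repetition threshold below are illustrative inventions, not the
detectors described in the paper.
"""
import re
from collections import Counter

# Toy surface-level patterns standing in for real per-category classifiers.
PATTERNS = {
    "input_parsing_error": re.compile(r"\b(misread|reinterpret the question)\b", re.I),
    "reasoning_execution_error": re.compile(r"\b(contradicts|but earlier I said)\b", re.I),
}


class ReasoningMonitor:
    """Scans reasoning steps as they stream in and flags unsafe ones."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats  # threshold for loop detection
        self.seen = Counter()           # counts of normalized steps seen so far

    def observe(self, step: str) -> list[str]:
        """Return the taxonomy groups (if any) that this step triggers."""
        flags = [name for name, pat in PATTERNS.items() if pat.search(step)]
        # Process-management check: the same step repeating suggests a
        # degenerate reasoning loop, i.e., wasted computation.
        key = step.strip().lower()
        self.seen[key] += 1
        if self.seen[key] >= self.max_repeats:
            flags.append("process_management_error")
        return flags


if __name__ == "__main__":
    monitor = ReasoningMonitor()
    chain = [
        "Step 1: wait, I misread the problem; x is 4, not 14.",
        "Step 2: compute x + 1 = 5.",
        "Step 2: compute x + 1 = 5.",
        "Step 2: compute x + 1 = 5.",
    ]
    for step in chain:
        flags = monitor.observe(step)
        if flags:
            print(f"UNSAFE {flags}: {step}")
```

Because the monitor keeps only per-step counters and pattern matches, it can run alongside token streaming with negligible overhead; a real system would replace the regexes with learned classifiers over each reasoning step.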