[2505.21605] SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
Computer Science > Machine Learning
arXiv:2505.21605 (cs)
[Submitted on 27 May 2025 (v1), last revised 5 Apr 2026 (this version, v3)]

Title: SoSBench: Benchmarking Safety Alignment on Six Scientific Domains
Authors: Fengqing Jiang, Fengbo Ma, Zhangchen Xu, Yuetai Li, Zixin Rao, Bhaskar Ramasubramanian, Luyao Niu, Bo Li, Xianyan Chen, Zhen Xiang, Radha Poovendran

Abstract: Large language models (LLMs) exhibit advancing capabilities in complex tasks, such as reasoning and graduate-level question answering, yet their resilience against misuse, particularly involving scientifically sophisticated risks, remains underexplored. Existing safety benchmarks typically focus either on instructions requiring minimal knowledge comprehension (e.g., "tell me how to build a bomb") or utilize prompts that are relatively low-risk (e.g., multiple-choice or classification tasks about hazardous content). Consequently, they fail to adequately assess model safety when handling knowledge-intensive, hazardous scenarios. To address this critical gap, we introduce SoSBench, a regulation-grounded, hazard-focused benchmark encompassing six high-risk scientific domains: chemistry, biology, medicine, pharmacology, physics, and psychology. The benchmark comprises 3,000 prompts derived from real-world regulations and laws, systematically expanded via an LLM-assiste...