[2603.02635] SaFeR-ToolKit: Structured Reasoning via Virtual Tool Calling for Multimodal Safety
Computer Science > Machine Learning
arXiv:2603.02635 (cs)
[Submitted on 3 Mar 2026]

Title: SaFeR-ToolKit: Structured Reasoning via Virtual Tool Calling for Multimodal Safety
Authors: Zixuan Xu, Tiancheng He, Huahui Yi, Kun Wang, Xi Chen, Gongli Xi, Qiankun Li, Kang Li, Yang Liu, Zhigang Zeng

Abstract: Vision-language models remain susceptible to multimodal jailbreaks and over-refusal because safety hinges on both visual evidence and user intent, while many alignment pipelines supervise only the final response. To address this, we present SaFeR-ToolKit, which formalizes safety decision-making as a checkable protocol. Concretely, a planner specifies a persona, a Perception $\to$ Reasoning $\to$ Decision tool set, and a constrained transition graph, while a responder outputs a typed key-value tool trace before the final answer. To make the protocol reliably followed in practice, we train a single policy with a three-stage curriculum (SFT $\to$ DPO $\to$ GRPO), where GRPO directly supervises tool usage beyond answer-level feedback. Our contributions are two-fold: I. Dataset. The first tool-based safety reasoning dataset, comprising 31,654 examples (SFT 6k, DPO 18.6k, GRPO 6k) plus 1k held-out evaluation examples. II. Experiments. On Qwen2.5-VL, SaFeR-ToolKit significantly improves Safety/Helpfulness/Reasoning Rigor...
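To make the "checkable protocol" concrete: the abstract describes a responder emitting a typed key-value tool trace that must respect a planner-specified Perception $\to$ Reasoning $\to$ Decision transition graph. The sketch below illustrates that validation idea in Python; the stage names follow the abstract, but the trace format, type checks, and graph layout are hypothetical, not the paper's actual schema.

```python
# Illustrative validator for a tool trace against a constrained
# transition graph (stage names from the abstract; everything else
# is an assumption for demonstration).

# Allowed transitions: each stage may repeat or advance, and the
# trace must terminate in a Decision step.
ALLOWED = {
    "START": {"Perception"},
    "Perception": {"Perception", "Reasoning"},
    "Reasoning": {"Reasoning", "Decision"},
    "Decision": set(),  # terminal stage
}

def trace_is_valid(trace):
    """Check a list of (stage, key, value) tool calls.

    Returns True only if every transition is allowed by the graph,
    every entry is a typed key-value pair (strings here, for
    illustration), and the trace ends in a Decision step.
    """
    state = "START"
    for stage, key, value in trace:
        if stage not in ALLOWED[state]:
            return False  # transition violates the graph
        if not (isinstance(key, str) and isinstance(value, str)):
            return False  # fails the typed key-value check
        state = stage
    return state == "Decision"

# A well-formed trace: perceive, reason, then decide.
good = [
    ("Perception", "image_content", "person holding a kitchen knife"),
    ("Reasoning", "user_intent", "benign cooking question"),
    ("Decision", "verdict", "answer_helpfully"),
]

# An ill-formed trace: jumps straight to a decision.
bad = [("Decision", "verdict", "refuse")]

print(trace_is_valid(good))  # True
print(trace_is_valid(bad))   # False
```

Because the trace is validated mechanically rather than by inspecting free-form text, a reward signal (as in the GRPO stage) can supervise tool usage directly, independent of the final answer's quality.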