AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems
A Blog post by ServiceNow-AI on Hugging Face
*Published December 23, 2025 · Jaykumar Kasundra, ServiceNow-AI*

Large Language Models (LLMs) have rapidly evolved from text-only assistants into complex agentic systems capable of performing multi-step reasoning, calling external tools, retrieving memory, and executing code. With this evolution comes an increasingly sophisticated threat landscape: not only traditional content safety risks, but also multi-turn jailbreaks, prompt injections, memory hijacking, and tool manipulation.

In this work, we introduce AprielGuard, an 8B-parameter safety and security safeguard model designed to detect:

- **16 categories of safety risks**, spanning toxicity, hate, sexual content, misinformation, self-harm, illegal activities, and more.
- **A wide range of adversarial attacks**, including prompt injection, jailbreaks, chain-of-thought corruption, context hijacking, memory poisoning, and multi-agent exploit sequences.
- **Safety violations and adversarial attacks in agentic workflows**, including tool calls and model reasoning traces.

AprielGuard is available in both reasoning and non-reasoning modes, enabling explainable classification when needed and low-latency classification for production pipelines.

- Model: https://huggingface.co/ServiceNow-AI/AprielGuard
- Technical Paper: https://arxiv.org/abs/2512.20293

## Table of Contents

- Motivation
- AprielGuard Overview
- Tax...
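As a rough illustration of how a guardrail model like this might be invoked, the sketch below loads the published checkpoint with Hugging Face `transformers` and asks it to classify a piece of content, with an optional flag toggling the reasoning mode. The message/prompt format and the `reasoning` toggle shown here are assumptions for illustration only; the model card defines the actual interface.

```python
"""Hypothetical sketch: calling a guard model such as AprielGuard.

The system-prompt wording and the `reasoning` toggle are illustrative
assumptions, not the model's documented interface.
"""

MODEL_ID = "ServiceNow-AI/AprielGuard"


def build_guard_messages(content: str, reasoning: bool = False) -> list:
    """Build a chat-style request asking the guard model to classify `content`.

    `reasoning=True` asks for an explanation before the verdict
    (hypothetical mechanism; the real model may expose this differently).
    """
    instruction = (
        "Classify the following content for safety risks and adversarial "
        "attacks such as prompt injection and jailbreaks."
    )
    if reasoning:
        instruction += " Explain your reasoning before giving a final verdict."
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": content},
    ]


def classify(content: str, reasoning: bool = False) -> str:
    """Run one classification pass; downloads the 8B checkpoint on first use."""
    # Imported lazily so the prompt builder above stays usable without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_guard_messages(content, reasoning),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens (the model's verdict).
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

In a production pipeline, the non-reasoning variant of such a call would typically sit in the request path for low latency, while the reasoning variant would be reserved for auditing or debugging flagged traffic.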