[2603.02219] NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Computer Science > Machine Learning

arXiv:2603.02219 (cs)

[Submitted on 10 Feb 2026]

Title: NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Authors: Junfeng Fang, Nachuan Chen, Houcheng Jiang, Dan Zhang, Fei Shen, Xiang Wang, Xiangnan He, Tat-Seng Chua

Abstract: Large language models are increasingly deployed in streaming scenarios, rendering conventional post-hoc safeguards ineffective as they fail to interdict unsafe content in real-time. While streaming safeguards based on token-level supervised training could address this, they necessitate expensive annotations and suffer from severe overfitting. In this work, we challenge the paradigm that streaming safety must rely on token-level supervised training. Instead, it is an inherent capability of well-trained post-hoc safeguards, as they already encode token-level risk signals in hidden representations. Hence, we introduce NExT-Guard, a training-free framework that achieves streaming safeguards by monitoring interpretable latent features from Sparse Autoencoders (SAEs). It uses pretrained SAEs from publicly available base LLMs, enabling flexible, low-cost deployment without token-level supervision. Experimental results show that NExT-Guard outperforms both post-hoc and streaming safeguards based on supervised training, with superior r...
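
To make the idea of monitoring SAE latent features during streaming more concrete, here is a minimal Python sketch. It is not the paper's implementation: the ReLU encoder form, the toy weights, the choice of "risk" feature indices, the max-over-features score, and the fixed threshold are all assumptions made for illustration, and the names sae_encode and first_unsafe_token are hypothetical. The abstract does not specify any of these details.

```python
from typing import Iterable, List, Optional, Sequence

Vector = Sequence[float]


def sae_encode(hidden: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Encode one hidden state as ReLU(W h + b).

    This is a common sparse-autoencoder encoder form, assumed here;
    the SAE actually used in the paper may differ.
    """
    return [
        max(0.0, sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(weights, bias)
    ]


def first_unsafe_token(
    hidden_states: Iterable[Vector],
    weights: Sequence[Vector],
    bias: Vector,
    risk_features: Sequence[int],
    threshold: float,
) -> Optional[int]:
    """Return the index of the first token whose risk score exceeds the threshold.

    hidden_states is consumed one token at a time, so a caller can stop
    generation as soon as a position is returned. Returns None if no
    token is flagged. The score (max activation over the hypothetical
    risk features) is an illustrative choice, not the paper's rule.
    """
    for position, hidden in enumerate(hidden_states):
        latents = sae_encode(hidden, weights, bias)
        score = max(latents[i] for i in risk_features)
        if score > threshold:
            return position
    return None


# Toy example: 2-dimensional hidden states, 2 SAE latents,
# with latent 1 treated as the (hypothetical) risk feature.
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_bias = [0.0, -0.5]
toy_stream = [[0.2, 0.1], [0.5, 0.9], [0.1, 2.0]]
flagged = first_unsafe_token(toy_stream, toy_weights, toy_bias, [1], 1.0)
```

In this toy stream the third token pushes the monitored latent above the threshold, so flagged is 2. The point of the sketch is only that no token-level labels are needed at inference time: the per-token decision comes from reading activations of a pretrained encoder.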

Originally published on March 04, 2026. Curated by AI News.
