[2602.14143] ROAST: Rollout-based On-distribution Activation Steering Technique
Summary
ROAST improves inference-time control of large language models by estimating activation-steering directions from the model's own on-distribution rollouts, yielding consistent performance gains across tasks and model sizes.
Why It Matters
This research addresses the limitations of existing activation steering methods that rely on off-distribution supervision, which can lead to inconsistent results. By introducing ROAST, the authors provide a more robust framework for optimizing model performance, which is crucial for advancing applications in machine learning and AI.
Key Takeaways
- ROAST improves activation steering by using on-distribution rollouts.
- The technique avoids brittle interventions associated with discrete masking.
- Grouped normalization balances contributions across samples for better performance.
- Empirical results show significant performance improvements on various tasks.
- Continuous Soft Scaling (CSS) replaces hard sparsification, helping preserve activation energy and improving robustness.
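The contrast between discrete masking and a continuous soft alternative can be sketched as follows. The summary does not give the exact CSS formula, so the sigmoid gate below is an illustrative assumption, not the paper's definition:

```python
import math

def hard_mask(v, k):
    """Discrete masking: keep only the k largest-magnitude components,
    zeroing the rest (the brittle intervention ROAST avoids)."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]))[-k:])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]

def continuous_soft_scale(v, tau=1.0):
    """Illustrative soft alternative: gate each component by a sigmoid of
    its magnitude relative to the mean, so no component is zeroed outright
    and most activation energy is preserved (hypothetical form of CSS)."""
    mean_mag = sum(abs(x) for x in v) / len(v)
    return [x / (1.0 + math.exp(-(abs(x) - mean_mag) / tau)) for x in v]

v = [3.0, -0.2, 0.1, -2.5, 0.05]
print(hard_mask(v, 2))            # [3.0, 0.0, 0.0, -2.5, 0.0]
print(continuous_soft_scale(v))   # every entry survives, smoothly rescaled
```

With hard masking, small components are discarded entirely; with the soft gate, every component contributes at a smoothly reduced magnitude, which is the property the takeaway attributes to CSS.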
Computer Science > Machine Learning
arXiv:2602.14143 (cs) [Submitted on 15 Feb 2026]
Title: ROAST: Rollout-based On-distribution Activation Steering Technique
Authors: Xuanbo Su, Hao Luo, Yingfang Zhang, Lijun Zhang
Abstract: Activation steering provides parameter-efficient control over large language models (LLMs) at inference time, but many methods rely on off-distribution supervision and discrete masking, leading to brittle interventions. We propose ROAST (Rollout-based On-distribution Activation Steering Technique), which estimates steering directions from the model's own on-distribution rollouts via ROC and avoids hard sparsification via Continuous Soft Scaling (CSS) and Grouped Mean Normalization. Our empirical analysis reveals that while activation magnitude correlates moderately with directional consistency, the variance in magnitude is significant and often disproportionate to semantic quality. This suggests that high-magnitude activations risk dominating the global steering direction if not properly normalized. To address this, ROAST employs grouped normalization to balance contributions across samples, ensuring a more robust estimation of the consensus steering direction. Across models (0.6B to 32B), ROAST consistently improves performance on diverse tasks (e.g., +9.7% on GSM8K for Qwen3-0.6B and +12.1% on ...
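The abstract's point about grouped normalization, that high-magnitude samples would otherwise dominate the consensus steering direction, can be illustrated with a minimal sketch. The exact ROAST procedure is not specified here, so the group size and the unit-norm averaging below are assumptions:

```python
import math

def grouped_mean_direction(samples, group_size=4):
    """Illustrative grouped normalization: split per-rollout activation
    deltas into groups, normalize each group's mean to unit norm, then
    average the group directions. A single high-magnitude group then
    contributes no more than any other (hypothetical sketch, not the
    paper's exact algorithm)."""
    dim = len(samples[0])
    groups = [samples[i:i + group_size]
              for i in range(0, len(samples), group_size)]
    direction = [0.0] * dim
    for g in groups:
        mean = [sum(s[d] for s in g) / len(g) for d in range(dim)]
        norm = math.sqrt(sum(m * m for m in mean)) or 1.0
        direction = [direction[d] + mean[d] / norm for d in range(dim)]
    return [d / len(groups) for d in direction]

# One group has 10x the magnitude of the other, yet both contribute
# equally to the consensus direction after per-group normalization.
samples = [[10.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
print(grouped_mean_direction(samples, group_size=4))  # [0.5, 0.5]
```

Without the per-group normalization, a naive mean of these samples would point almost entirely along the first axis; balancing contributions this way is what the abstract credits with making the consensus direction estimate more robust.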