[2602.14143] ROAST: Rollout-based On-distribution Activation Steering Technique

arXiv - Machine Learning 3 min read Article

Summary

The ROAST technique enhances the control of large language models by utilizing on-distribution rollouts for more effective activation steering, improving performance across various tasks.

Why It Matters

This research addresses the limitations of existing activation steering methods that rely on off-distribution supervision, which can lead to inconsistent results. By introducing ROAST, the authors provide a more robust framework for optimizing model performance, which is crucial for advancing applications in machine learning and AI.

Key Takeaways

  • ROAST improves activation steering by using on-distribution rollouts.
  • The technique avoids brittle interventions associated with discrete masking.
  • Grouped normalization balances contributions across samples for better performance.
  • Empirical results show significant performance improvements on various tasks.
  • Continuous Soft Scaling (CSS) preserves activation energy that hard sparsification would discard, enhancing model robustness.
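The grouped-normalization idea in the takeaways can be sketched in code. This is an illustrative reconstruction, not the paper's implementation: the function names, the per-sample contrast, and the unit-normalization step are assumptions about how one might keep high-magnitude samples from dominating a consensus steering direction.

```python
import numpy as np

def grouped_steering_direction(pos_acts, neg_acts, eps=1e-8):
    """Estimate a consensus steering direction from activations collected
    on the model's own rollouts (hypothetical sketch; the actual ROAST
    procedure may differ).

    pos_acts, neg_acts: lists of (n_i, d) arrays, one group per sample.
    Each group's mean difference is unit-normalized before averaging, so
    samples with large activation magnitudes cannot dominate the result.
    """
    diffs = []
    for pos, neg in zip(pos_acts, neg_acts):
        d = pos.mean(axis=0) - neg.mean(axis=0)    # per-sample contrast
        d = d / (np.linalg.norm(d) + eps)          # grouped normalization
        diffs.append(d)
    v = np.mean(diffs, axis=0)                     # consensus direction
    return v / (np.linalg.norm(v) + eps)
```

Without the per-group normalization, a single sample with 10x the activation magnitude would contribute 10x the weight to the averaged direction, which is exactly the distortion the takeaways describe.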

Computer Science > Machine Learning

arXiv:2602.14143 (cs) [Submitted on 15 Feb 2026]

Title: ROAST: Rollout-based On-distribution Activation Steering Technique

Authors: Xuanbo Su, Hao Luo, Yingfang Zhang, Lijun Zhang

Abstract: Activation steering provides parameter-efficient control over large language models (LLMs) at inference time, but many methods rely on off-distribution supervision and discrete masking, leading to brittle interventions. We propose ROAST (Rollout-based On-distribution Activation Steering Technique), which estimates steering directions from the model's own on-distribution rollouts via ROC and avoids hard sparsification via Continuous Soft Scaling (CSS) and Grouped Mean Normalization. Our empirical analysis reveals that while activation magnitude correlates moderately with directional consistency, the variance in magnitude is significant and often disproportionate to semantic quality. This suggests that high-magnitude activations risk dominating the global steering direction if not properly normalized. To address this, ROAST employs grouped normalization to balance contributions across samples, ensuring a more robust estimation of the consensus steering direction. Across models (0.6B to 32B), ROAST consistently improves performance on diverse tasks (e.g., +9.7% on GSM8K for Qwen3-0.6B and +12.1% on ...
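The abstract contrasts hard sparsification (zeroing low-magnitude coordinates with a discrete mask) against Continuous Soft Scaling. The sketch below is a hypothetical illustration of that contrast: the sigmoid gate, the `tau` temperature, and the standardization of magnitudes are all assumptions, not the paper's exact formulation.

```python
import numpy as np

def hard_mask(v, k):
    """Discrete masking baseline: keep only the top-k coordinates by
    magnitude and zero the rest, discarding their activation energy."""
    mask = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    mask[idx] = 1.0
    return mask * v

def soft_scale(v, tau=1.0, eps=1e-8):
    """Continuous soft scaling (illustrative): reweight every coordinate
    by a smooth gate of its standardized magnitude. No coordinate is
    zeroed outright, so activation energy is attenuated, not discarded."""
    m = np.abs(v)
    z = (m - m.mean()) / (tau * m.std() + eps)   # standardized magnitude
    gate = 1.0 / (1.0 + np.exp(-z))              # smooth gate in (0, 1)
    return gate * v
```

Because the gate stays strictly between 0 and 1, `soft_scale` shrinks each coordinate without changing its sign, whereas `hard_mask` makes a brittle all-or-nothing cut at the top-k boundary.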
