[2503.05371] Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs

Computer Science > Machine Learning
arXiv:2503.05371 (cs)
[Submitted on 7 Mar 2025 (v1), last revised 28 Mar 2026 (this version, v3)]

Title: Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs
Authors: Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke

Abstract: We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We compute 8 steering vectors, each corresponding to a different social bias axis, such as age, gender, or race, on a training subset of the BBQ dataset and compare the effectiveness of these to 3 additional bias mitigation methods across 4 datasets. When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.8% on BBQ, 8.3% on CLEAR-Bias, and 1% on StereoSet, and show improvements over prompting and Self-Debias in all cases, and improvements over fine-tuning in 12 out of 17 evaluations. In addition, steering vectors showed the lowest impact on MMLU scores of the four bias mitigation methods tested. The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that they are a powerful and computationally efficient strategy for reducing bias in LLMs, with broade...
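The abstract does not spell out how the steering vectors are built or applied, so the following is only a minimal sketch of one common recipe, assumed here for illustration: take the difference between the mean activations of two contrastive prompt sets, then add a scaled copy of that difference to a hidden state during the forward pass. The function names, the toy activations, and the coefficient are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of means: mean(positive) - mean(negative).

    Assumption: 'positive' holds activations from the desired (less biased)
    behaviour and 'negative' from the undesired one.
    """
    mu_pos = mean_vector(positive)
    mu_neg = mean_vector(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def apply_steering(hidden: Vector, vector: Vector, coeff: float) -> Vector:
    """Shift one hidden state along the steering direction by 'coeff'."""
    return [h + coeff * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations standing in for one layer's hidden states.
unbiased_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
biased_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(unbiased_acts, biased_acts)
steered = apply_steering([0.5, 0.5, 0.5], v, coeff=2.0)
print(v, steered)
```

In a real model the same addition would typically be done inside a forward hook on a chosen layer, with the layer and the coefficient tuned per bias axis; the abstract's phrase "individually tuned" suggests some such per-axis tuning, but the details are not given in this excerpt.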

Originally published on March 31, 2026. Curated by AI News.
