[2603.30036] Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?


Computer Science > Machine Learning — arXiv:2603.30036 (cs) [Submitted on 31 Mar 2026]

Title: Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

Authors: Max Kaufmann, David Lindner, Roland S. Zimmermann, and Rohin Shah

Abstract: Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model — the monitorability of the CoT — can be affected by training, for instance by the model learning to hide important features of its reasoning. We propose and empirically validate a conceptual framework for predicting when and why this occurs. We model LLM post-training as an RL environment where the reward decomposes into two terms: one depending on final outputs and another depending on the CoT. Our framework allows us to classify these two terms as "aligned", "orthogonal", or "in-conflict" before training. We predict that training with in-conflict terms will reduce monitorability, that orthogonal terms will not affect it, and that aligned terms will improve it. To validate our framework, we use it to classify a set of RL environments, train LLMs within those environments, and evaluate ...
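The abstract's central idea — a reward that splits into an output-dependent term and a CoT-dependent term, classified as aligned, orthogonal, or in-conflict — can be illustrated with a toy sketch. The paper's actual classification criterion is not given in the abstract; here, purely as an assumption for illustration, we use the empirical correlation of the two reward terms across sampled rollouts. The function name `classify_reward_terms` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def classify_reward_terms(r_output, r_cot, threshold=0.2):
    """Classify the relationship between two per-rollout reward terms.

    r_output: array of rewards from the output-dependent term, one per rollout.
    r_cot:    array of rewards from the CoT-dependent term, one per rollout.

    Correlation is a stand-in proxy; the paper's real criterion is applied
    to the reward terms *before* training and may be defined differently.
    """
    rho = np.corrcoef(r_output, r_cot)[0, 1]
    if rho > threshold:
        return "aligned"      # increasing the CoT reward tends to increase the output reward
    if rho < -threshold:
        return "in-conflict"  # the CoT reward pulls against the output reward
    return "orthogonal"       # the two terms vary (near-)independently

# Toy demonstration on synthetic rollout rewards.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
print(classify_reward_terms(base, base + 0.1 * rng.normal(size=1000)))
print(classify_reward_terms(base, rng.normal(size=1000)))
print(classify_reward_terms(base, -base + 0.1 * rng.normal(size=1000)))
```

Under the framework's prediction, environments landing in the "in-conflict" bucket are the ones where optimizing the CoT term risks degrading monitorability, so a check like this would be run before training rather than after.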

Originally published on April 01, 2026. Curated by AI News.
