[2603.30036] Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

arXiv - AI April 01, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.30036: Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

Computer Science > Machine Learning

arXiv:2603.30036 (cs)

[Submitted on 31 Mar 2026]

Title: Aligned, Orthogonal or In-conflict: When can we safely optimize Chain-of-Thought?

Authors: Max Kaufmann, David Lindner, Roland S. Zimmermann, and Rohin Shah

Abstract: Chain-of-Thought (CoT) monitoring, in which automated systems monitor the CoT of an LLM, is a promising approach for effectively overseeing AI systems. However, the extent to which a model's CoT helps us oversee the model - the monitorability of the CoT - can be affected by training, for instance by the model learning to hide important features of its reasoning. We propose and empirically validate a conceptual framework for predicting when and why this occurs. We model LLM post-training as an RL environment where the reward decomposes into two terms: one term depending on final outputs and another term depending on the CoT. Our framework allows us to classify these two terms as "aligned", "orthogonal", or "in-conflict" before training. We predict that training with in-conflict terms will reduce monitorability, orthogonal terms will not affect it, and aligned terms will improve it. To validate our framework, we use it to classify a set of RL environments, train LLMs within those environments, and evaluate ...
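The reward decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `Episode` type, the additive combination of the two terms, and both toy reward functions are assumptions made for illustration only. The only parts taken from the abstract are the split into an output-dependent term and a CoT-dependent term, and the three predicted effects on monitorability.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Episode:
    """One model rollout (hypothetical container, not from the paper)."""
    cot: str     # chain-of-thought text produced by the model
    output: str  # final answer produced by the model


def total_reward(
    ep: Episode,
    output_reward: Callable[[str], float],
    cot_reward: Callable[[str], float],
) -> float:
    # Assumption: the two terms are combined by simple addition.
    # The abstract only says the reward "decomposes into two terms".
    return output_reward(ep.output) + cot_reward(ep.cot)


# Predictions as stated in the abstract, keyed by the class of the term pair.
PREDICTED_EFFECT = {
    "aligned": "improves monitorability",
    "orthogonal": "does not affect monitorability",
    "in-conflict": "reduces monitorability",
}


# Hypothetical toy terms, chosen only to make the sketch runnable.
def output_reward(output: str) -> float:
    # 1.0 if the final answer matches a fixed target, else 0.0.
    return 1.0 if output.strip() == "42" else 0.0


def cot_reward(cot: str) -> float:
    # Toy length penalty: -0.01 per whitespace-separated token.
    return -0.01 * len(cot.split())


ep = Episode(cot="6 times 7 is 42", output="42")
print(total_reward(ep, output_reward, cot_reward))
print(PREDICTED_EFFECT["in-conflict"])
```

The sketch deliberately leaves out how a pair of terms is assigned to one of the three classes, since the truncated abstract does not state the criteria.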

Originally published on April 01, 2026. Curated by AI News.
