[2603.00047] What Is the Geometry of the Alignment Tax?

arXiv - Machine Learning March 03, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.00047: What Is the Geometry of the Alignment Tax?

Economics > Econometrics arXiv:2603.00047 (econ) [Submitted on 9 Feb 2026] Title:What Is the Geometry of the Alignment Tax? Authors:Robin Young View a PDF of the paper titled What Is the Geometry of the Alignment Tax?, by Robin Young View PDF HTML (experimental) Abstract:The alignment tax is widely discussed but has not been formally characterized. We provide a geometric theory of the alignment tax in representation space. Under linear representation assumptions, we define the alignment tax rate as the squared projection of the safety direction onto the capability subspace and derive the Pareto frontier governing safety-capability tradeoffs, parameterized by a single quantity of the principal angle between the safety and capability subspaces. We prove this frontier is tight (achieved by perturbation) and show it has a recursive structure. safety-safety tradeoffs under capability constraints are governed by the same equation, with the angle replaced by the partial correlation between safety objectives given capability directions. We derive a scaling law decomposing the alignment tax into an irreducible component (determined by data structure) and a packing residual that vanishes as $O(m'/d)$ with model dimension $d$, and establish conditions under which capability preservation mediates or resolves conflicts between safety objectives. We provide an account consistent with prior empirical findings and generates falsifiable predictions about per-task alignment tax rates and th...

Originally published on March 03, 2026. Curated by AI News.

Ai Safety

The state of AI safety in four fake graphs

submitted by /u/tekz [link] [comments]

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Machine Learning

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

Abstract page for arXiv paper 2603.14267: DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and ...

arXiv - AI · 4 min · about 15 hours ago

Llms

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

Abstract page for arXiv paper 2601.22440: AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Value...

arXiv - AI · 4 min · about 15 hours ago

Llms

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...

arXiv - AI · 3 min · about 15 hours ago

[2603.00047] What Is the Geometry of the Alignment Tax?

About this article

Related Articles

The state of AI safety in four fake graphs

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

[2601.22440] AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

No comments

Stay updated with AI News