[2603.18908] Secure Linear Alignment of Large Language Models
Computer Science > Artificial Intelligence
arXiv:2603.18908 (cs)
[Submitted on 19 Mar 2026 (v1), last revised 21 Mar 2026 (this version, v2)]

Title: Secure Linear Alignment of Large Language Models
Authors: Matt Gorbett, Suman Jana

Abstract: Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. This emerging compatibility between independently trained models opens new opportunities for cross-model alignment to downstream objectives. It also unlocks new application domains, such as settings where security, privacy, or competitive constraints prohibit direct data or model sharing. In this work, we propose a privacy-preserving framework that exploits representational convergence to enable cross-silo inference between independent language models. The framework learns an affine transformation over a shared public dataset and applies homomorphic encryption to protect client queries during inference. By encrypting only the linear alignment and classification operations, the method achieves sub-second inference latency while maintaining strong security guarantees. We support this framework with an empirical investigation into representational convergence, in which we learn linear transformations between the final hidden states...
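The affine-alignment step the abstract describes can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the authors' implementation: it fits a map Y ≈ XW + b by least squares over a shared dataset, with random matrices standing in for the two models' final hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_a, d_b = 256, 32, 24           # shared samples, model-A dim, model-B dim

# Synthetic stand-ins for the two models' final hidden states on the
# shared public dataset (here model B's states are an exact affine
# image of model A's, so the fit should recover the map closely).
X = rng.normal(size=(n, d_a))
W_true = rng.normal(size=(d_a, d_b))
b_true = rng.normal(size=d_b)
Y = X @ W_true + b_true

# Append a bias column so a single least-squares solve yields both W and b.
X_aug = np.hstack([X, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W, b = coef[:-1], coef[-1]

# Align fresh model-A states into model B's representation space.
X_new = rng.normal(size=(8, d_a))
aligned = X_new @ W + b
err = float(np.max(np.abs(aligned - (X_new @ W_true + b_true))))
```

Because the alignment is a single matrix-vector product plus a bias, it is the kind of linear operation that CKKS-style homomorphic encryption schemes can evaluate on an encrypted client query, which is presumably what keeps the encrypted portion of inference fast.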