[2603.07475] A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
Computer Science > Computation and Language
arXiv:2603.07475 (cs)
[Submitted on 8 Mar 2026 (v1), last revised 27 Apr 2026 (this version, v2)]

Title: A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
Authors: Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Chris Lott, Mingu Lee, Fatih Porikli

Abstract: Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although recent dLLMs match AR performance, whether diffusion objectives fundamentally reshape internal representations remains unclear. We perform the first layer- and token-wise representational analysis comparing native dLLMs (LLaDA), native AR models (Qwen2.5), and AR-initialized dLLMs (Dream-7B), using cosine similarity across layers and tokens alongside static inference-time layer-skipping as an analytical probe of redundancy. We find that diffusion objectives produce more global representations with substantial early-layer redundancy and reduced recency bias, while AR objectives yield tightly coupled, locally structured representations. AR-initialized dLLMs retain AR-like dynamics despite diffusion training, revealing persistent initialization bias. Leveraging this...
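
The abstract's layer-wise probe can be illustrated with a minimal sketch (not the authors' code): compute cosine similarity between hidden states of adjacent layers, averaged over token positions, using a Hugging Face model with output_hidden_states=True. The model name and prompt below are placeholder assumptions; high adjacent-layer similarity in early layers would correspond to the early-layer redundancy the paper attributes to diffusion-trained models.

```python
# Sketch of an adjacent-layer cosine-similarity probe over hidden states.
# Assumptions: any causal LM available via transformers; "Qwen/Qwen2.5-0.5B"
# is used only as a stand-in for the AR baseline family named in the abstract.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Diffusion and autoregressive objectives differ.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors, each [batch, seq_len, hidden_dim]
hidden = outputs.hidden_states
for layer in range(1, len(hidden)):
    prev, curr = hidden[layer - 1][0], hidden[layer][0]  # [seq_len, hidden_dim]
    # per-token cosine similarity between consecutive layers, then averaged
    sim = torch.nn.functional.cosine_similarity(prev, curr, dim=-1)
    print(f"layer {layer - 1} -> {layer}: mean cosine similarity {sim.mean():.3f}")
```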