[2603.22473] Functional Component Ablation Reveals Specialization Patterns in Hybrid Language Model Architectures
Computer Science > Computation and Language
arXiv:2603.22473 (cs)
[Submitted on 23 Mar 2026]

Title: Functional Component Ablation Reveals Specialization Patterns in Hybrid Language Model Architectures
Authors: Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó

Abstract: Hybrid language models combining attention with state space models (SSMs) or linear attention offer improved efficiency, but whether both components are genuinely utilized remains unclear. We present a functional component ablation framework applied to two sub-1B hybrid models -- Qwen3.5-0.8B (sequential: Gated DeltaNet + softmax attention) and Falcon-H1-0.5B (parallel: Mamba-2 + attention) -- with a pure Transformer control (Qwen2.5-0.5B). Through group ablations, layer-wise sweeps, positional ablations, matched random controls, and perplexity analysis across five benchmarks, we establish four findings: (1) both component types are essential and neither is bypassed; (2) the alternative component (linear attention or SSM) is the primary language modeling backbone, causing >35,000x perplexity degradation when removed versus ~82x for attention; (3) component importance follows a positional gradient, with early layers being disproportionately critical; and (4) hybrid architectures exhibit 20-119x greater resili...
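The abstract describes ablating individual components (attention vs. SSM/linear-attention blocks) and measuring the resulting degradation. A minimal sketch of that idea, using a hypothetical toy "model" of named component functions rather than the authors' actual code or any of the cited architectures, might look like this:

```python
# Hypothetical sketch of functional component ablation (not the paper's code).
# A "model" here is an ordered list of (name, fn) components; ablating one
# skips its contribution, analogous to zeroing out a residual branch.

def run(components, x, ablated=frozenset()):
    """Apply each (name, fn) component in order, skipping ablated names."""
    for name, fn in components:
        if name not in ablated:
            x = fn(x)
    return x

# Toy components standing in for SSM / attention blocks at different depths.
components = [
    ("ssm_0",  lambda x: x * 2 + 1),
    ("attn_1", lambda x: x + 10),
    ("ssm_2",  lambda x: x * 3),
]

baseline   = run(components, 1.0)                      # full model: 39.0
no_attn    = run(components, 1.0, ablated={"attn_1"})  # ablate attention: 9.0
print(baseline, no_attn)
```

In the paper's setting, the per-component degradation would be measured as a perplexity ratio against the baseline rather than on toy scalars; the layer-wise sweep corresponds to ablating each name in turn.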