[2603.01006] AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching
Computer Science > Sound
arXiv:2603.01006 (cs)
[Submitted on 1 Mar 2026]

Title: AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching
Authors: Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu

Abstract: REPresentation Alignment (REPA) improves the training of generative flow models by aligning intermediate hidden states with pretrained teacher features, but its effectiveness in token-conditioned audio Flow Matching depends critically on the choice of supervised layers, which is typically made heuristically based on depth. In this work, we introduce Attribution-Guided REPresentation Alignment (AG-REPA), a novel causal layer selection strategy for representation alignment in audio Flow Matching. First, we find that the layers that best store semantic/acoustic information (high teacher-space similarity) are not necessarily the layers that contribute most to the velocity field that drives generation; we call this Store-Contribute Dissociation (SCD). To turn this insight into actionable training guidance, we propose a forward-only gate ablation (FoG-A) that quantifies each layer's causal contribution via the induced change in the predicted velocity field, enabling sparse layer selection and adaptive weighting for alignment. Across unified speech and general-audio tra...
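The gate-ablation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy residual network, the squared-L2 change metric, the top-k rule, and the names `forward`, `gate_ablation_scores`, and `select_layers` are all assumptions made for illustration. The sketch only shows the general pattern of closing one layer's gate at a time, using forward passes alone, and scoring the layer by how much the predicted velocity changes.

```python
import numpy as np

def forward(x, weights, gates):
    """Toy residual stack (hypothetical stand-in for a flow model).

    Each layer adds g_l * tanh(h @ W_l) to the hidden state; the final
    hidden state plays the role of the predicted velocity.
    """
    h = x
    for W, g in zip(weights, gates):
        h = h + g * np.tanh(h @ W)
    return h

def gate_ablation_scores(x, weights):
    """Score each layer by the velocity change when its gate is set to 0.

    Uses forward passes only (no gradients). The mean squared L2 change
    is an assumed metric, not necessarily the one used in the paper.
    """
    n = len(weights)
    full = forward(x, weights, np.ones(n))
    scores = np.zeros(n)
    for l in range(n):
        gates = np.ones(n)
        gates[l] = 0.0  # ablate only layer l's residual branch
        ablated = forward(x, weights, gates)
        scores[l] = np.mean(np.sum((full - ablated) ** 2, axis=-1))
    return scores

def select_layers(scores, k):
    """Keep the k highest-scoring layers and normalize their scores
    into alignment weights (an assumed weighting rule)."""
    top = np.sort(np.argsort(scores)[::-1][:k])
    alpha = scores[top] / scores[top].sum()
    return top, alpha

# Demo on random data; layer scales are arbitrary illustrative values.
rng = np.random.default_rng(0)
dim = 8
scales = [0.05, 0.8, 0.05, 0.05, 0.6, 0.05]
weights = [s * rng.standard_normal((dim, dim)) for s in scales]
x = rng.standard_normal((32, dim))

scores = gate_ablation_scores(x, weights)
layers, alpha = select_layers(scores, k=2)
print("contribution scores:", np.round(scores, 4))
print("selected layers:", layers, "weights:", np.round(alpha, 3))
```

In a REPA-style setup, the selected layers and weights would determine where the alignment loss against teacher features is applied and how strongly; that part is omitted here because the abstract does not specify it.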