[2604.02051] Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
Computer Science > Machine Learning
arXiv:2604.02051 (cs)
[Submitted on 2 Apr 2026]

Title: Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation
Authors: Jaber Jaber, Osama Jaber

Abstract: Recursive transformers reuse a shared weight block across multiple depth steps, trading parameters for compute. A core limitation: every step applies the same transformation, preventing the model from composing distinct operations across depth. We present Ouroboros, a system that attaches a compact Controller hypernetwork to a recursive transformer block. The Controller observes the current hidden state, produces a per-step diagonal modulation vector, and applies it to frozen SVD-initialized LoRA bases, making each recurrence step input-dependent. We combine this with gated recurrence (bias-initialized to 88% retention) and per-step LayerNorm for stable deep iteration. On Qwen2.5-3B split into a Prelude/Recurrent/Coda architecture (17 of 36 layers retained), Ouroboros reduces training loss by 43.4% over the unmodified 17-layer baseline, recovering 51.3% of the performance gap caused by layer removal. The full system adds only 9.2M trainable parameters (Controller, gate, and per-step norms) yet outperforms equivalently-sized static per-step LoRA by 1...
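The mechanism the abstract describes can be sketched in PyTorch. This is a hypothetical illustration, not the authors' code: `ModulatedRecurrentBlock`, the choice of `nn.Linear` as the shared block, and the gate-bias value are all assumptions. It shows the three ingredients named above: a Controller that maps the hidden state to a diagonal modulation over frozen SVD-initialized LoRA bases, a recurrence gate bias-initialized toward roughly 88% retention (sigmoid(2.0) ≈ 0.88), and a LayerNorm applied at each step (simplified here to a single shared norm rather than one per step).

```python
import torch
import torch.nn as nn


class ModulatedRecurrentBlock(nn.Module):
    """Hypothetical sketch of input-conditioned LoRA modulation
    in a recursive block, as described in the Ouroboros abstract."""

    def __init__(self, d_model: int, rank: int):
        super().__init__()
        # Stand-in for the shared transformer block (the paper uses
        # full transformer layers; a linear layer keeps the sketch small).
        self.core = nn.Linear(d_model, d_model)
        # Frozen LoRA bases taken from the SVD of the core weight.
        U, S, Vh = torch.linalg.svd(self.core.weight.detach())
        self.register_buffer("A", Vh[:rank, :])             # (rank, d_model)
        self.register_buffer("B", U[:, :rank] * S[:rank])   # (d_model, rank)
        # Controller hypernetwork: hidden state -> diagonal modulation.
        self.controller = nn.Linear(d_model, rank)
        # Recurrence gate, bias-initialized so sigmoid(bias) ~= 0.88,
        # i.e. ~88% retention of the previous hidden state.
        self.gate = nn.Linear(d_model, d_model)
        nn.init.constant_(self.gate.bias, 2.0)
        self.norm = nn.LayerNorm(d_model)

    def step(self, h: torch.Tensor) -> torch.Tensor:
        m = self.controller(h)                      # (..., rank) modulation
        # Modulated low-rank update: (h A^T) * m, then project back with B.
        delta = ((h @ self.A.T) * m) @ self.B.T
        update = self.norm(self.core(h) + delta)
        g = torch.sigmoid(self.gate(h))             # retention gate
        return g * h + (1 - g) * update

    def forward(self, h: torch.Tensor, steps: int = 4) -> torch.Tensor:
        for _ in range(steps):
            h = self.step(h)
        return h
```

Because the Controller output `m` depends on `h`, the effective low-rank delta differs at every recurrence step even though the bases `A` and `B` stay frozen, which is the input-dependence the abstract emphasizes.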