[2603.00910] Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

arXiv - Machine Learning

About this article

arXiv:2603.00910 (cs) · Computer Science > Information Theory · Submitted on 1 Mar 2026

Authors: Theophilus Amaefuna, Hitesh Vaidya, Anshuman Chhabra, Ankur Mali

Abstract: Layer-wise capacity in large language models is highly non-uniform: some layers contribute disproportionately to loss reduction while others are near-redundant. Existing methods for exploiting this non-uniformity, such as influence-function-based layer scoring, produce sensitivity estimates but offer no principled mechanism for translating them into allocation or pruning decisions under hardware constraints. We address this gap with a unified, curvature-aware framework grounded in the Minimum Description Length (MDL) principle. Our central quantity is the curvature-adjusted layer gain $\zeta_k^2 = g_k^\top \widetilde{H}_{kk}^{-1} g_k$, which we show equals twice the maximal second-order reduction in empirical risk achievable by updating layer $k$ alone, and which strictly dominates gradient-norm-based scores by incorporating local curvature. Normalizing these gains into layer quality scores $q_k$, we formulate two convex MDL programs: a capa...
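The "twice the maximal second-order reduction" identity follows from a per-layer quadratic model of the risk. Write the second-order change from updating only layer $k$ by $\delta_k$ as $g_k^\top \delta_k + \tfrac{1}{2}\delta_k^\top \widetilde{H}_{kk}\delta_k$. Its minimizer is $\delta_k^\star = -\widetilde{H}_{kk}^{-1} g_k$, and the resulting decrease is $\tfrac{1}{2} g_k^\top \widetilde{H}_{kk}^{-1} g_k = \tfrac{1}{2}\zeta_k^2$.

The NumPy sketch below checks this numerically on toy layers. It rests on several assumptions:

- $\widetilde{H}_{kk}$ is taken to be a symmetric positive-definite Hessian block plus a damping term. The abstract does not define the tilde.
- The normalization $q_k = \zeta_k^2 / \sum_j \zeta_j^2$ is a hypothetical stand-in, because the abstract is truncated before it specifies one.
- All function names are illustrative. They are not the authors' code.

```python
import numpy as np


def damped(H, damping):
    """ASSUMED form of H~_kk: the layer's Hessian block plus damping * I."""
    return H + damping * np.eye(H.shape[0])


def layer_gain(g, H, damping=1e-3):
    """Curvature-adjusted gain zeta_k^2 = g^T H~^{-1} g for one layer."""
    # Solve H~ x = g instead of forming the inverse explicitly.
    x = np.linalg.solve(damped(H, damping), g)
    return float(g @ x)


def second_order_reduction(g, H, damping=1e-3):
    """Largest decrease of the quadratic model g^T d + 0.5 d^T H~ d over d."""
    H_t = damped(H, damping)
    d_star = -np.linalg.solve(H_t, g)  # Newton step restricted to this layer
    model_change = g @ d_star + 0.5 * d_star @ H_t @ d_star
    return float(-model_change)


def quality_scores(gains):
    """HYPOTHETICAL normalization q_k = zeta_k^2 / sum_j zeta_j^2."""
    gains = np.asarray(gains, dtype=float)
    return gains / gains.sum()


# Usage: three toy "layers" with random SPD Hessian blocks.
rng = np.random.default_rng(0)
layers = []
for dim in (4, 6, 3):
    A = rng.standard_normal((dim, dim))
    H = A @ A.T + 0.1 * np.eye(dim)  # symmetric positive definite
    g = rng.standard_normal(dim)
    layers.append((g, H))

gains = [layer_gain(g, H) for g, H in layers]
for (g, H), z2 in zip(layers, gains):
    # zeta_k^2 equals twice the best second-order reduction for layer k
    assert np.isclose(z2, 2 * second_order_reduction(g, H))

print(quality_scores(gains))
```

Setting $\widetilde{H}_{kk} = I$ with no damping collapses $\zeta_k^2$ to $\lVert g_k \rVert^2$. This shows that the gain generalizes a squared-gradient-norm score by reweighting the gradient with local curvature. Solving the linear system, rather than inverting $\widetilde{H}_{kk}$, is the usual numerically safer choice.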

Originally published on March 03, 2026. Curated by AI News.
