[2603.00910] Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

arXiv - Machine Learning · March 03, 2026

Computer Science > Information Theory
arXiv:2603.00910 (cs)
[Submitted on 1 Mar 2026]

Title: Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

Authors: Theophilus Amaefuna, Hitesh Vaidya, Anshuman Chhabra, Ankur Mali

Abstract: Layer-wise capacity in large language models is highly non-uniform: some layers contribute disproportionately to loss reduction while others are near-redundant. Existing methods for exploiting this non-uniformity, such as influence-function-based layer scoring, produce sensitivity estimates but offer no principled mechanism for translating them into allocation or pruning decisions under hardware constraints. We address this gap with a unified, curvature-aware framework grounded in the Minimum Description Length (MDL) principle. Our central quantity is the curvature-adjusted layer gain $\zeta_k^2 = g_k^\top \widetilde{H}_{kk}^{-1} g_k$, which we show equals twice the maximal second-order reduction in empirical risk achievable by updating layer $k$ alone, and which strictly dominates gradient-norm-based scores by incorporating local curvature. Normalizing these gains into layer quality scores $q_k$, we formulate two convex MDL programs: a capa...
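The claim that the layer gain equals twice the maximal second-order reduction can be checked numerically on a toy problem. The sketch below is not the authors' code. It assumes that $\widetilde{H}_{kk}$ is the layer-$k$ Hessian block plus a small damping term (the abstract does not define the tilde), and the helper names, the `damping` parameter, and the random per-layer gradients and Hessians are all hypothetical. Under the local quadratic model $\Delta L(\delta) = g_k^\top \delta + \tfrac{1}{2}\delta^\top \widetilde{H}_{kk}\delta$, the minimizer is $\delta^* = -\widetilde{H}_{kk}^{-1} g_k$, which gives a reduction of $\tfrac{1}{2}\zeta_k^2$.

```python
import numpy as np


def layer_gain(g, H, damping=1e-3):
    """Curvature-adjusted gain zeta^2 = g^T H_tilde^{-1} g for one layer.

    ASSUMPTION: H_tilde = H + damping * I. The abstract does not say how
    the tilde-Hessian is built; damping is used here only to keep the
    block positive definite.
    """
    H_tilde = H + damping * np.eye(H.shape[0])
    return float(g @ np.linalg.solve(H_tilde, g))


def max_second_order_reduction(g, H, damping=1e-3):
    """Largest decrease of the quadratic model g^T d + 0.5 d^T H_tilde d."""
    H_tilde = H + damping * np.eye(H.shape[0])
    delta = -np.linalg.solve(H_tilde, g)  # Newton-style step for this layer only
    model_change = g @ delta + 0.5 * delta @ H_tilde @ delta
    return float(-model_change)


# Hypothetical toy data: three "layers" with random gradients and PSD Hessians.
rng = np.random.default_rng(0)
layers = []
for dim in (4, 6, 3):
    A = rng.normal(size=(dim, dim))
    layers.append((rng.normal(size=dim), A @ A.T))

gains = np.array([layer_gain(g, H) for g, H in layers])
reductions = np.array([max_second_order_reduction(g, H) for g, H in layers])
grad_norms_sq = np.array([float(g @ g) for g, _ in layers])

for k, (z2, red, gn) in enumerate(zip(gains, reductions, grad_norms_sq)):
    print(f"layer {k}: zeta^2={z2:.4f}  2*reduction={2 * red:.4f}  ||g||^2={gn:.4f}")
```

The printed `zeta^2` and `2*reduction` columns should agree for every layer, while `||g||^2` generally differs, since it ignores curvature. How the paper turns these gains into the scores $q_k$ is not shown here because the abstract does not specify the normalization.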

Originally published on March 03, 2026. Curated by AI News.
