[2509.25684] LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts

arXiv - AI February 25, 2026 4 min read Article

Summary

The paper presents LD-MoLE, a learnable dynamic routing mechanism for Mixture of LoRA Experts. Instead of assigning a fixed number of experts through TopK selection, it lets a large language model decide how many experts to activate for each token at each layer.

Why It Matters

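As large language models grow, parameter-efficient fine-tuning is needed to adapt them to specific tasks. Most existing mixture-of-experts approaches to fine-tuning rely on TopK routing, which requires careful hyperparameter tuning and gives every token the same number of experts. LD-MoLE replaces that fixed budget with a learned, differentiable routing function, so expert allocation can vary by token and by layer.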

Key Takeaways

LD-MoLE replaces conventional TopK routing with a learnable, differentiable routing mechanism.
The method allows for adaptive expert allocation based on token and layer characteristics.
In experiments on the Qwen3-1.7B and Llama-3.2-3B models, the authors report that LD-MoLE achieves the highest average scores.
The approach includes a sparsity control objective to regularize the number of activated experts.
Because the number of active experts is learned rather than fixed, the method reduces the need to hand-tune K, as conventional TopK routing requires.

Computer Science > Computation and Language
arXiv:2509.25684 (cs)
[Submitted on 30 Sep 2025 (v1), last revised 23 Feb 2026 (this version, v2)]

Title: LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts

Authors: Yuan Zhuang, Yi Shen, Yuexin Bian, Qing Su, Shihao Ji, Yuanyuan Shi, Fei Miao

Abstract: Recent studies have shown that combining parameter-efficient fine-tuning (PEFT) with mixture-of-experts (MoE) is an effective strategy for adapting large language models (LLMs) to the downstream tasks. However, most existing approaches rely on conventional TopK routing, which requires careful hyperparameter tuning and assigns a fixed number of experts to each token. In this work, we propose LD-MoLE, a Learnable Dynamic routing mechanism for Mixture of LoRA Experts that enables adaptive, token-dependent, and layer-wise expert allocation. Our method replaces the non-differentiable TopK selection with a differentiable routing function and a closed-form solution. Moreover, our design allows the model to adaptively determine the number of experts to activate for each token at different layers. In addition, we introduce an analytical sparsity control objective to regularize the number of activated experts. Extensive experiments on the Qwen3-1.7B and Llama-3.2-3B models show that LD-MoLE achieves the highest average scores...
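To make the contrast in the abstract concrete, here is a minimal sketch of fixed TopK routing next to a differentiable routing function that has a closed-form solution. The abstract excerpt does not name the routing function LD-MoLE uses, so this sketch substitutes sparsemax (the Euclidean projection of the logits onto the probability simplex) as a stand-in; it should not be read as the authors' method. The function names `topk_route`, `sparsemax_route`, and `active_experts`, and the example logits, are hypothetical.

```python
import math


def topk_route(logits, k):
    """Conventional TopK routing: softmax over the k largest logits, zero elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def sparsemax_route(logits):
    """Sparsemax routing (stand-in, not the paper's function).

    Weights are max(z_i - tau, 0), where the threshold tau has a closed form
    computed from the sorted logits. Experts below tau get exactly zero weight,
    so the number of active experts depends on the logits themselves.
    """
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(sorted(logits, reverse=True), start=1):
        cumsum += zj
        if 1 + j * zj > cumsum:   # the j-th largest logit is still in the support
            tau = (cumsum - 1) / j
        else:
            break
    return [max(z - tau, 0.0) for z in logits]


def active_experts(weights):
    """Count the experts that receive a nonzero routing weight."""
    return sum(1 for w in weights if w > 0)


# Two hypothetical tokens scored against four LoRA experts.
peaked_token = [2.0, 1.0, 0.1, -1.0]   # one expert clearly dominates
flat_token = [1.0, 0.9, 0.8, -2.0]     # three experts are about equally relevant

for name, logits in [("peaked", peaked_token), ("flat", flat_token)]:
    fixed = topk_route(logits, k=2)
    dynamic = sparsemax_route(logits)
    print(name, "TopK active:", active_experts(fixed),
          "sparsemax active:", active_experts(dynamic))
```

With these inputs, TopK activates two experts for both tokens, while the sparsemax stand-in activates one expert for the peaked token and three for the flat one. That is the token-dependent behavior the abstract describes, although the paper's own routing function and its sparsity objective may differ in detail.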

