[2512.05525] Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement
Computer Science > Databases
arXiv:2512.05525 (cs)
[Submitted on 5 Dec 2025 (v1), last revised 4 May 2026 (this version, v2)]

Title: Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement
Authors: Nils Strassenburg, Boris Glavic, Tilmann Rabl

Abstract: Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be used without expertise in model development. However, this comes at the cost of substantially higher resource and energy consumption compared to smaller models, which often achieve similar predictive performance for simple tasks. In this paper, we present our vision for just-in-time model replacement (JITR), where, upon identifying a recurring task in calls to an LLM, the model is replaced transparently with a cheaper alternative that performs well for this specific task. JITR retains the ease of use and low development effort of LLMs, while saving significant cost and energy. We discuss the main challenges in realizing our vision regarding the identification of recurring tasks and the creation of a custom model. Specifically, we argue that model search and transfer learning will play a cru...
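The JITR workflow the abstract describes can be pictured as a routing layer in front of the LLM: count how often a task recurs across calls, and once a threshold is reached, install a cheaper task-specific model and route subsequent calls to it. The sketch below is a minimal illustration under strong assumptions, not the paper's actual system: the task signature (a prompt-prefix fingerprint), the `JITRouter` class, the `threshold` parameter, and the stub models are all hypothetical simplifications of the task-identification and model-search challenges the paper discusses.

```python
from collections import Counter
from typing import Callable, Dict, List


def task_signature(prompt: str, prefix_tokens: int = 5) -> str:
    """Crude recurring-task fingerprint: the prompt's first few tokens.
    Real JITR would need far more robust task identification."""
    return " ".join(prompt.split()[:prefix_tokens])


class JITRouter:
    """Route calls to a large model until a task recurs often enough,
    then transparently switch that task to a cheaper replacement."""

    def __init__(
        self,
        large_model: Callable[[str], str],
        make_small_model: Callable[[str], Callable[[str], str]],
        threshold: int = 3,
    ) -> None:
        self.large_model = large_model
        # Stand-in for the paper's model search / transfer learning step.
        self.make_small_model = make_small_model
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.replacements: Dict[str, Callable[[str], str]] = {}

    def __call__(self, prompt: str) -> str:
        sig = task_signature(prompt)
        if sig in self.replacements:  # task already replaced: use cheap model
            return self.replacements[sig](prompt)
        self.counts[sig] += 1
        if self.counts[sig] >= self.threshold:
            # Just-in-time: create and install the cheaper replacement
            # so future calls for this task skip the LLM entirely.
            self.replacements[sig] = self.make_small_model(sig)
        return self.large_model(prompt)  # fall back to the LLM


# Demo with stub models (hypothetical; real models would be inference calls).
calls = {"large": 0, "small": 0}


def large_model(prompt: str) -> str:
    calls["large"] += 1
    return "large-answer"


def make_small_model(sig: str) -> Callable[[str], str]:
    def small_model(prompt: str) -> str:
        calls["small"] += 1
        return "small-answer"

    return small_model


router = JITRouter(large_model, make_small_model, threshold=2)
answers: List[str] = [router("extract the invoice total from: ...") for _ in range(4)]
```

With a threshold of 2, the first two calls still go to the large model (the replacement is installed during the second call), and the remaining calls are served by the small model, so `calls` ends as `{"large": 2, "small": 2}`. The interesting design point the paper raises is hidden in `make_small_model`: finding or training a small model that matches the LLM's quality on the recurring task is the hard part, not the routing.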