[2602.21231] ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces


arXiv - Machine Learning

Summary

The paper presents ACAR, a framework for adaptive complexity routing in multi-model ensembles, demonstrating improved task routing accuracy and documenting practical limitations.

Why It Matters

ACAR addresses the challenges of multi-model orchestration in AI by providing a measurable framework that enhances decision-making accuracy while highlighting potential pitfalls in model agreement and attribution methods. This is crucial for advancing AI reliability and transparency.

Key Takeaways

  • ACAR improves task routing accuracy to 55.6%, surpassing the two-model baseline of 54.4%.
  • The framework avoids full ensembling on 54.2% of tasks, optimizing resource use.
  • Negative results reveal limitations in retrieval augmentation and model agreement failures.
  • Attribution estimates based on proxy signals are weak, necessitating better methods.
  • ACAR sets falsifiable baselines for future research in multi-model systems.

Computer Science > Machine Learning, arXiv:2602.21231 (cs), submitted on 6 Feb 2026

Title: ACAR: Adaptive Complexity Routing for Multi-Model Ensembles with Auditable Decision Traces

Authors: Ramchand Kumaresan

Abstract: We present ACAR (Adaptive Complexity and Attribution Routing), a measurement framework for studying multi-model orchestration under auditable conditions. ACAR uses self-consistency variance (sigma) computed from N=3 probe samples to route tasks across single-model, two-model, and three-model execution modes. The system is implemented on top of TEAMLLM, a deterministic execution substrate with immutable artifacts and complete decision traces. We evaluate ACAR on 1,510 tasks spanning four benchmarks: MathArena, Reasoning Gym, LiveCodeBench, and SuperGPQA, using Claude Sonnet 4, GPT-4o, and Gemini 2.0 Flash, producing more than 7,550 auditable runs. Results show that sigma-based routing achieves 55.6 percent accuracy, exceeding the two-model baseline of 54.4 percent while avoiding full ensembling on 54.2 percent of tasks. The routing mechanism is model-agnostic and requires no learned components. We also document negative results. First, retrieval augmentation reduced accuracy by 3.4 percentage points, as median retrieval similarity was only 0.167, demonstrating that experience injection with...
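The abstract describes routing on self-consistency variance (sigma) over N=3 probe samples. The sketch below shows one plausible reading of that mechanism: compute a disagreement score from three probe answers and pick an execution mode by threshold. The exact sigma formula and the cutoffs `low` and `high` are assumptions for illustration; the excerpt above does not specify them.

```python
from collections import Counter

def route_by_sigma(probe_answers, low=0.0, high=0.5):
    """Pick an execution mode from N probe answers (N=3 in the paper).

    sigma here is a simple disagreement proxy (1 minus the fraction of
    probes agreeing with the modal answer). This is an illustrative
    assumption, not the paper's exact definition of self-consistency
    variance; the thresholds are likewise hypothetical.
    """
    n = len(probe_answers)
    modal_count = Counter(probe_answers).most_common(1)[0][1]
    sigma = 1.0 - modal_count / n
    if sigma <= low:
        return "single-model"   # all probes agree: cheapest mode
    elif sigma <= high:
        return "two-model"      # partial agreement: mid-cost mode
    else:
        return "three-model"    # probes disagree: full ensemble

# Example: unanimous probes stay on a single model,
# full disagreement escalates to the three-model ensemble.
print(route_by_sigma(["A", "A", "A"]))  # single-model
print(route_by_sigma(["A", "B", "C"]))  # three-model
```

Note the routing logic requires no learned components, which is consistent with the abstract's claim that the mechanism is model-agnostic.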

