[2602.13215] When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching


Summary

The paper presents AMOR, an entropy-based metacognitive gate that lets a state space model (SSM) backbone engage sparse attention only when its prediction entropy is high, improving both efficiency and retrieval accuracy.

Why It Matters

This research addresses the uniform per-position computation of transformers and the weak long-range retrieval of SSMs by introducing a hybrid architecture that allocates attention based on uncertainty, potentially advancing AI efficiency and effectiveness on complex tasks.

Key Takeaways

  • AMOR uses prediction entropy to decide when to engage attention, improving computational efficiency (a minimal sketch of this gate follows the list below).
  • The model outperforms both SSM-only and transformer-only approaches in retrieval tasks.
  • AMOR achieves perfect retrieval accuracy while using attention on only 22% of positions.
  • The approach offers interpretable adaptive computation, linking routing decisions to information theory.
  • This research could influence future designs of AI architectures by integrating cognitive theories.
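
The gating criterion in the first takeaway can be made concrete with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the authors' code: it computes the Shannon entropy of the SSM's next-token distribution at each position and flags positions whose entropy exceeds a threshold as candidates for sparse attention. The function name `entropy_gate` and the default threshold are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def entropy_gate(logits: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Flag positions where the SSM backbone is 'uncertain'.

    logits: (batch, seq_len, vocab) next-token predictions from the SSM.
    threshold: entropy cutoff in nats (illustrative value; the paper reports a
        ~1.09-nat gap between retrieval and local positions).
    Returns a boolean mask of shape (batch, seq_len), True where attention
    should be engaged.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Shannon entropy H = -sum_v p(v) * log p(v), measured in nats.
    entropy = -(probs * log_probs).sum(dim=-1)
    return entropy > threshold
```

In this reading, high entropy marks positions where the backbone's local state is insufficient, so the model "thinks slow" and retrieves via attention; low-entropy positions stay on the cheap O(n) SSM path.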

Computer Science > Artificial Intelligence
arXiv:2602.13215 (cs) [Submitted on 22 Jan 2026]
Title: When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching
Authors: Haoran Zheng

Abstract: Transformers allocate uniform computation to every position, regardless of difficulty. State Space Models (SSMs) offer efficient alternatives but struggle with precise information retrieval over a long horizon. Inspired by dual-process theories of cognition (Kahneman, 2011), we propose AMOR (Adaptive Metacognitive Output Router), a hybrid architecture that dynamically engages sparse attention only when an SSM backbone is "uncertain"--as measured by prediction entropy. Compared to standard transformers, AMOR gains efficiency by projecting keys and values from SSM hidden states (Ghost KV), reusing the SSM's O(n) computation rather than requiring O(n^2) attention at every layer. On small-scale synthetic retrieval tasks, AMOR outperforms both SSM-only and transformer-only baselines, achieving perfect retrieval accuracy while engaging attention on only 22% of positions. We validate that prediction entropy reliably signals retrieval need, with a gap of 1.09 nats (nearly half the entropy range) between retrieval and local positions. Additionally, our approach provides interp...
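
The abstract's "Ghost KV" idea, projecting keys and values from hidden states the SSM has already computed rather than running full attention at every layer, can be sketched as follows. This is a single-head, loop-based illustration under our own assumptions (the class name GhostKVAttention, the head size, and the residual combination are all hypothetical), intended only to show how a gate mask and reused SSM states could fit together.

```python
import torch
import torch.nn as nn

class GhostKVAttention(nn.Module):
    """Hypothetical sketch: project keys/values from SSM hidden states ('ghost' KV)
    and run sparse attention only at positions flagged by the entropy gate."""

    def __init__(self, d_model: int, d_head: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_head)
        self.k_proj = nn.Linear(d_model, d_head)   # ghost keys from SSM states
        self.v_proj = nn.Linear(d_model, d_head)   # ghost values from SSM states
        self.out_proj = nn.Linear(d_head, d_model)

    def forward(self, ssm_states: torch.Tensor, gate_mask: torch.Tensor) -> torch.Tensor:
        # ssm_states: (batch, seq_len, d_model) hidden states already produced by
        # the O(n) SSM backbone; gate_mask: (batch, seq_len) bool from the entropy gate.
        k = self.k_proj(ssm_states)
        v = self.v_proj(ssm_states)
        out = torch.zeros_like(ssm_states)
        seq_len = ssm_states.size(1)
        for b in range(ssm_states.size(0)):          # loop over batch for clarity
            idx = gate_mask[b].nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue                             # no uncertain positions: pure SSM path
            q = self.q_proj(ssm_states[b, idx])      # queries only at gated positions
            scores = q @ k[b].T / k.size(-1) ** 0.5  # (num_gated, seq_len)
            # Causal mask: a gated query attends only to current and earlier positions.
            causal = torch.arange(seq_len, device=scores.device)[None, :] <= idx[:, None]
            scores = scores.masked_fill(~causal, float("-inf"))
            out[b, idx] = self.out_proj(torch.softmax(scores, dim=-1) @ v[b])
        return ssm_states + out                      # SSM output refined where gated
```

Because queries are formed only at gated positions (about 22% of positions in the paper's retrieval experiments), the quadratic cost of attention applies only to that subset, while keys and values reuse the SSM's O(n) computation.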

