[2602.20517] Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Summary
The paper presents MIMIC, a framework that enhances human-AI coordination by using inner speech to guide behavior imitation in artificial agents, improving adaptability and diversity in responses.
Why It Matters
As AI systems increasingly interact with humans, the ability to mimic human-like behaviors and adapt to context is crucial. This research addresses limitations in current imitation learning methods, proposing a novel approach that leverages inner speech for better human-AI collaboration.
Key Takeaways
- MIMIC uses inner speech as a guide for behavior imitation in AI.
- The framework improves the diversity and fidelity of AI responses.
- It enables nuanced behavioral steering at inference time without additional training.
- Experiments show significant enhancements in robotic tasks and collaboration games.
- The code and pre-trained models are open-sourced for further research.
Computer Science > Artificial Intelligence
arXiv:2602.20517 (cs) [Submitted on 24 Feb 2026]
Title: Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination
Authors: Rakshit Trivedi, Kartik Sharma, David C Parkes
Abstract: Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of the prominent approaches to build such agents by training them to mimic human-demonstrated behaviors. However, current methods struggle to capture the inherent diversity and non-Markovian nature of human behavior and lack the ability to steer behavior at inference time. Drawing inspiration from the theory of human cognitive processes, where inner speech guides action selection before execution, we propose MIMIC (Modeling Inner Motivations for Imitation and Control), a framework that uses language as an internal representation of behavioral intent. MIMIC employs the novel use of vision-language models as linguistic scaffolding to train a conditional variational autoencoder capable of generating inner speech from observations. A diffusion-based behavior cloning policy then selects actions conditioned on current observations and...
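The abstract describes a two-stage pipeline: a conditional VAE generates an inner-speech intent from the current observation, and a policy then selects an action conditioned on both the observation and that intent, which also allows steering by overriding the intent at inference time. The toy sketch below illustrates only this control flow; it is not the paper's implementation. All class and function names (`InnerSpeechCVAE`, `SteerablePolicy`, the intent vocabulary, and the action table) are hypothetical, and the real framework uses a VLM for linguistic scaffolding and a diffusion-based behavior cloning policy rather than the stand-ins shown here.

```python
# Structural sketch of a MIMIC-style pipeline (illustrative only).
# All names here are hypothetical placeholders, not from the paper's code.
import math
import random

random.seed(0)


class InnerSpeechCVAE:
    """Toy stand-in for the conditional VAE that maps observations to
    inner-speech intents (the real model generates text, trained with
    vision-language-model scaffolding)."""

    def __init__(self, latent_dim=4):
        self.latent_dim = latent_dim

    def encode(self, observation):
        # Reparameterization-style sampling: z = mu + sigma * eps,
        # with a deterministic "mean" derived from the observation.
        mu = [math.tanh(sum(observation) * (i + 1)) for i in range(self.latent_dim)]
        eps = [random.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        return [m + 0.1 * e for m, e in zip(mu, eps)]

    def decode(self, z):
        # Map the latent code to a discrete inner-speech "intent" token.
        intents = ["approach", "wait", "hand_over", "retreat"]
        return intents[int(abs(z[0]) * 10) % len(intents)]


class SteerablePolicy:
    """Toy stand-in for the diffusion behavior-cloning policy: the action
    depends on both the observation and the inner-speech intent."""

    ACTION_TABLE = {
        "approach": "move_forward",
        "wait": "stay",
        "hand_over": "extend_arm",
        "retreat": "move_back",
    }

    def act(self, observation, inner_speech):
        # In the real framework a diffusion model denoises an action
        # conditioned on (observation, inner speech); here we just look up.
        return self.ACTION_TABLE[inner_speech]


cvae = InnerSpeechCVAE()
policy = SteerablePolicy()
obs = [0.2, -0.1, 0.4]

# Stage 1: generate inner speech; Stage 2: act conditioned on it.
z = cvae.encode(obs)
speech = cvae.decode(z)
action = policy.act(obs, speech)
print(speech, action)

# Inference-time steering without retraining: override the generated
# inner speech to redirect the same policy.
steered_action = policy.act(obs, "retreat")
print(steered_action)
```

The point of the sketch is the separation of concerns: because intent is an explicit intermediate variable, a user can substitute their own inner-speech token at inference time to steer behavior, without touching either model's weights.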