[2511.14565] Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

arXiv - AI · April 01, 2026 · 4 min read

Computer Science > Robotics

arXiv:2511.14565 (cs)

[Submitted on 18 Nov 2025 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

Authors: Minyoung Hwang, Alexandra Forsey-Smerek, Nathaniel Dennler, Andreea Bobu

Abstract: Robots can adapt to user preferences by learning reward functions from demonstrations, but with limited data, reward models often overfit to spurious correlations and fail to generalize. This happens because demonstrations show robots how to do a task but not what matters for that task, causing the model to focus on irrelevant state details. Natural language can more directly specify what the robot should focus on, and, in principle, disambiguate between many reward functions consistent with the demonstrations. However, existing language-conditioned reward learning methods typically treat instructions as simple conditioning signals, without fully exploiting their potential to resolve ambiguity. Moreover, real instructions are often ambiguous themselves, so naive conditioning is unreliable. Our key insight is that these two input types carry complementary information: demonstrations show how to act, while language specifies what is important. We propose Masked Inverse Reinforcement Learning (Masked IRL), ...

Originally published on April 01, 2026. Curated by AI News.
