[2603.23232] GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL

arXiv:2603.23232 (cs)
Subject: Computer Science > Machine Learning
[Submitted on 24 Mar 2026]

Title: GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL

Authors: Haoyu Wang, Jingcheng Wang, Shunyu Wu, Xinwei Xiao

Abstract: Offline reinforcement learning (RL) can fit strong value functions from fixed datasets, yet reliable deployment still hinges on the action selection interface used to query them. When the dataset induces a branched or multimodal action landscape, unimodal policy extraction can blur competing hypotheses and yield "in-between" actions that are weakly supported by data, making decisions brittle even with a strong critic.

We introduce GEM (Guided Expectation-Maximization), an analytical framework that makes action selection both multimodal and explicitly controllable. GEM trains a Gaussian Mixture Model (GMM) actor via critic-guided, advantage-weighted EM-style updates that preserve distinct components while shifting probability mass toward high-value regions, and learns a tractable GMM behavior model to quantify support.

During inference, GEM performs candidate-based selection: it generates a parallel candidate set and reranks actions using a conservative ensemble lower-confidence bound together wi...
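The candidate-based selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional actions, a hand-specified mixture in place of a trained GMM actor, and toy critics built by the hypothetical helper `make_toy_critic`. The lower-confidence bound is taken here as the ensemble mean minus `kappa` times the ensemble standard deviation, which is one common choice and an assumption. The abstract is truncated before it names the second reranking term, so that term is deliberately left out.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical critic signature for this sketch: Q(state, action) -> value.
Critic = Callable[[float, float], float]


def sample_gmm(weights: Sequence[float], means: Sequence[float],
               stds: Sequence[float], n: int, rng: random.Random) -> List[float]:
    """Draw n candidate actions from a 1-D Gaussian mixture."""
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], stds[c]) for c in components]


def ensemble_lcb(state: float, action: float,
                 critics: Sequence[Critic], kappa: float) -> float:
    """Conservative score: ensemble mean minus kappa times ensemble std."""
    qs = [q(state, action) for q in critics]
    return statistics.fmean(qs) - kappa * statistics.pstdev(qs)


def select_action(state: float, weights: Sequence[float], means: Sequence[float],
                  stds: Sequence[float], critics: Sequence[Critic],
                  n_candidates: int = 64, kappa: float = 1.0,
                  seed: int = 0) -> Tuple[float, float]:
    """Sample candidates from the mixture and return the best (action, score)."""
    rng = random.Random(seed)
    candidates = sample_gmm(weights, means, stds, n_candidates, rng)
    scores = [ensemble_lcb(state, a, critics, kappa) for a in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def make_toy_critic(penalty: float) -> Critic:
    """Toy critic (assumption): two equal peaks at a = -1 and a = +1, with a
    member-specific penalty on negative actions so the ensemble disagrees there."""
    def q(state: float, action: float) -> float:
        value = -(abs(action) - 1.0) ** 2
        return value - penalty if action < 0 else value
    return q


if __name__ == "__main__":
    critics = [make_toy_critic(p) for p in (0.0, 0.5, 1.0)]
    action, score = select_action(0.0, [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1], critics)
    print(action, score)
```

In this toy setup the mixture keeps both modes as candidates, and the reranker prefers the mode where the critics agree. An "in-between" action near zero is rarely sampled at all, because neither component places much mass there.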

Originally published on March 25, 2026. Curated by AI News.
