[2603.23232] GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL
Computer Science > Machine Learning
arXiv:2603.23232 (cs)
[Submitted on 24 Mar 2026]

Title: GEM: Guided Expectation-Maximization for Behavior-Normalized Candidate Action Selection in Offline RL
Authors: Haoyu Wang, Jingcheng Wang, Shunyu Wu, Xinwei Xiao

Abstract: Offline reinforcement learning (RL) can fit strong value functions from fixed datasets, yet reliable deployment still hinges on the action-selection interface used to query them. When the dataset induces a branched or multimodal action landscape, unimodal policy extraction can blur competing hypotheses and yield "in-between" actions that are weakly supported by the data, making decisions brittle even with a strong critic. We introduce GEM (Guided Expectation-Maximization), an analytical framework that makes action selection both multimodal and explicitly controllable. GEM trains a Gaussian Mixture Model (GMM) actor via critic-guided, advantage-weighted EM-style updates that preserve distinct components while shifting probability mass toward high-value regions, and learns a tractable GMM behavior model to quantify support. During inference, GEM performs candidate-based selection: it generates a parallel candidate set and reranks actions using a conservative ensemble lower-confidence bound together wi...
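The candidate-based selection step the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the GMM actor, the behavior model (here reused as the same toy mixture), the five-critic ensemble, and the support weight `beta` are all hypothetical stand-ins chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D GMM actor: two components with distinct modes.
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
stds = np.array([0.2, 0.2])

def sample_candidates(n):
    """Draw a parallel set of n candidate actions from the GMM actor."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comps], stds[comps])

def behavior_log_density(a):
    """Log-density under a GMM behavior model, used to quantify support.

    For brevity this toy reuses the actor's mixture as the behavior model;
    in the paper these are separately learned models.
    """
    comp_pdf = weights * np.exp(-0.5 * ((a[:, None] - means) / stds) ** 2) \
        / (stds * np.sqrt(2 * np.pi))
    return np.log(comp_pdf.sum(axis=1) + 1e-12)

def ensemble_lcb(a, k=1.0):
    """Conservative lower-confidence bound over a toy critic ensemble."""
    # Five toy critics that all prefer actions near +1, with small disagreement.
    qs = np.stack([-(a - 1.0) ** 2 + 0.05 * rng.standard_normal(a.shape)
                   for _ in range(5)])
    return qs.mean(axis=0) - k * qs.std(axis=0)

def select_action(n=64, beta=0.1):
    """Rerank candidates by LCB value plus a behavior-support bonus."""
    cands = sample_candidates(n)
    scores = ensemble_lcb(cands) + beta * behavior_log_density(cands)
    return float(cands[np.argmax(scores)])

print(select_action())
```

Because the toy critics reward actions near the +1 mode and the support term penalizes off-mode candidates, the selected action lands near a data-supported mode rather than at an unsupported "in-between" point such as 0.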