[2511.09219] Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization

arXiv - Machine Learning · April 03, 2026

Computer Science > Machine Learning

arXiv:2511.09219 (cs)
[Submitted on 12 Nov 2025 (v1), last revised 2 Apr 2026 (this version, v4)]

Title: Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization

Authors: Paul Strang, Zacharie Alès, Côme Bissuel, Olivier Juan, Safia Kedad-Sidhoum, Emmanuel Rachelson

Abstract: Mixed-Integer Linear Programming (MILP) lies at the core of many real-world combinatorial optimization (CO) problems, traditionally solved by branch-and-bound (B&B). A key driver of B&B solver efficiency is the variable selection heuristic that guides branching decisions. Looking to move beyond static, hand-crafted heuristics, recent work has explored adapting traditional reinforcement learning (RL) algorithms to the B&B setting, aiming to learn branching strategies tailored to specific MILP distributions. In parallel, RL agents have achieved remarkable success in board games, a very specific type of combinatorial problem, by leveraging environment simulators to plan via Monte Carlo Tree Search (MCTS). Building on these developments, we introduce Plan-and-Branch-and-Bound (PlanB&B), a model-based reinforcement learning (MBRL) agent that leverages a learned internal model of the B&B dynamics to discover improved branchin...
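To make the role of the variable selection heuristic concrete, here is a minimal sketch of branch-and-bound on a toy 0/1 knapsack problem with a pluggable branching rule. It is not PlanB&B and is not taken from the paper: the knapsack setting, the greedy fractional relaxation, and every name in it (`relax`, `branch_and_bound`, `most_fractional`, `first_free`) are assumptions chosen purely for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Item = Tuple[int, int]  # (value, weight), weights assumed positive
Selector = Callable[[List[int], Dict[int, float]], int]


def relax(items: List[Item], capacity: int,
          fixed: Dict[int, int]) -> Optional[Tuple[float, Dict[int, float]]]:
    """Solve the LP relaxation of the knapsack with some variables fixed.

    Returns (upper bound, relaxed solution), or None if the fixed
    variables already exceed the capacity.
    """
    room = capacity - sum(items[i][1] for i, v in fixed.items() if v == 1)
    if room < 0:
        return None
    bound = float(sum(items[i][0] for i, v in fixed.items() if v == 1))
    x = {i: float(v) for i, v in fixed.items()}
    free = [i for i in range(len(items)) if i not in fixed]
    # Greedy by value density is optimal for the fractional knapsack.
    free.sort(key=lambda i: items[i][0] / items[i][1], reverse=True)
    for i in free:
        value, weight = items[i]
        if weight <= room:
            x[i] = 1.0
            bound += value
            room -= weight
        else:
            x[i] = room / weight  # at most one variable ends up fractional
            bound += x[i] * value
            room = 0
    return bound, x


def branch_and_bound(items: List[Item], capacity: int,
                     select: Selector) -> Tuple[int, int]:
    """Exact depth-first B&B. Returns (optimal value, nodes explored)."""
    best, nodes = 0, 0
    stack: List[Dict[int, int]] = [{}]
    while stack:
        fixed = stack.pop()
        nodes += 1
        result = relax(items, capacity, fixed)
        if result is None:
            continue  # infeasible node
        bound, x = result
        if bound <= best:
            continue  # pruned: cannot beat the incumbent
        if all(xi in (0.0, 1.0) for xi in x.values()):
            best = int(round(bound))  # integral relaxation: new incumbent
            continue
        free = [i for i in range(len(items)) if i not in fixed]
        j = select(free, x)  # the variable selection (branching) decision
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best, nodes


def most_fractional(free: List[int], x: Dict[int, float]) -> int:
    """Branch on the free variable whose relaxed value is closest to 0.5."""
    return min(free, key=lambda i: abs(x[i] - 0.5))


def first_free(free: List[int], x: Dict[int, float]) -> int:
    """Naive rule: branch on the lowest-index free variable."""
    return free[0]
```

Calling `branch_and_bound(items, capacity, most_fractional)` and `branch_and_bound(items, capacity, first_free)` on the same instance returns the same optimal value, since both searches are exact, but the number of explored nodes can differ. That node count is the kind of quantity a branching strategy, hand-crafted or learned, tries to reduce.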

Originally published on April 03, 2026. Curated by AI News.
