[2602.20730] Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm
Summary
The paper presents ECO, a learning paradigm for Neural Combinatorial Optimization (NCO) that replaces inefficient online training with efficient offline self-play.
Why It Matters
Neural combinatorial solvers are used across many optimization tasks, but the prevailing online training paradigms are inefficient. By moving training offline, ECO aims to reduce memory use and raise training throughput while keeping solution quality competitive, which is relevant to both academic research and practical applications.
Key Takeaways
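- ECO introduces a two-phase offline paradigm for Neural Combinatorial Optimization: supervised warm-up followed by iterative Direct Preference Optimization (DPO).
- A Mamba-based architecture is designed to further improve efficiency in the offline setting.
- Progressive Bootstrapping, a heuristic-based mechanism, stabilizes training and is intended to ensure continuous policy improvement.
- ECO performs competitively with up-to-date baselines on TSP and CVRP, with a significant advantage in memory utilization and training throughput.
- Further analysis of efficiency, throughput, and memory usage, together with ablation studies, supports the design choices.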
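To make the self-play idea concrete, here is a minimal sketch of how preference pairs might be built for TSP: sample several candidate tours for one instance, score each by tour length, and keep the shortest as the preferred tour and the longest as the rejected one. The best-versus-worst pairing rule, the random sampler standing in for a learned policy, and all function names are assumptions for illustration; this page does not say how ECO constructs its pairs.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def sample_tour(n, rng):
    """Hypothetical stand-in for a policy: a uniformly random permutation."""
    tour = list(range(n))
    rng.shuffle(tour)
    return tour


def build_preference_pair(coords, num_samples, rng):
    """Return (preferred, rejected): the shortest and longest sampled tours."""
    tours = [sample_tour(len(coords), rng) for _ in range(num_samples)]
    tours.sort(key=lambda t: tour_length(coords, t))
    return tours[0], tours[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(10)]
    winner, loser = build_preference_pair(coords, num_samples=8, rng=rng)
    print(tour_length(coords, winner), tour_length(coords, loser))
```

In a real pipeline the sampler would be the current policy, so each round of pairs reflects the policy's own outputs rather than random permutations.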
Computer Science > Machine Learning
arXiv:2602.20730 (cs)
[Submitted on 24 Feb 2026]
Title: Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm
Authors: Zhenxing Xu, Zeyuan Ma, Weidong Bao, Hui Yan, Yan Zheng, Ji Wang
Abstract: We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance the efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP highlight that ECO performs competitively with up-to-date baselines, with significant advantage on the efficiency side in terms of memory utilization and training throughput. We provide further in-depth analysis on the efficiency, throughput and memory usage of ECO. Ablation studies show rationale behind our designs.
Subjects: Machine L...
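The second phase named in the abstract, Direct Preference Optimization, trains a policy from pairs made of a preferred and a rejected output. Below is a minimal sketch of the standard DPO loss for one such pair, written over whole-sequence log-probabilities; in an NCO setting the two outputs would be two solutions to the same instance. The function name, the default `beta`, and the argument layout are assumptions for illustration, and ECO's exact loss may differ from this standard form.

```python
import math


def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    Each argument is the total log-probability of a full solution under
    either the trainable policy or the frozen reference policy.
    `w` marks the preferred solution, `l` the rejected one.
    """
    # Implicit reward margin: how much more the policy favors the preferred
    # solution over the rejected one, measured relative to the reference.
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the preferred solution.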