[2602.15473] POP: Prior-fitted Optimizer Policies
Summary
The paper introduces POP (Prior-fitted Optimizer Policies), a meta-learned optimizer that predicts coordinate-wise step sizes from contextual information in the optimization trajectory, outperforming traditional optimizers across a 47-function benchmark.
Why It Matters
This research addresses the challenges of hyperparameter sensitivity in gradient-based optimizers, particularly in non-convex optimization problems. By leveraging meta-learning, POP enhances optimization efficiency and generalization, making it relevant for machine learning practitioners seeking robust solutions.
Key Takeaways
- POP predicts coordinate-wise step sizes using contextual optimization data.
- It outperforms first-order gradient-based methods and non-convex approaches such as evolutionary strategies.
- The model demonstrates strong generalization capabilities without task-specific tuning.
- POP is trained on a diverse set of synthetic optimization problems.
- Under matched budget constraints, it also beats Bayesian optimization and a recent meta-learned competitor.
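The takeaways above describe an optimizer whose per-coordinate step sizes are predicted from the optimization trajectory observed so far. The sketch below illustrates only the *interface* of such a method: `step_size_policy` here is a hand-written Adagrad-style stand-in, not the paper's meta-learned model, and all names are hypothetical.

```python
import math

def step_size_policy(grad_history):
    """Stand-in for POP's learned policy: maps the per-coordinate gradient
    trajectory seen so far to one step size per coordinate. The paper's
    policy is a meta-learned model; this Adagrad-like rule is purely
    illustrative of the input/output contract."""
    dim = len(grad_history[-1])
    sizes = []
    for i in range(dim):
        # Accumulated squared gradients for coordinate i over the trajectory.
        sq_sum = sum(g[i] ** 2 for g in grad_history)
        sizes.append(0.5 / (math.sqrt(sq_sum) + 1e-8))
    return sizes

def optimize(grad_fn, x0, steps=100):
    """Generic loop: at each step, query the policy with the trajectory
    so far, then take a coordinate-wise scaled gradient step."""
    x = list(x0)
    history = []
    for _ in range(steps):
        g = grad_fn(x)
        history.append(g)
        eta = step_size_policy(history)
        x = [xi - ei * gi for xi, ei, gi in zip(x, eta, g)]
    return x

# Example: minimize f(x) = x0^2 + 10*x1^2 (gradient = [2*x0, 20*x1]).
x_star = optimize(lambda x: [2 * x[0], 20 * x[1]], [3.0, -2.0])
```

The point of the interface is that the policy conditions on the whole trajectory, so a learned model can adapt step sizes per coordinate without hand-tuned learning rates or momentum.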
Computer Science > Machine Learning
arXiv:2602.15473 (cs)
[Submitted on 17 Feb 2026]
Title: POP: Prior-fitted Optimizer Policies
Authors: Jan Kobiolka, Christian Frey, Gresa Shala, Arlind Kadra, Erind Bedalli, Josif Grabocka
Abstract: Optimization refers to the task of finding extrema of an objective function. Classical gradient-based optimizers are highly sensitive to hyperparameter choices; in highly non-convex settings, their performance relies on carefully tuned learning rates, momentum, and gradient accumulation. To address these limitations, we introduce POP (Prior-fitted Optimizer Policies), a meta-learned optimizer that predicts coordinate-wise step sizes conditioned on the contextual information in the optimization trajectory. Our model is trained on millions of synthetic optimization problems sampled from a novel prior spanning both convex and non-convex objectives. We evaluate POP on an established benchmark of 47 optimization functions of varying complexity, where it consistently outperforms first-order gradient-based methods, non-convex optimization approaches (e.g., evolutionary strategies), Bayesian optimization, and a recent meta-learned competitor under matched budget constraints. Our evaluation demonstrates strong generalization without task-specific tuning.
Subjects: Machine Learning (cs....
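The abstract mentions training on millions of synthetic problems sampled from a prior spanning convex and non-convex objectives. The paper's actual prior is not detailed in this summary; the sketch below shows one *plausible* family (random positive-definite quadratics with optional sinusoidal perturbations) purely to illustrate what such a problem generator could look like. All parameter ranges are invented for illustration.

```python
import math
import random

def sample_problem(dim, rng):
    """Illustrative prior over objectives, NOT the paper's actual prior:
    a random axis-aligned quadratic centered at a random point, optionally
    perturbed by sinusoidal bumps to make the landscape non-convex."""
    scales = [rng.uniform(0.1, 10.0) for _ in range(dim)]   # per-axis curvature
    center = [rng.uniform(-2.0, 2.0) for _ in range(dim)]   # location of the quadratic minimum
    bump = rng.uniform(0.0, 1.0) if rng.random() < 0.5 else 0.0  # non-convex half the time

    def f(x):
        quad = sum(s * (xi - c) ** 2 for s, xi, c in zip(scales, x, center))
        wiggle = bump * sum(math.sin(3.0 * (xi - c)) for xi, c in zip(x, center))
        return quad + wiggle

    return f, center

# Draw one 3-dimensional objective from the prior.
rng = random.Random(0)
f, center = sample_problem(3, rng)
```

A meta-learner trained against millions of such draws sees both well-conditioned convex bowls and rugged non-convex surfaces, which is the kind of coverage the abstract credits for POP's generalization.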