[2602.18277] PRISM: Parallel Reward Integration with Symmetry for MORL
Summary
The paper presents PRISM, an algorithm for Multi-Objective Reinforcement Learning (MORL) that addresses heterogeneous objectives, i.e., objectives that differ sharply in temporal frequency, by integrating parallel reward channels under a reflectional-symmetry inductive bias.
Why It Matters
This research is significant because it tackles a core inefficiency in heterogeneous MORL: dense objectives dominate learning while sparse, long-horizon rewards receive weak credit assignment. By introducing the PRISM algorithm, the authors provide a framework that improves sample efficiency and policy generalisation, which is crucial for advancing AI applications in complex environments.
Key Takeaways
- PRISM improves learning efficiency in MORL by addressing reward heterogeneity.
- The algorithm employs reflectional symmetry to align reward channels effectively.
- Results on MuJoCo benchmarks show significant performance gains over both a sparse-reward baseline and an oracle trained with full dense rewards.
- ReSymNet accelerates exploration while preserving the optimal policy; SymReg constrains policy search to a reflection-equivariant subspace, improving generalisation.
- PRISM achieves over 100% hypervolume gains compared to sparse-reward baselines.
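Hypervolume, the metric behind the last takeaway, measures the volume of objective space dominated by a policy's Pareto front relative to a reference point; a gain of over 100% means the dominated volume more than doubles. A minimal two-objective sketch follows (the function name and maximisation convention are illustrative assumptions, not from the paper):

```python
def hypervolume_2d(points, ref):
    """Hypervolume (maximisation) dominated by `points` above reference `ref`.

    Sweeps from the largest first objective downwards; each non-dominated
    point adds a rectangle [ref[0], f1] x [cur_f2, f2] to the total.
    """
    # Keep only points that strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, cur_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > cur_f2:  # points dominated within the set contribute nothing
            hv += (f1 - ref[0]) * (f2 - cur_f2)
            cur_f2 = f2
    return hv
```

For example, the front {(3, 1), (1, 3)} with reference (0, 0) dominates a volume of 5.0, while a single point (3, 3) dominates 9.0.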
Computer Science > Machine Learning
arXiv:2602.18277 (cs)
[Submitted on 20 Feb 2026]
Title: PRISM: Parallel Reward Integration with Symmetry for MORL
Authors: Finn van der Knaap, Kejiang Qian, Zheng Xu, Fengxiang He
Abstract: This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives can differ sharply in temporal frequency. Such heterogeneity allows dense objectives to dominate learning, while sparse long-horizon rewards receive weak credit assignment, leading to poor sample efficiency. We propose a Parallel Reward Integration with Symmetry (PRISM) algorithm that enforces reflectional symmetry as an inductive bias in aligning reward channels. PRISM introduces ReSymNet, a theory-motivated model that reconciles temporal-frequency mismatches across objectives, using residual blocks to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy. We also propose SymReg, a reflectional equivariance regulariser that enforces agent mirroring and constrains policy search to a reflection-equivariant subspace. This restriction provably reduces hypothesis complexity and improves generalisation. Across MuJoCo benchmarks, PRISM consistently outperforms both a sparse-reward baseline and an oracle trained with full dense rewards, improving Pareto coverage and...
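To illustrate the equivariance idea behind SymReg (this is a sketch, not the paper's implementation; the mirror operators, penalty form, and all names here are assumptions), a reflection-equivariance regulariser can penalise a policy whose action for a mirrored state is not the mirrored action:

```python
import numpy as np

def symreg_penalty(policy, states, mirror_state, mirror_action):
    """Mean squared deviation from reflection equivariance:
    policy(mirror_state(s)) should equal mirror_action(policy(s))."""
    actions = policy(states)
    mirrored = policy(mirror_state(states))
    return float(np.mean((mirrored - mirror_action(actions)) ** 2))

# Toy setup (assumed for illustration): 2-D states where the reflection
# flips the first coordinate, and 1-D actions that flip sign under it.
mirror_state = lambda s: s * np.array([-1.0, 1.0])
mirror_action = lambda a: -a

equivariant = lambda s: s[:, :1]        # a = s0 is odd under the reflection
broken = lambda s: s[:, :1] + 1.0       # constant offset breaks equivariance

states = np.array([[1.0, 2.0], [-0.5, 0.3]])
```

An equivariant policy incurs zero penalty, while the offset policy is penalised; adding such a term to the training loss restricts the search toward the reflection-equivariant subspace the abstract describes.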