[2510.11103] A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Summary
This paper explores SO(3) action representations in deep reinforcement learning, focusing on their implications for robotic control tasks and providing practical guidelines for implementation.
Why It Matters
Understanding SO(3) action representations is crucial for improving the performance of robotic systems in reinforcement learning. This research addresses the challenges posed by the geometry of SO(3) and offers insights that can enhance exploration, optimization, and training stability in robotic applications.
Key Takeaways
- SO(3) representations significantly affect exploration and optimization in reinforcement learning.
- Different action representations can introduce distinct constraints and failure modes.
- Representing actions as tangent vectors in the local frame yields the most reliable results.
- The paper provides empirical studies across three standard continuous control algorithms: PPO, SAC, and TD3, under both dense and sparse rewards.
- Implementation-ready guidelines for selecting rotation actions are distilled from the research.
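As an illustrative sketch of the tangent-vector takeaway above (not code from the paper; the function names, clipping threshold, and first-order small-angle branch are assumptions), a 3-D policy action can be interpreted as an axis-angle increment in the local frame and applied via the exponential map with Rodrigues' formula:

```python
import numpy as np

def hat(w):
    # Skew-symmetric matrix of w, so that hat(w) @ v == np.cross(w, v).
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    # Exponential map so(3) -> SO(3) via Rodrigues' formula.
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3) + hat(w)  # first-order approximation near zero
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def apply_local_action(R, action, max_angle=0.1):
    # Clip the increment magnitude (max_angle is an assumed tuning knob),
    # then right-multiply so the increment acts in the local (body) frame.
    angle = np.linalg.norm(action)
    if angle > max_angle:
        action = action * (max_angle / angle)
    return R @ exp_so3(action)
```

Right-multiplication (`R @ exp_so3(action)`) is what makes the action local: the same action vector produces the same rotation relative to the body's current orientation, regardless of where the body sits in the world frame.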
Computer Science > Robotics
arXiv:2510.11103 (cs)
[Submitted on 13 Oct 2025 (v1), last revised 21 Feb 2026 (this version, v2)]
Title: A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Authors: Martin Schuck, Sherif Samy, Angela P. Schoellig
Abstract: Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions, rotation matrices, and Lie algebra coordinates introduce distinct constraints and failure modes. While these trade-offs are well studied for supervised learning, their implications for actions in reinforcement learning remain unclear. We systematically evaluate SO(3) action representations across three standard continuous control algorithms (PPO, SAC, and TD3) under dense and sparse rewards. We compare how representations shape exploration, interact with entropy regularization, and affect training stability through empirical studies, and analyze the implications of different projections for obtaining valid rotations from Euclidean network outputs. Across a suite of robotics benchmarks, we quantify the practical impact of these choices and distill simple, implementation-ready guidelines for selecting and us...
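The abstract mentions projecting Euclidean network outputs onto valid rotations. As a hedged sketch of two standard projections (not the paper's specific choices; function names are assumptions): an SVD-based projection maps a raw 3x3 output to the nearest rotation matrix in the Frobenius norm, and normalization maps a raw 4-D output to a unit quaternion, fixing the double cover by sign convention:

```python
import numpy as np

def project_to_so3(M):
    # Nearest rotation to an arbitrary 3x3 matrix (Frobenius norm),
    # via SVD; the sign correction keeps det(R) = +1.
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def normalize_quaternion(q, eps=1e-8):
    # Map a raw 4-D output to a unit quaternion; resolve the q/-q
    # double cover by choosing a non-negative scalar (first) component.
    q = q / max(np.linalg.norm(q), eps)
    return q if q[0] >= 0 else -q
```

Both projections are surjective onto the valid set, but they differ in how gradients flow through the network output near degenerate inputs (e.g. a near-singular matrix or near-zero quaternion), which is one way such choices can interact with training stability.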