[2602.14322] Conformal Signal Temporal Logic for Robust Reinforcement Learning Control: A Case Study
Summary
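This paper studies how a Conformal Signal Temporal Logic (CSTL) shield, which filters a reinforcement learning (RL) agent's actions at run time using online conformal prediction, can improve the safety and robustness of RL control in aerospace applications. In an F-16 airspeed-tracking case study, the shield preserves satisfaction of the temporal logic specification while keeping performance close to the unshielded baseline, under both nominal and severely stressed conditions.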
Why It Matters
The study highlights the importance of formal specifications in RL, particularly in safety-critical domains like aerospace. By combining CSTL with RL, the research addresses the growing need for reliable autonomous systems that can perform under uncertain conditions, making it relevant for both academia and industry.
Key Takeaways
- CSTL enhances the safety and robustness of RL control in aerospace applications.
- The proposed conformal shield outperforms classical rule-based shields in maintaining performance under stress.
- Integrating formal specifications with data-driven RL can significantly improve reliability.
Authors: Hani Beirami, M M Manjurul Islam
Submitted on 15 Feb 2026 (arXiv:2602.14322, Computer Science > Machine Learning)
Abstract
We investigate how formal temporal logic specifications can enhance the safety and robustness of reinforcement learning (RL) control in aerospace applications. Using the open source AeroBench F-16 simulation benchmark, we train a Proximal Policy Optimization (PPO) agent to regulate engine throttle and track commanded airspeed. The control objective is encoded as a Signal Temporal Logic (STL) requirement to maintain airspeed within a prescribed band during the final seconds of each maneuver. To enforce this specification at run time, we introduce a conformal STL shield that filters the RL agent's actions using online conformal prediction. We compare three settings: (i) PPO baseline, (ii) PPO with a classical rule-based STL shield, and (iii) PPO with the proposed conformal shield, under both nominal conditions and a severe stress scenario involving aerodynamic model mismatch, actuator rate limits, measurement noise, and mid-episode setpoint jumps. Experiments show that the conformal shield preserves STL satisfaction while maintaining near baseline performance and provi...
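The abstract names two ingredients that a short sketch can make concrete: an STL requirement that airspeed stays within a band over the final seconds of a maneuver, and a shield that uses conformal prediction to decide whether to pass the RL action through. The Python below is only an illustration under stated assumptions, not the authors' implementation, since the excerpt does not describe the shield's internals. All names here (`band_robustness`, `conformal_quantile`, `ConformalShield`, `fallback_action`) are hypothetical, as is the assumption that some one-step airspeed predictor is available. The sliding window of absolute prediction errors is a simplification standing in for whatever online conformal scheme the paper actually uses.

```python
import math
from collections import deque


def band_robustness(airspeeds, setpoint, half_width, dt, window_s):
    """Robustness of 'always within the band over the last window_s seconds'.

    Positive means the sampled trace satisfies the requirement with margin;
    negative means it left the band somewhere in the final window.
    """
    n_tail = max(1, int(round(window_s / dt)))
    tail = airspeeds[-n_tail:]
    return min(half_width - abs(v - setpoint) for v in tail)


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score.

    Returns infinity when there are too few scores for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if n == 0 or k > n:
        return math.inf
    return sorted(scores)[k - 1]


class ConformalShield:
    """Hypothetical shield: accept the RL action only if the predicted
    band margin still holds after subtracting a conformal error bound."""

    def __init__(self, alpha=0.1, history=200):
        self.alpha = alpha
        self.errors = deque(maxlen=history)  # recent |prediction - observation|

    def update(self, predicted_airspeed, observed_airspeed):
        self.errors.append(abs(predicted_airspeed - observed_airspeed))

    def filter(self, rl_action, fallback_action, predicted_airspeed,
               setpoint, half_width):
        margin = half_width - abs(predicted_airspeed - setpoint)
        bound = conformal_quantile(list(self.errors), self.alpha)
        # With no calibration data the bound is infinite, so the shield
        # conservatively returns the fallback action.
        return rl_action if margin - bound >= 0 else fallback_action
```

In this sketch a classical rule-based shield would correspond to checking `margin >= 0` alone. The conformal variant shrinks the admissible margin by an error bound estimated from recent data, so it becomes more conservative when the predictor is less accurate, for instance under model mismatch or measurement noise.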