[2602.20728] Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback
Summary
This paper explores the use of reinforcement learning from AI feedback (RLAIF) to balance multiple objectives in urban traffic control, addressing the challenge of reward design in multi-objective settings.
Why It Matters
As urban traffic systems become increasingly complex, optimizing for multiple objectives is crucial for effective traffic management. This research offers a scalable solution by leveraging AI feedback to create balanced policies that align with user priorities, potentially improving urban mobility and safety.
Key Takeaways
- RLAIF can effectively manage multiple objectives in urban traffic control.
- The approach reduces the need for extensive reward engineering.
- Policies developed through RLAIF reflect user priorities and preferences.
- Integrating RLAIF into multi-objective RL can enhance scalability.
- This research addresses a significant gap in existing reinforcement learning applications.
Computer Science > Artificial Intelligence
arXiv:2602.20728 (cs) [Submitted on 24 Feb 2026]
Authors: Chenyang Zhao, Vinny Cahill, Ivana Dusparic
Abstract
Reward design has been one of the central challenges for real-world reinforcement learning (RL) deployment, especially in settings with multiple objectives. Preference-based RL offers an appealing alternative by learning from human preferences over pairs of behavioural outcomes. More recently, RL from AI feedback (RLAIF) has demonstrated that large language models (LLMs) can generate preference labels at scale, mitigating the reliance on human annotators. However, existing RLAIF work typically focuses on single-objective tasks, leaving open the question of how RLAIF handles systems with multiple objectives. In such systems, trade-offs among conflicting objectives are difficult to specify, and policies risk collapsing into optimizing for a dominant goal. In this paper, we explore the extension of the RLAIF paradigm to multi-objective self-adaptive systems. We show that multi-objective RLAIF can produce policies that yield balanced trade-offs reflecting different user priorities without laborious reward engineering...
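The core idea in the abstract, collecting AI preference labels over pairs of multi-objective outcomes, can be sketched as follows. This is a minimal illustration, not the paper's method: the objective names (delay, emissions, pedestrian wait), the priority weights, and the `ai_preference` function (a weighted-cost stand-in for an actual LLM labeller) are all assumptions made for the example.

```python
# Hypothetical sketch of multi-objective preference labelling in the
# spirit of RLAIF. An "AI labeller" (here a simple stand-in function,
# NOT a real LLM call) compares two candidate traffic-control outcomes
# and says which it prefers given user priorities.

# Each outcome summarises one episode by its objective values, e.g.
# mean vehicle delay (s), total emissions (g), pedestrian wait (s).

def ai_preference(a: dict, b: dict, weights: dict) -> int:
    """Stand-in for an LLM labeller: prefer the outcome with the lower
    priority-weighted cost. Returns 0 if `a` is preferred, else 1."""
    cost = lambda o: sum(weights[k] * o[k] for k in weights)
    return 0 if cost(a) <= cost(b) else 1

def collect_labels(pairs, weights):
    """Label each pair of episode outcomes, producing the preference
    dataset a reward model would then be fitted to."""
    return [(a, b, ai_preference(a, b, weights)) for a, b in pairs]

if __name__ == "__main__":
    # Assumed user priorities: delay matters most, then emissions.
    weights = {"delay": 0.5, "emissions": 0.3, "ped_wait": 0.2}
    pairs = [
        ({"delay": 30.0, "emissions": 120.0, "ped_wait": 20.0},
         {"delay": 25.0, "emissions": 150.0, "ped_wait": 40.0}),
    ]
    # First outcome has lower weighted cost (55.0 vs 65.5), so label 0.
    print(collect_labels(pairs, weights)[0][2])  # → 0
```

In the paper's setting, the labeller would instead be an LLM prompted with the user's stated priorities, which removes the need to hand-tune scalarization weights like those above.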