[2602.12296] Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method
Summary
This article presents a novel adaptive traffic signal control method utilizing Deep Q-Networks and Proximal Policy Optimization to enhance traffic flow through optimized signal timing.
Why It Matters
Efficient traffic management is crucial for urban planning and reducing congestion. This research introduces innovative techniques that can significantly improve traffic signal control, potentially leading to reduced waiting times and lower fuel consumption, thereby enhancing overall urban mobility.
Key Takeaways
- The proposed method integrates variable cell lengths and multi-channel state representation for traffic signal control.
- Simulation results indicate improved optimization performance compared to traditional fixed cell length approaches.
- Key metrics such as waiting time, speed, and fuel consumption are effectively normalized and prioritized in the reward function.
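The normalization-and-weighting scheme in the last takeaway can be sketched as follows. This is an illustrative reconstruction, not the paper's actual reward function: the maximum values (`max_wait`, `max_speed`, `max_fuel`) and weights are hypothetical placeholders, with negative weights on quantities to minimize and a positive weight on speed.

```python
def reward(waiting_time, mean_speed, fuel_used):
    """Combine normalized traffic metrics into a scalar reward.

    Each metric's absolute value is divided by an assumed typical maximum,
    then multiplied by a weight encoding its priority and optimization
    direction (negative = minimize, positive = maximize). All constants
    below are illustrative assumptions, not values from the paper.
    """
    max_wait, max_speed, max_fuel = 300.0, 15.0, 50.0  # assumed typical maxima
    w_wait, w_speed, w_fuel = -0.5, 0.3, -0.2          # assumed priority weights
    return (w_wait * abs(waiting_time) / max_wait
            + w_speed * abs(mean_speed) / max_speed
            + w_fuel * abs(fuel_used) / max_fuel)
```

With this shape, an idle, slow, fuel-hungry intersection scores negatively, while free-flowing traffic at typical speed scores positively, which is what a DQN or PPO agent would be trained to maximize.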
Electrical Engineering and Systems Science > Systems and Control
arXiv:2602.12296 (eess) [Submitted on 1 Feb 2026]
Title: Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method
Authors: Maojiang Deng, Shoufeng Lu, Jiazhao Shi, Wen Zhang
Abstract: This study proposes a novel adaptive traffic signal control method leveraging a Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) to optimize signal timing by integrating variable cell length and multi-channel state representation. A road partition formula consisting of the sum of logarithmic and linear functions is proposed. The state variables form a vector composed of three channels: the number of vehicles, the average speed, and space occupancy. The set of available signal phases constitutes the action space; the selected phase is executed with a fixed green time. The reward function is formulated using the absolute values of key traffic state metrics - waiting time, speed, and fuel consumption. Each metric is normalized by a typical maximum value and assigned a weight that reflects its priority and optimization direction. The simulation results, using SUMO-TensorFlow-Python, demonstrate a cross-range transferability evaluation and show that ...
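The abstract names two ingredients without giving their exact form: a road partition built from the sum of a logarithmic and a linear function, and a three-channel state (vehicle count, average speed, space occupancy) per cell. The sketch below illustrates one plausible reading under stated assumptions; the boundary formula `a*ln(1+i) + b*i`, its coefficients, and the rescaling to the road length are hypothetical, not taken from the paper.

```python
import numpy as np

def cell_boundaries(road_len=200.0, n_cells=8, a=30.0, b=10.0):
    """Place the boundary of cell i at a*ln(1+i) + b*i, rescaled to road_len.

    The log-plus-linear form matches the abstract's description of the
    partition formula; coefficients a and b are illustrative assumptions.
    Cells near the stop line (i = 0) come out shorter than downstream ones.
    """
    raw = np.array([a * np.log(1 + i) + b * i for i in range(n_cells + 1)])
    return raw / raw[-1] * road_len  # rescale so the last boundary equals road_len

def build_state(positions, speeds, lengths, bounds):
    """Return a (3, n_cells) multi-channel state for one road.

    Channel 0: vehicle count per cell
    Channel 1: average speed per cell
    Channel 2: space occupancy (summed vehicle length / cell length)
    """
    n_cells = len(bounds) - 1
    state = np.zeros((3, n_cells))
    # Map each vehicle position to its cell index.
    cell_idx = np.clip(np.searchsorted(bounds, positions, side="right") - 1,
                       0, n_cells - 1)
    for c in range(n_cells):
        mask = cell_idx == c
        if mask.any():
            state[0, c] = mask.sum()
            state[1, c] = speeds[mask].mean()
            state[2, c] = lengths[mask].sum() / (bounds[c + 1] - bounds[c])
    return state
```

In a SUMO-based setup, `positions`, `speeds`, and `lengths` would come from the simulator's vehicle queries each step, and the resulting `(3, n_cells)` tensor would feed the DQN/PPO policy network as its observation.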