
Reinforcement learning with function approximation for traffic signal control




Presentation Transcript


  1. Reinforcement learning with function approximation for traffic signal control ADITI BHAUMICK ab3585

  2. OBJECTIVES: To use a reinforcement learning algorithm with function approximation. Feature-based state representations are implemented, using a broad characterization of the level of congestion as low, medium, or high. Reinforcement learning is used because it allows the optimal signal-timing strategy to be learnt without assuming any model of the system.
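
The coarse low/medium/high characterization described above can be sketched as a simple threshold map. The threshold values below are illustrative assumptions, not figures from the slides:

```python
# Assumed queue-length thresholds separating low/medium/high congestion;
# the slides do not give concrete values.
L1, L2 = 6, 14

def congestion_level(queue_length):
    """Map a lane's raw queue length to a coarse congestion level:
    0 = low, 1 = medium, 2 = high."""
    if queue_length < L1:
        return 0
    if queue_length < L2:
        return 1
    return 2
```

This coarse quantization is what keeps the feature space small regardless of how long the queues actually grow.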

  3. OBJECTIVES: For high-dimensional state-action spaces, function approximation techniques are used to achieve computational efficiency. A form of reinforcement learning called Q-learning is used to develop a traffic light control (TLC) algorithm that incorporates function approximation.

  4. RESULTS: The QTLC-FA algorithm is implemented. A number of other algorithms were also implemented in order to compare against Q-learning with function approximation. The algorithms are run on Green Light District, an open-source Java-based traffic simulator. The single-stage cost function is k(s, a), and the weights are set to r1 = s1 = 0.5.
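
The transcript gives only the weights r1 = s1 = 0.5. A minimal sketch of such a single-stage cost, assuming it trades off total queue length against total elapsed waiting time with uniform per-lane weights (the actual per-lane weighting is not stated here), might look like:

```python
def single_stage_cost(queues, elapsed_times, r1=0.5, s1=0.5):
    """k(s, a): a weighted sum of lane queue lengths and elapsed
    waiting times. Lanes are weighted uniformly in this sketch; the
    experiments may weight prioritized lanes differently."""
    return r1 * sum(queues) + s1 * sum(elapsed_times)
```

With r1 = s1 = 0.5 the controller penalizes long queues and long waits equally, balancing throughput against fairness to waiting vehicles.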

  5. RESULTS:

  6. RESULTS:

  7. Q-LEARNING WITH FUNCTION APPROXIMATION: The basis for the new algorithm is the Markov decision process (MDP) framework. A stochastic process {Xn} that takes values in a set S is called an MDP if its evolution is governed by a control-valued sequence {Zn}. The Q-Bellman equation, or Bellman equation of optimality, is defined as: Q*(s, a) = k(s, a) + γ Σ_{s'} p(s, a, s') min_{b ∈ A(s')} Q*(s', b), where p(s, a, s') is the probability of moving from state s to s' under action a, and γ ∈ (0, 1) is the discount factor.

  8. Q-LEARNING WITH FUNCTION APPROXIMATION: The Q-learning-based TLC with function approximation updates a parameter θ, a d-dimensional quantity. Instead of solving a system in |S × A(S)| variables, a system in only d variables is solved, with d much smaller than |S × A(S)|.
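
A minimal sketch of this d-dimensional update, assuming a linear approximation Q(s, a) ≈ θᵀφ(s, a), a cost-minimizing target, and illustrative step-size and discount values (none of these constants come from the slides):

```python
import numpy as np

def qtlc_fa_update(theta, phi_sa, cost, next_phis, alpha=0.01, gamma=0.9):
    """One Q-learning step on the d-dimensional parameter theta.
    phi_sa: feature vector of the current state-action pair.
    next_phis: feature vectors of all candidate actions in the next state.
    Since k(s, a) is a cost, the target takes a min over next actions."""
    q_next = min(phi.dot(theta) for phi in next_phis)
    td_error = cost + gamma * q_next - phi_sa.dot(theta)
    return theta + alpha * td_error * phi_sa
```

Only θ (d numbers) is stored and updated, which is what makes the method tractable where a full |S × A(S)| Q-table would not be.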

  9. Q-LEARNING WITH FUNCTION APPROXIMATION: The metrics used for this algorithm are: the number of lanes, N; the thresholds on queue lengths, L1 and L2; and the threshold on elapsed time, T1. None of these metrics depends on the full state representation of the entire network.
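
A hypothetical per-lane feature built only from these thresholds might look as follows; the concrete values of L1, L2, and T1 are assumptions for illustration:

```python
def lane_features(queue, elapsed, L1=6, L2=14, T1=130):
    """Per-lane feature pair: a coarse congestion level from the queue
    thresholds L1/L2, plus a flag for elapsed time exceeding T1.
    Only these local thresholds are used -- no full network state."""
    level = 0 if queue < L1 else (1 if queue < L2 else 2)
    overdue = 1 if elapsed > T1 else 0
    return (level, overdue)
```

Concatenating such pairs across the N lanes, together with the candidate action, would yield the compact d-dimensional φ(s, a) against which θ is matched.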

  10. CONCLUSION: The QTLC-FA algorithm outperforms all the other algorithms it is compared with. Future work would involve applying other efficient RL algorithms with function approximation. Effects of driver behavior could also be incorporated into the framework.
