
Application of Reinforcement Learning in Network Routing


Presentation Transcript


  1. Application of Reinforcement Learning in Network Routing By Chaopin Zhu

  2. Machine Learning • Supervised Learning • Unsupervised Learning • Reinforcement Learning

  3. Supervised Learning • Feature: Learning with a teacher • Phases • Training phase • Testing phase • Application • Pattern recognition • Function approximation

  4. Unsupervised Learning • Feature: Learning without a teacher • Applications • Feature extraction • Other preprocessing

  5. Reinforcement Learning • Feature: Learning with a critic • Application • Optimization • Function approximation

  6. Elements of Reinforcement Learning • Agent • Environment • Policy • Reward function • Value function • Model of environment (optional)

  7. Reinforcement Learning Problem

  8. Markov Decision Process (MDP) • Definition: a reinforcement learning task that satisfies the Markov property • Transition probabilities: P^a_{xx′} = Pr{ x_{t+1} = x′ | x_t = x, a_t = a }

  9. An Example of MDP

  10. Markov Decision Process (cont.) • Parameters: transition probabilities P^a_{xx′} and expected rewards R^a_{xx′} = E{ r_{t+1} | x_t = x, a_t = a, x_{t+1} = x′ } • Value functions: state-value function V^π(x) and action-value function Q^π(x, a)

  11. Elementary Methods for Reinforcement Learning Problems • Dynamic programming • Monte Carlo methods • Temporal-difference learning

  12. Bellman’s Equations
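The equations on this slide did not survive the transcript. Reconstructed in the notation of Sutton and Barto's textbook (an assumption about what the slide showed), the Bellman equation for V^π and the Bellman optimality equation are:

```latex
V^{\pi}(x) = \sum_{a} \pi(x,a) \sum_{x'} P^{a}_{xx'} \left[ R^{a}_{xx'} + \gamma V^{\pi}(x') \right]

V^{*}(x) = \max_{a} \sum_{x'} P^{a}_{xx'} \left[ R^{a}_{xx'} + \gamma V^{*}(x') \right]
```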

  13. Dynamic Programming Methods • Policy evaluation • Policy improvement

  14. Dynamic Programming (cont.) π0 →E V^π0 →I π1 →E V^π1 →I … →I π* (E: policy evaluation, I: policy improvement) • Policy iteration • Value iteration
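The evaluation/improvement cycle above can be sketched in runnable form. The two-state MDP, its rewards, and γ = 0.9 below are invented for illustration and do not come from the slides:

```python
# Minimal policy-iteration sketch on a tiny hypothetical 2-state MDP.
# States: 0, 1; actions: 0 ("stay"), 1 ("move"); deterministic transitions.
GAMMA = 0.9

# transition[s][a] = (next_state, reward); numbers are illustrative
transition = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (1, 0.0), 1: (0, 0.5)},
}

def policy_evaluation(policy, theta=1e-8):
    """Sweep the Bellman equation for V^pi until the update is tiny."""
    V = {s: 0.0 for s in transition}
    while True:
        delta = 0.0
        for s in transition:
            ns, r = transition[s][policy[s]]
            v_new = r + GAMMA * V[ns]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

def policy_improvement(V):
    """Greedy one-step lookahead with respect to V."""
    return {s: max(transition[s],
                   key=lambda a: transition[s][a][1] + GAMMA * V[transition[s][a][0]])
            for s in transition}

def policy_iteration():
    policy = {s: 0 for s in transition}   # start with "stay" everywhere
    while True:
        V = policy_evaluation(policy)
        improved = policy_improvement(V)
        if improved == policy:            # stable policy is optimal
            return policy, V
        policy = improved

policy, V = policy_iteration()
```

Here the optimal policy is to keep moving between the two states, and the values satisfy V(0) = 1 + γV(1) and V(1) = 0.5 + γV(0).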

  15. Monte Carlo Methods • Features • Learning from experience • Does not need complete transition probabilities • Idea • Partition experience into episodes • Average sample returns • Update on an episode-by-episode basis
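The episode-averaging idea above can be sketched as first-visit Monte Carlo evaluation of a random policy. The five-state random-walk environment and episode count are illustrative assumptions, not from the slides:

```python
import random

# First-visit Monte Carlo sketch: evaluate a uniform-random policy on a
# hypothetical 5-state random walk (terminals 0 and 4, reward 1 on reaching
# state 4, gamma = 1).
random.seed(2)
returns = {s: [] for s in (1, 2, 3)}     # sample returns per nonterminal state

for _ in range(5000):                    # partition experience into episodes
    x, episode = 2, []
    while x not in (0, 4):               # roll out one full episode
        x_next = x + random.choice([-1, 1])
        r = 1.0 if x_next == 4 else 0.0
        episode.append((x, r))
        x = x_next
    G, first_return = 0.0, {}
    for s, r in reversed(episode):       # accumulate the return backwards
        G += r
        first_return[s] = G              # overwriting keeps the FIRST-visit return
    for s, G in first_return.items():
        returns[s].append(G)             # average sample returns, episode by episode

V = {s: sum(g) / len(g) for s, g in returns.items()}
```

For this walk the true values are V(1) = 0.25, V(2) = 0.5, V(3) = 0.75 (the probability of reaching state 4 first), and the averages approach them.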

  16. Temporal-Difference Learning • Features (combination of Monte Carlo and DP ideas) • Learns from experience (Monte Carlo) • Updates estimates based in part on other learned estimates (DP) • The TD(λ) algorithm seamlessly integrates TD and Monte Carlo methods

  17. TD(0) Learning
  Initialize V(x) arbitrarily; π ← the policy to be evaluated
  Repeat (for each episode):
      Initialize x
      Repeat (for each step of episode):
          a ← action given by π for x
          Take action a; observe reward r and next state x′
          V(x) ← V(x) + α[r + γV(x′) − V(x)]
          x ← x′
      until x is terminal
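The TD(0) procedure above can be sketched in runnable form. The random-walk environment, step size α = 0.1, and episode count are illustrative assumptions:

```python
import random

# TD(0) sketch on a hypothetical 5-state random walk (states 0..4, terminals
# 0 and 4, reward 1 on reaching state 4), evaluating a uniform-random policy.
random.seed(0)
ALPHA, GAMMA = 0.1, 1.0
V = {s: 0.0 for s in range(5)}          # initialize V(x) arbitrarily (here: 0)

def step(x):
    """Evaluated policy: move left or right uniformly at random."""
    x_next = x + random.choice([-1, 1])
    reward = 1.0 if x_next == 4 else 0.0
    return x_next, reward

for _ in range(5000):                   # repeat for each episode
    x = 2                               # initialize x
    while x not in (0, 4):              # repeat for each step of episode
        x_next, r = step(x)             # take action; observe r and x'
        # V(x) <- V(x) + alpha [ r + gamma V(x') - V(x) ]
        V[x] += ALPHA * (r + GAMMA * V[x_next] - V[x])
        x = x_next                      # x <- x'
```

With a fixed step size the estimates keep fluctuating around the true values V(1) = 0.25, V(2) = 0.5, V(3) = 0.75 rather than converging exactly.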

  18. Q-Learning
  Initialize Q(x, a) arbitrarily
  Repeat (for each episode):
      Initialize x
      Repeat (for each step of episode):
          Choose a from x using a policy derived from Q
          Take action a, observe r, x′
          Q(x, a) ← Q(x, a) + α[r + γ max_{a′} Q(x′, a′) − Q(x, a)]
          x ← x′
      until x is terminal
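The Q-learning loop above can be sketched as follows; the chain environment, ε-greedy exploration, and the constants α, γ, ε are illustrative assumptions:

```python
import random

# Tabular Q-learning sketch on a hypothetical 5-state chain (terminals 0 and 4,
# reward 1 on reaching state 4).
random.seed(1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
ACTIONS = (-1, +1)                              # move left / move right
Q = {(x, a): 0.0 for x in range(5) for a in ACTIONS}  # initialize Q(x,a) arbitrarily

def choose_action(x):
    """Epsilon-greedy policy derived from Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(x, a)])

for _ in range(3000):                           # repeat for each episode
    x = 2                                       # initialize x
    while x not in (0, 4):                      # repeat for each step of episode
        a = choose_action(x)                    # choose a from x using policy from Q
        x_next = x + a
        r = 1.0 if x_next == 4 else 0.0         # take action a, observe r, x'
        best_next = 0.0 if x_next in (0, 4) else max(Q[(x_next, b)] for b in ACTIONS)
        # Q(x,a) <- Q(x,a) + alpha [ r + gamma max_a' Q(x',a') - Q(x,a) ]
        Q[(x, a)] += ALPHA * (r + GAMMA * best_next - Q[(x, a)])
        x = x_next                              # x <- x'
```

The learned action values approach the optimal ones, e.g. Q(3, right) → 1 and Q(2, right) → γ · 1 = 0.9.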

  19. Q-Routing • Qx(y, d): estimated time for a packet to reach destination node d from current node x via x's neighbor node y • Ty(d): y's estimate of the time remaining in the trip • qy: queuing time at node y • Txy: transmission time between x and y

  20. Algorithm of Q-Routing
  1. Set initial Q-values at each node
  2. Get the first packet from the packet queue of node x
  3. Choose the best neighbor ŷ = argmin_y Qx(y, d) and forward the packet to node ŷ
  4. Get the estimated value Tŷ(d) back from node ŷ
  5. Update Qx(ŷ, d) ← Qx(ŷ, d) + η(qŷ + Txŷ + Tŷ(d) − Qx(ŷ, d))
  6. Go to 2
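The estimate-and-update step of the algorithm above can be sketched on a toy topology. The 4-node line network, the learning rate η, and the fixed queuing and transmission times are assumed values; only the next hop toward the destination is modeled, for brevity:

```python
# Q-routing update sketch on a hypothetical line network 0 - 1 - 2 - 3,
# with every packet destined for node d = 3. q_y = T_xy = 1.0 time unit.
ETA = 0.5                                      # learning rate (assumed value)
D = 3                                          # destination node
Q = {0: {1: 0.0}, 1: {2: 0.0}, 2: {3: 0.0}}    # Q_x(y, D), forward neighbor only

def t_estimate(y):
    """T_y(D): node y's own estimate of its remaining trip time to D."""
    return 0.0 if y == D else min(Q[y].values())

# Route packets hop by hop, applying
#   Q_x(y,D) <- Q_x(y,D) + eta * (q_y + T_xy + T_y(D) - Q_x(y,D))
for _ in range(60):
    for x, y in ((2, 3), (1, 2), (0, 1)):
        q_y, t_xy = 1.0, 1.0
        Q[x][y] += ETA * (q_y + t_xy + t_estimate(y) - Q[x][y])
```

At convergence each Q-value approaches the true remaining delivery time of 2.0 time units per hop: 2.0 from node 2, 4.0 from node 1, and 6.0 from node 0.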

  21. Dual Reinforcement Q-Routing

  22. Network Model

  23. Network Model (cont.)

  24. Node Model

  25. Routing Controller

  26. Initialization/ Termination Procedures • Initilization • Initialize and / or register global variable • Initialize routing table • Termination • Destroy routing table • Release memory

  27. Arrival Procedure • Data packet arrival • Update routing table • Route it onward with control information, or destroy the packet if it has reached its destination • Control information packet arrival • Update routing table • Destroy the packet

  28. Departure Procedure • Set all fields of the packet • Get a shortest route • Send the packet according to the route

  29. References
  [1] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction
  [2] Chengan Guo, Applications of Reinforcement Learning in Sequence Detection and Network Routing
  [3] Simon Haykin, Neural Networks: A Comprehensive Foundation
