
Distributed Reinforcement Learning for a Traffic Engineering Application

Mark D. Pendrith

DaimlerChrysler Research & Technology Center
Presented by: Christina Schweikert

Distributed Reinforcement Learning for Traffic Engineering Problem
  • Intelligent Cruise Control System
  • Lane change advisory system based on traffic patterns
  • Optimize a group policy by maximizing utilization of the freeway as a shared resource
  • Introduce two new algorithms (Monte Carlo-based Piecewise Policy Iteration and Multi-Agent Distributed Q-Learning) and compare their performance in this domain
Distronic Adaptive Cruise Control
  • Signals come from a radar sensor, which scans the full width of a three-lane motorway over a distance of approximately 100 m and recognizes any moving vehicles ahead
  • The reflection of the radar impulses and the change in their frequency enable the system to calculate the distance and relative speed between the vehicles
Distronic Adaptive Cruise Control
  • If the distance to the vehicle in front decreases, the cruise control system immediately reduces acceleration or, if necessary, applies the brakes
  • If the distance increases, the system acts as a conventional cruise control and, at speeds between 30 and 180 km/h, maintains the programmed desired speed (see the sketch below)
  • The driver is alerted in emergencies
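
A minimal sketch of this decision logic is shown below; the function name, thresholds, and action labels are illustrative assumptions, not the actual Distronic controller:

```python
# Illustrative sketch of the adaptive cruise control behaviour described above.
# All names and thresholds are assumptions, not the real Distronic logic.

def acc_decision(gap_m, closing_speed_mps, own_speed_kmh, set_speed_kmh,
                 safe_gap_m=40.0):
    """Return a high-level control action for one time step."""
    if 0 < gap_m < safe_gap_m and closing_speed_mps > 0:
        # The vehicle ahead is getting closer: back off, brake if needed.
        return "brake" if gap_m < 0.5 * safe_gap_m else "reduce_acceleration"
    if 30 <= own_speed_kmh <= 180:
        # No conflict ahead: behave like a conventional cruise control
        # and hold the programmed set speed.
        return "accelerate" if own_speed_kmh < set_speed_kmh else "hold_set_speed"
    return "no_assist"  # outside the system's stated operating range
```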
Distronic Adaptive Cruise Control
  • Automatically maintains a constant distance to the vehicle in front, helping to prevent rear-end collisions
  • The reaction time of drivers using Distronic is up to 40 per cent faster than that of drivers without this assistance system
Distributed Reinforcement Learning
  • State – the agents within sensing range
  • Agents share a partially observable environment
  • Goal – integrate the agents’ experiences to learn an observation-based policy that maximizes group performance
  • Agents share a common policy, giving a homogeneous population of agents (see the sketch below)
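
A minimal sketch of this shared-policy arrangement, assuming a tabular observation-to-action policy (all names here are illustrative):

```python
import random

# A single policy table shared by the whole (homogeneous) population of agents:
# it maps a local observation to a lane-change action.
ACTIONS = ["stay", "change_left", "change_right"]
shared_policy = {}  # {observation_tuple: action}

def act(observation, epsilon=0.1):
    """Epsilon-greedy action selection from the shared policy."""
    if observation not in shared_policy or random.random() < epsilon:
        return random.choice(ACTIONS)
    return shared_policy[observation]

# Every agent calls act() with its own partial observation, so experience
# gathered by all agents can be pooled to improve the single group policy.
```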
Traffic Engineering Problem
  • Population of cars, each with a desired traveling speed, sharing a freeway network
  • Subpopulation with radar capability to detect relative speeds and distances of cars immediately ahead, behind, and around them
Problem Formulation
  • Optimize the average per-time-step reward by minimizing the per-car average loss at each time step:

loss = (1/n) * Σ_i ( v_d(i) − v_a(i) )

v_d(i)  desired speed of car i
v_a(i)  actual speed of car i
n       number of cars in the simulation at the time step

The per-step reward is the negative of this average loss.
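
A small sketch of this per-step objective, assuming each car object exposes its desired and actual speeds:

```python
def per_step_loss(cars):
    """Average per-car speed deficit at one time step.

    Each car is assumed to expose `desired_speed` (v_d) and `actual_speed` (v_a).
    """
    n = len(cars)
    return sum(c.desired_speed - c.actual_speed for c in cars) / n

def per_step_reward(cars):
    # Maximizing the per-step reward is equivalent to minimizing the loss.
    return -per_step_loss(cars)
```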

State Representation
  • Each car’s view of the world is represented by an 8-dimensional feature vector – encoding the relative distances and speeds of the surrounding cars
Pattern of Cars in Front of Agent
  • 0 – lane is clear (no car in radar range or nearest car is faster than agent’s desired speed)
  • 1 – fastest car less than desired speed
  • 2 – slower
  • 3 – still slower
Pattern of Cars Behind Agent
  • 0 – lane is clear (no car in radar range or nearest car is slower than agent’s current speed)
  • 1 – slowest car faster than desired speed
  • 2 – faster
  • 3 – still faster
Lane Change
  • 0 – lane change not valid
  • 1 – lane change valid

If there is not a safe gap both in front and behind, a lane change is illegal.
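
A sketch of how one of these discretized features might be computed is given below; the 0–3 coding follows the slides above, while the attribute names and speed-band thresholds are assumptions:

```python
def encode_front(cars_ahead, desired_speed, radar_range=100.0):
    """Code 0-3 for the pattern of cars ahead in one lane, per the scheme above.

    Items in `cars_ahead` are assumed to have `gap` (metres) and `speed` attributes;
    the speed-band thresholds below are illustrative, not from the paper.
    """
    in_range = [c for c in cars_ahead if 0.0 < c.gap <= radar_range]
    if not in_range:
        return 0                       # no car in radar range
    nearest = min(in_range, key=lambda c: c.gap)
    if nearest.speed > desired_speed:
        return 0                       # lane is effectively clear
    deficit = desired_speed - max(c.speed for c in in_range)
    if deficit <= 5.0:
        return 1                       # fastest car a little below desired speed
    if deficit <= 15.0:
        return 2                       # slower
    return 3                           # slower still

# The full 8-d observation combines codes like this for the lanes ahead,
# the analogous "behind" codes, and the lane-change-valid flags.
```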

Monte Carlo-based Piecewise Policy Iteration
  • Performs approximate piecewise policy iteration where possible policy changes for each state are evaluated by Monte Carlo estimation
  • Piecewise – the policy is changed one state at a time, rather than for all states in parallel
  • Searches the space of deterministic policies directly without representing the value function
Policy Iteration
  • Start with an arbitrary deterministic policy for the given MDP
  • Generate a better policy by calculating, for each state, the best single policy improvement (estimated by Monte Carlo)
  • Combine all the per-state changes to generate the successor policy
  • Continue until no improvement is possible – the resulting policy is optimal (a sketch of the Monte Carlo-based piecewise variant follows)
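
A rough sketch of the Monte Carlo-based piecewise variant, assuming a tabular policy and a hypothetical `run_trials` simulator that returns the average per-step reward:

```python
def mc_piecewise_policy_iteration(policy, states, actions, run_trials, n_trials=50):
    """Approximate piecewise policy iteration with Monte Carlo evaluation.

    `policy` maps state -> action; `run_trials(policy, n)` is assumed to return
    the average per-step reward over n simulated trials. Unlike the classical
    scheme above, policy changes are applied one state at a time.
    """
    improved = True
    while improved:
        improved = False
        for s in states:
            best_action = policy[s]
            best_score = run_trials(policy, n_trials)       # Monte Carlo estimate
            for a in actions:
                if a == best_action:
                    continue
                candidate = dict(policy)
                candidate[s] = a                            # change the policy at s only
                score = run_trials(candidate, n_trials)
                if score > best_score:
                    best_action, best_score = a, score
            if best_action != policy[s]:
                policy[s] = best_action                     # keep the best single change
                improved = True
    return policy
```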
Multi-Agent Distributed Q-Learning

Q-Learning

  • Q-value estimates are updated after each time step, based on the state transition that follows the selected action
  • At each time step, only one state transition and one action are used to update the Q-value estimates
  • In DQL, there can be as many state transitions per time step as there are agents
Multi-Agent Distributed Q-Learning
  • Takes the average backup value for a state/action pair <s, a> over all agents that selected action a from state s at the last time step
  • The Q-max component of the backup value is calculated only over the actions that are valid for that particular agent at the next time step (see the sketch below)
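
A minimal sketch of this averaged update, assuming a tabular Q-function and a list of per-agent transitions gathered at one time step (all names are illustrative):

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> value estimate, shared by all agents

def dql_update(transitions, alpha=0.1, gamma=0.95):
    """One Distributed Q-Learning update over all agents' transitions.

    `transitions` holds one (s, a, r, s_next, valid_next_actions) tuple per agent
    that acted at this time step. Backup values for the same (s, a) pair are
    averaged across agents before the Q-value is updated.
    """
    backups = defaultdict(list)
    for s, a, r, s_next, valid_next_actions in transitions:
        # The max is taken only over the actions this agent may legally select next.
        q_max = max(Q[(s_next, a2)] for a2 in valid_next_actions)
        backups[(s, a)].append(r + gamma * q_max)
    for (s, a), values in backups.items():
        target = sum(values) / len(values)        # average backup over agents
        Q[(s, a)] += alpha * (target - Q[(s, a)])
```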
Simulation for Offline Learning

Advantages:

o Since the true state of the environment is known, the loss metric can be measured directly

o Can be run faster than real time, allowing many long learning trials

o Safety

Learn policies offline, then integrate them into an intelligent cruise control system with lane advisory, route planning, etc.

Traffic Simulation Specifications
  • Circular 3-lane freeway, 13.3 miles long, with 200 cars
  • Half the cars follow the “selfish drone” policy
  • The rest follow the current learnt policy, plus active exploration decisions
  • Desired speeds follow a Gaussian distribution with a mean of 60 mph
  • Cars have low-level collision avoidance and differ in lane-change strategy (see the sketch below)
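
A small sketch of this setup; the car record layout, the speed standard deviation, and the helper name are assumptions:

```python
import random

N_CARS, N_LANES, TRACK_MILES = 200, 3, 13.3

def make_cars(speed_stddev=5.0):
    """Create the simulated car population (field names and std dev are assumed)."""
    cars = []
    for i in range(N_CARS):
        cars.append({
            "desired_speed": random.gauss(60.0, speed_stddev),  # mean 60 mph
            "lane": random.randrange(N_LANES),
            "position": random.uniform(0.0, TRACK_MILES),       # circular track
            # half the population are "selfish drones"; the rest follow the learnt policy
            "controller": "drone" if i < N_CARS // 2 else "learner",
        })
    return cars
```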
Experimental Results
  • Selfish drone policy – consistent per-step reward of -11.9 (each agent traveling, on average, 11.9 mph below its desired speed)
  • APPIA (the Monte Carlo-based piecewise policy iteration algorithm) and DQL found policies 3–5% better
  • The best policies were found using the “look ahead” model only
  • The “look behind” model provided more stable learning
  • The “look behind” model outperforms “look ahead” during periods when a good policy is lost