Distributed Reinforcement Learning for a Traffic Engineering Application

### Distributed Reinforcement Learning for a Traffic Engineering Application

Mark D. Pendrith

DaimlerChrysler Research & Technology Center

Presented by: Christina Schweikert

Distributed Reinforcement Learning for the Traffic Engineering Problem

- Intelligent Cruise Control System
- Lane change advisory system based on traffic patterns
- Optimize a group policy by maximizing utilization of the freeway as a shared resource
- Introduce two new algorithms (Monte Carlo-based Piecewise Policy Iteration, Multi-Agent Distributed Q-learning) and compare their performance in this domain

Distronic Adaptive Cruise Control

- Signals come from a radar sensor that scans the full width of a three-lane motorway over a distance of approximately 100 m and recognizes any moving vehicles ahead
- The reflection of the radar impulses and the change in their frequency enable the system to calculate the distance to, and relative speed of, the vehicles ahead (see the sketch after this list)
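
The slide does not spell out the radar math; below is a minimal sketch of the standard relations it alludes to (time of flight for range, Doppler shift for relative speed). Constants and example numbers are illustrative, not taken from the presentation.

```python
# Minimal sketch of the standard radar relations: range from round-trip
# time of flight, relative speed from the Doppler shift.

C = 3.0e8  # speed of light in m/s

def radar_range(round_trip_time_s: float) -> float:
    """Distance to the target: the pulse travels out and back, so halve it."""
    return C * round_trip_time_s / 2.0

def relative_speed(doppler_shift_hz: float, carrier_hz: float) -> float:
    """Closing speed from the two-way Doppler shift: f_d = 2 * v * f0 / c."""
    return doppler_shift_hz * C / (2.0 * carrier_hz)

# Example: an echo arriving 0.67 microseconds later comes from ~100 m away,
# and a 5.13 kHz shift on a 77 GHz automotive-radar carrier means closing
# at roughly 10 m/s.
print(radar_range(6.7e-7))           # ~100.5 m
print(relative_speed(5.13e3, 77e9))  # ~10.0 m/s
```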

Distronic Adaptive Cruise Control

- Distance to the vehicle in front decreases – the cruise control system immediately reduces acceleration or, if necessary, applies the brakes
- Distance increases – the system acts as a conventional cruise control and, at speeds between 30 and 180 km/h, maintains the programmed speed
- The driver is alerted in emergencies (a rough sketch of this decision logic follows)
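
As a rough illustration of this decision logic only: the function name, thresholds, and gains below are assumptions, not Distronic internals.

```python
def cruise_control_step(gap_m: float, closing_speed_mps: float,
                        set_speed_mps: float, current_speed_mps: float,
                        min_gap_m: float = 40.0) -> float:
    """Rough sketch of the behavior described above, returning a commanded
    acceleration in m/s^2. All thresholds and gains are illustrative."""
    if gap_m < min_gap_m and closing_speed_mps > 0.0:
        # Too close and still gaining on the vehicle ahead: back off,
        # braking harder when the closing speed is large.
        return -3.0 if closing_speed_mps > 5.0 else -1.0
    # Safe gap: behave as a conventional cruise control and track the
    # programmed set speed (Distronic operates between 30 and 180 km/h).
    error = set_speed_mps - current_speed_mps
    return max(-1.0, min(1.0, 0.5 * error))  # simple proportional control
```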

Distronic Adaptive Cruise Control

- Automatically maintains a constant distance to the vehicle in front, helping to prevent rear-end collisions
- The reaction time of drivers using Distronic is up to 40 per cent faster than that of drivers without the assistance system

Distributed Reinforcement Learning

- State – agents within sensing range
- Agents share a partially observable environment
- Goal - Integrate agents’ experiences to learn an observation-based policy that maximizes group performance
- Agents share a common policy, giving a homogeneous population of agents

Traffic Engineering Problem

- Population of cars, each with a desired traveling speed, sharing a freeway network
- Subpopulation with radar capability to detect relative speeds and distances of cars immediately ahead, behind, and around them

Problem Formulation

- Optimize the average per-time-step reward by minimizing the per-car average loss at each time step:

$$r = -\frac{1}{n}\sum_{i=1}^{n}\bigl(v_d(i) - v_a(i)\bigr)$$

where $v_d(i)$ is the desired speed of car $i$, $v_a(i)$ is the actual speed of car $i$, and $n$ is the number of cars in the simulation at that time step.
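
In code, this per-time-step group reward is simply the negated mean speed shortfall; a sketch based on the definitions above (the paper's exact loss function may differ in detail):

```python
def step_reward(desired_speeds, actual_speeds):
    """Negated per-car average loss for one time step.

    A reward of -11.9 therefore means the cars travel, on average,
    11.9 mph below their desired speeds (as in the results later).
    """
    n = len(desired_speeds)
    return -sum(vd - va for vd, va in zip(desired_speeds, actual_speeds)) / n

# Example: two cars wanting 60 mph but doing 50 and 55 -> reward = -7.5
print(step_reward([60.0, 60.0], [50.0, 55.0]))
```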

State Representation

- Each car's view of the world is represented by an 8-dimensional feature vector encoding the relative distances and speeds of the surrounding cars

Pattern of Cars in Front of Agent

- 0 – lane is clear (no car in radar range, or the nearest car is faster than the agent’s desired speed)
- 1 – the fastest car ahead is slower than the desired speed
- 2 – slower still
- 3 – slower again

Pattern of Cars Behind Agent

- 0 – lane is clear (no car in radar range, or the nearest car is slower than the agent’s current speed)
- 1 – the slowest car behind is faster than the desired speed
- 2 – faster still
- 3 – faster again

Lane Change

- 0 – lane change not valid
- 1 – lane change valid

If there is not a safe gap both in front and behind, a lane change is illegal. (A sketch of how these codes could form the 8-dimensional observation follows.)
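
One plausible way the 8-dimensional feature vector could be assembled from these codes; the per-lane layout is an assumption, since the slides give only the codings themselves.

```python
def observe(front_codes, behind_codes, change_left_valid, change_right_valid):
    """Sketch of one agent's 8-dimensional observation vector.

    front_codes / behind_codes: the 0-3 codes above for the left, current,
    and right lanes (an assumed per-lane layout); the two lane-change
    flags use the 0/1 coding above.
    """
    assert len(front_codes) == 3 and len(behind_codes) == 3
    return (tuple(front_codes) + tuple(behind_codes)
            + (int(change_left_valid), int(change_right_valid)))

# Example: clear ahead in all lanes, faster traffic behind in the current
# lane, and only the right lane change currently safe.
print(observe([0, 0, 0], [0, 2, 0], False, True))  # (0, 0, 0, 0, 2, 0, 0, 1)
```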

Monte Carlo-based Piecewise Policy Iteration

- Performs approximate piecewise policy iteration where possible policy changes for each state are evaluated by Monte Carlo estimation
- Piecewise – the policy is changed one state at a time, rather than for all states in parallel
- Searches the space of deterministic policies directly without representing the value function

Policy Iteration

- Start with an arbitrary deterministic policy for the given MDP
- Generate a better policy by calculating the best single improvement possible for each state (estimated by Monte Carlo)
- Combine all changes to generate the successor policy
- Continue until no improvement is possible, yielding an optimal policy (the piecewise, Monte Carlo-based variant is sketched below)
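
A minimal sketch of the piecewise, Monte Carlo-based variant: change the action for one state at a time and keep the change only if Monte Carlo rollouts estimate a higher average reward. Here `states`, `actions`, and `evaluate_policy` are assumed interfaces, not the paper's code.

```python
import random

def mc_piecewise_policy_iteration(states, actions, evaluate_policy, n_sweeps=10):
    """Sketch of Monte Carlo-based piecewise policy iteration.

    evaluate_policy(policy) -> estimated average per-step reward from
    Monte Carlo simulation (an assumed interface). The policy is a
    state -> action dict: no value function is represented, and only
    one state's action is changed at a time.
    """
    policy = {s: random.choice(actions) for s in states}  # arbitrary start
    for _ in range(n_sweeps):
        improved = False
        for s in states:  # piecewise: one state at a time, not in parallel
            best_a, best_r = policy[s], evaluate_policy(policy)
            for a in actions:
                if a == best_a:
                    continue
                candidate = dict(policy)
                candidate[s] = a  # single-state policy change
                r = evaluate_policy(candidate)
                if r > best_r:
                    best_a, best_r = a, r
            if best_a != policy[s]:
                policy[s] = best_a
                improved = True
        if not improved:  # no single-state change helps any more
            break
    return policy
```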

Multi-Agent Distributed Q-Learning

Q-Learning

- Q-value estimates updated after each time step based on state transition after action is selected
- For each time step, only one state transition and one action used to update Q-value estimates
- In DQL, there can be as many state transitions per time step as there are agents
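
For reference, the single-transition backup described above looks like this in a textbook sketch (`Q` as a dict keyed by (state, action); `alpha` and `gamma` are the usual learning-rate and discount parameters, values assumed):

```python
def q_update(Q, s, a, r, s_next, valid_next_actions, alpha=0.1, gamma=0.95):
    """One textbook Q-learning backup: exactly one <s, a> pair per step."""
    old = Q.get((s, a), 0.0)
    q_max = max(Q.get((s_next, a2), 0.0) for a2 in valid_next_actions)
    Q[(s, a)] = old + alpha * (r + gamma * q_max - old)
```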

Multi-Agent Distributed Q-Learning

- Takes the average backup value for a state/action pair <s, a> over all agents that selected action a from state s at the last time step
- The Qmax component of the backup value is calculated over the actions valid for a particular agent to select at the next time step
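
Putting both bullets together, a sketch of the averaged backup (the layout of the per-agent transition data is an assumption):

```python
from collections import defaultdict

def dql_update(Q, transitions, alpha=0.1, gamma=0.95):
    """Sketch of the averaged Multi-Agent Distributed Q-Learning backup.

    transitions holds one (s, a, r, s_next, valid_next_actions) tuple per
    agent for the last time step. Backup values for the same <s, a> pair
    are averaged over all agents that selected a from s, and Qmax is taken
    only over the actions valid for that particular agent next step.
    """
    backups = defaultdict(list)
    for s, a, r, s_next, valid_next in transitions:
        q_max = max(Q.get((s_next, a2), 0.0) for a2 in valid_next)
        backups[(s, a)].append(r + gamma * q_max)
    for (s, a), values in backups.items():
        target = sum(values) / len(values)  # average backup value
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (target - old)
```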

Simulation for Offline Learning

Advantages:

- Since the true state of the environment is known, the loss metric can be measured directly
- Simulation can run faster than real time, allowing many long learning trials
- Safety

Policies are learnt offline in simulation, then integrated into an intelligent cruise control system with lane advisory, route planning, etc.

Traffic Simulation Specifications

- Circular 3-lane freeway, 13.3 miles long, with 200 cars
- Half follow a “selfish drone” policy
- The rest follow the current learnt policy, plus active exploration decisions
- Gaussian distribution of desired speeds, with a mean of 60 mph
- Cars have low-level collision avoidance and differ only in lane-change strategy (these parameters are collected in the sketch below)
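
Collected as a configuration sketch: the field names and the standard deviation are assumptions (the slides give only the mean desired speed); the other values come from the list above.

```python
import random
from dataclasses import dataclass

@dataclass
class SimConfig:
    """Simulation parameters from the slides; field names are mine."""
    track_miles: float = 13.3        # circular 3-lane freeway
    lanes: int = 3
    n_cars: int = 200
    selfish_fraction: float = 0.5    # half follow the "selfish drone" policy
    mean_desired_mph: float = 60.0
    stddev_desired_mph: float = 5.0  # assumed; the slides give only the mean

def sample_desired_speeds(cfg: SimConfig):
    """Gaussian distribution of desired speeds, as specified above."""
    return [random.gauss(cfg.mean_desired_mph, cfg.stddev_desired_mph)
            for _ in range(cfg.n_cars)]
```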

Experimental Results

- Selfish drone policy – consistent per-step reward of -11.9 (each agent traveling, on average, 11.9 mph below its desired speed)
- APPIA (the Monte Carlo-based piecewise policy iteration algorithm) and DQL found policies 3-5% better
- The best policies used the “look ahead” model only
- The “look behind” model provided more stable learning
- “look behind” outperforms “look ahead” during periods when a good policy is lost
