50th Anniversary of The Curse of Dimensionality
    Presentation Transcript
    1. 50th Anniversary of The Curse of Dimensionality
      • Continuous states: storage cost ∝ resolution^dx; computational cost ∝ resolution^dx
      • Continuous actions: computational cost ∝ resolution^du
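
The exponential scaling above can be made concrete with a tiny sketch (illustrative only; taking "resolution" to mean cells per state dimension is an assumption):

```python
# Sketch: the cost of a uniform grid grows as resolution**dx.

def grid_cells(resolution, dx):
    """Number of cells in a uniform grid over a dx-dimensional state space."""
    return resolution ** dx

# At 100 cells per dimension, each extra state dimension multiplies
# the table size (and the cost of one sweep over it) by 100:
for dx in range(1, 5):
    print(dx, grid_cells(100, dx))
```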

    2. Beating The Curse Of Dimensionality
      • Reduce dimensionality (biped examples)
      • Use primitives (Poincaré section)
      • Parameterize V, policy (future lecture)
      • Reduce the volume of state space explored
      • Use greater depth search
      • Adaptive/problem-specific grid/sampling
        • Split where needed
        • Random sampling – add where needed
      • Random action search
      • Random state search
      • Hybrid approaches: combine local and global optimization

    3. Use Brute Force
      • Deal with computational cost by using a cluster supercomputer.
      • The main issue is minimizing communication between nodes.

    4. Cluster Supercomputing
      • (8) cores w/ small local memory (cache)
      • (100) nodes w/ shared memory (16 GB)
      • (4–16 Gb/s) network
      • (100 TB) disks

    5. Q(x,u) = L(x,u) + V(f(x,u))
      • c = L(x,u): as in the desktop case
      • x_next = f(x,u): as in the desktop case
      • V(x_next): uniform grid; multilinear interpolation if all neighboring values are available, distance-weighted averaging if some values are bad

    6. Allocate grid to cores/nodes

    7. Handle Overlap

    8. Push Updated V’s To Users

    9. So what does this all mean for programming?
      • On a node, split grid cells among threads, which execute on cores.
      • Share updates of V(x) and u(x) within a node almost for free using shared memory.
      • Pushing updated V(x) and u(x) to other nodes uses the network, which is relatively slow.

    10. Dealing with the slow network
      • Organize grid cells into packet-sized blocks; send each block as a unit.
      • Threshold updates: if an update is too small, don't send it.
      • Only send 1 in N updates for each block (with a maximum skip time).
      • Tolerate packet loss (UDP) vs. verified delivery (TCP/MPI).
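
A structural sketch of the throttling idea, assuming a callback stands in for the real (UDP) network push; the block layout and the constants are illustrative, not from the slides:

```python
THRESHOLD = 1e-3   # minimum |change| worth sending
MAX_SKIP = 8       # force a send after this many skipped rounds

class BlockSender:
    """Buffer per-block value updates; send a block only when some change
    exceeds the threshold or the block has been held back too long."""

    def __init__(self, send):
        self.send = send          # callback taking (block_id, updates)
        self.pending = {}         # block_id -> {cell: new_value}
        self.max_change = {}      # block_id -> largest |change| buffered
        self.skipped = {}         # block_id -> rounds since last send

    def update(self, block_id, cell, old, new):
        self.pending.setdefault(block_id, {})[cell] = new
        self.max_change[block_id] = max(
            self.max_change.get(block_id, 0.0), abs(new - old))

    def flush(self):
        for block_id in list(self.pending):
            skip = self.skipped.get(block_id, 0)
            if self.max_change[block_id] >= THRESHOLD or skip >= MAX_SKIP:
                self.send(block_id, self.pending.pop(block_id))
                self.max_change.pop(block_id)
                self.skipped[block_id] = 0
            else:
                self.skipped[block_id] = skip + 1

sent = []
s = BlockSender(lambda b, u: sent.append((b, u)))
s.update(0, (3, 7), old=1.00, new=1.05)    # big change  -> sent
s.update(1, (4, 4), old=2.00, new=2.0001)  # tiny change -> held back
s.flush()
print([b for b, _ in sent])  # [0]
```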

    11. Use Adaptive Grid • Reduce computational and storage costs by using adaptive grid. • Generate adaptive grid using random sampling.

    12. Trajectory-Based Dynamic Programming

    13. Full Trajectories Help Reduce the Resolution Needed [figure: SIDP vs. trajectory-based]

    14. Reducing the Volume Explored

    15. An Adaptive Grid Approach

    16. Global Planning: Propagate Value Function Across Trajectories in Adaptive Grid

    17. Growing the Explored Region: Adaptive Grids

    18. Bidirectional Search

    19. Bidirectional Search Closeup

    20. Spine Representation

    21. Growing the Explored Region: Spine Representation

    22. Comparison

    23. One Link Swing Up Needed Only 63 Points

    24. Trajectories For Each Point

    25. Random Sampling of States
      • Initialize with a point at the goal, with local models based on LQR.
      • Choose a random new state x.
      • Use the nearest stored point's local model of the value function to predict the value of the new point (V_P).
      • Optimize a trajectory from x to the goal. At each step use the nearest stored point's local model of the policy to create an action. Use DDP to refine this trajectory. V_T is the cost of the trajectory starting from x.
      • Store a point at the start of the trajectory if |V_T − V_P| > λ (surprise), V_T < V_limit, and V_P < V_limit; otherwise discard it.
      • Interleave re-optimization of all stored points. Only update if V_new < V (V is an upper bound on the value).
      • Gradually increase V_limit.
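
The accept/reject loop above can be sketched on a toy problem where the true value function is known (V*(x) = x² for a 1-D state). Here `optimize_trajectory` just returns the true cost plus noise, and `predict_value` uses only the nearest stored value; in the real algorithm these are DDP and an LQR-based local model. All names and constants are illustrative assumptions:

```python
import random

random.seed(0)
LAMBDA = 0.05     # surprise threshold
V_LIMIT = 4.0     # current value cutoff (grown gradually in practice)

points = [(0.0, 0.0)]  # stored (state, value) pairs; start at the goal

def predict_value(x):
    """Nearest stored point's prediction (stand-in for its local model)."""
    xs, vs = min(points, key=lambda p: abs(p[0] - x))
    return vs

def optimize_trajectory(x):
    """Stand-in for DDP: true cost-to-go plus a little optimization noise."""
    return x * x + random.uniform(-0.01, 0.01)

for _ in range(200):
    x = random.uniform(-2.0, 2.0)
    v_p = predict_value(x)
    v_t = optimize_trajectory(x)
    if abs(v_t - v_p) > LAMBDA and v_t < V_LIMIT and v_p < V_LIMIT:
        points.append((x, v_t))   # surprise: the model was wrong here, keep x

print(len(points))  # points accumulate only where predictions were poor
```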

    26. Two Link Pendulum • Criterion:

    27. [Plots: ankle angle, hip angle, ankle torque, hip torque]

    28. Four Links

    29. Four Links: 8 dimensional system

    30. Convergence?
      • Because we create trajectories to the goal, each value function estimate at a point is an upper bound for the value at that point.
      • Eventually all value function entries will be consistent with their nearest neighbor's local model, and no new points can be added.
      • We are using more aggressive acceptance tests for new points: V_B < λV_P (λ < 1) and V_P < V_limit, vs. |V_B − V_P| < ε and V_B < V_limit.
      • It is not clear whether needed new points can be blocked.

    31. Use Local Models • Try to achieve a sparse representation using local models.

    32. Linear Quadratic Regulators

    33. Learning From Observation

    34. Regulator tasks
      • Examples: balance a pole, move at a constant velocity.
      • A reasonable starting point is a Linear Quadratic Regulator (LQR controller).
      • We might have nonlinear dynamics x_{k+1} = f(x_k, u_k), but since we stay near x_d we can locally linearize: x_{k+1} = Ax_k + Bu_k.
      • We might have a complex scoring function c(x,u), but we can locally approximate it with a quadratic model: c ≈ xᵀQx + uᵀRu.
      • dlqr() in MATLAB

    35. Linearization Example
      • I θ̈ = −mgl sin(θ) − μθ̇ + τ
      • Linearize: sin(θ) ≈ θ
      • Discretize time (step T) and vectorize:
        [θ; θ̇]_{k+1} = [1, T; −mglT/I, 1 − μT/I] [θ; θ̇]_k + [0; T/I] τ_k
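
A sketch of building the discretized linear model above; the parameter values are arbitrary placeholders:

```python
import numpy as np

I, m, g, l, mu, T = 0.5, 1.0, 9.81, 0.5, 0.1, 0.01  # placeholder parameters

# [theta; theta_dot]_{k+1} = A [theta; theta_dot]_k + B tau_k
A = np.array([[1.0,              T],
              [-m * g * l * T / I, 1.0 - mu * T / I]])
B = np.array([[0.0],
              [T / I]])

def step(state, tau):
    """One step of the linearized pendulum: x_{k+1} = A x_k + B u_k."""
    return A @ state + B * tau

x = np.array([[0.1], [0.0]])  # small angle, at rest, no torque applied
print(step(x, 0.0).ravel())   # the restoring torque makes theta_dot go negative
```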

    36. LQR Derivation
      • Assume V() is quadratic: V_{k+1}(x) = xᵀ V_{xx,k+1} x
      • C(x,u) = xᵀQx + uᵀRu + (Ax + Bu)ᵀ V_{xx,k+1} (Ax + Bu)
      • Want ∂C/∂u = 0
      • Bᵀ V_{xx,k+1} A x = −(Bᵀ V_{xx,k+1} B + R) u
      • u = Kx (linear controller)
      • K = −(Bᵀ V_{xx,k+1} B + R)⁻¹ Bᵀ V_{xx,k+1} A
      • V_{xx,k} = Aᵀ V_{xx,k+1} A + Q + Aᵀ V_{xx,k+1} B K
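
The backward recursion above can be iterated to a fixed point to get the infinite-horizon gain (what MATLAB's dlqr would return). The double-integrator dynamics here are an illustrative choice, not from the slides:

```python
import numpy as np

T = 0.1
A = np.array([[1.0, T], [0.0, 1.0]])   # discrete double integrator
B = np.array([[0.0], [T]])
Q = np.eye(2)
R = np.array([[0.1]])

Vxx = Q.copy()
for _ in range(500):
    # K = -(B' V B + R)^{-1} B' V A
    K = -np.linalg.solve(B.T @ Vxx @ B + R, B.T @ Vxx @ A)
    # V_k = A' V A + Q + A' V B K   (the recursion from the derivation)
    Vxx = A.T @ Vxx @ A + Q + A.T @ Vxx @ B @ K

# The closed loop A + BK should be stable: eigenvalues inside the unit circle.
print(np.abs(np.linalg.eigvals(A + B @ K)))
```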

    37. Trajectory Optimization (closed loop) • Differential Dynamic Programming (local approach to DP).

    38. Learning Trajectories

    39. Q function
      • x: state, u: control or action
      • Dynamics: x_{k+1} = f(x_k, u_k)
      • Cost function: L(x,u)
      • Value function: V(x) = Σ L(x,u)
      • Q function: Q(x,u) = L(x,u) + V(f(x,u))
      • Bellman's equation: V(x) = min_u Q(x,u)
      • Policy/control law: u(x) = argmin_u Q(x,u)
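
The definitions above fit in a few lines of value iteration on a toy deterministic chain (all details of the toy problem are illustrative): states 0..4, goal at 0, actions move left/stay/right, cost 1 per step except staying at the goal:

```python
N = 5
ACTIONS = (-1, 0, +1)

def f(x, u):                      # dynamics: move and clamp to the state range
    return min(max(x + u, 0), N - 1)

def L(x, u):                      # one-step cost; free only when parked at the goal
    return 0.0 if x == 0 and u == 0 else 1.0

V = [0.0] * N
for _ in range(50):               # Bellman backups: V(x) = min_u Q(x,u)
    V = [min(L(x, u) + V[f(x, u)] for u in ACTIONS) for x in range(N)]

# Policy: u(x) = argmin_u Q(x,u)
policy = [min(ACTIONS, key=lambda u: L(x, u) + V[f(x, u)]) for x in range(N)]
print(V)       # [0.0, 1.0, 2.0, 3.0, 4.0]
print(policy)  # [0, -1, -1, -1, -1]: move left everywhere except at the goal
```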

    40. Local Models About

    41. Propagating Local Models Along a Trajectory: Differential Dynamic Programming (Gradient Version)
      • V_{x,k−1} = Q_x = L_x + V_x f_x
      • Δu ∝ −Q_u, where Q_u = L_u + V_x f_u (step downhill on Q)