
Optimization via (too much?) Randomization


Presentation Transcript


  1. Optimization via (too much?) Randomization Why parallelizing like crazy and being lazy can be good • Peter Richtarik

  2. Optimization as Mountain Climbing

  3. = Extreme* Mountain Climbing Optimization with Big Data * in a billion dimensional space on a foggy day

  4. Big Data BIG Volume BIG Velocity BIG Variety • digital images & videos • transaction records • government records • health records • defence • internet activity (social media, wikipedia, ...) • scientific measurements (physics, climate models, ...)

  5. God’s Algorithm = Teleportation

  6. If You Are Not a God... (figure: successive iterates x0, x1, x2, x3)

  7. Randomized Parallel Coordinate Descent (figure: a path from the "start" point toward the "holy grail" optimum; in practice we settle for a nearby point)

  8. Applications • Arup (Truss Topology Design) • Western General Hospital (Creutzfeldt-Jakob Disease) • Ministry of Defence dstl lab (Algorithms for Data Simplicity) • Royal Observatory (Optimal Planet Growth)

  9. Optimization as Lock Breaking

  10. A Lock with 4 Dials Setup: a function F(x) = F(x1, x2, x3, x4) representing the "quality" of a combination x = (x1, x2, x3, x4); the combination maximizing F opens the lock. Optimization Problem: find the combination maximizing F
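The four-dial setup can be made concrete with a small sketch: a hypothetical quality function F over dial combinations, maximized by brute force. The function, its peak, and the dial range 0–9 are all illustrative assumptions, not from the talk.

```python
import itertools

# Hypothetical quality function: largest at the "opening" combination (2, 7, 1, 9).
def F(x1, x2, x3, x4):
    return -((x1 - 2)**2 + (x2 - 7)**2 + (x3 - 1)**2 + (x4 - 9)**2)

# Brute force over all 10^4 combinations of dial values 0..9.
best = max(itertools.product(range(10), repeat=4), key=lambda x: F(*x))
print(best)  # the combination maximizing F opens the lock
```

Brute force is only feasible for a handful of dials; the rest of the talk is about what to do when the number of dials is a billion.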

  11. Optimization Algorithm

  12. A System of a Billion Locks with Shared Dials 1) Nodes in the graph correspond to dials: x1, x2, x3, x4, ..., xn 2) Nodes also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge. # dials = n = # locks

  13. How do we Measure the Quality of a Combination? • Each lock j has its own quality function Fj depending on the dials it owns • However, lock j does NOT open when Fj is maximized • The system of locks opens when F = F1 + F2 + ... + Fn is maximized, where F : Rn → R
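This partially separable structure, where F is a sum of per-lock terms and each term depends only on the dials that lock owns, can be sketched as follows. The graph, the targets, and the quadratic per-lock terms are illustrative assumptions.

```python
# Hypothetical system of n = 3 locks/dials on a triangle graph.
owns = {0: [0, 1], 1: [1, 2], 2: [2, 0]}          # lock j -> dials it owns
targets = {0: (1.0, 2.0), 1: (2.0, 3.0), 2: (3.0, 1.0)}

def F_j(j, x):
    # Quality of lock j: depends only on the dials lock j owns.
    return -sum((x[d] - t)**2 for d, t in zip(owns[j], targets[j]))

def F(x):
    # The whole system opens when the sum F = F1 + ... + Fn is maximized.
    return sum(F_j(j, x) for j in owns)

print(F([1.0, 2.0, 3.0]))  # 0.0: every per-lock term is at its peak
```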

  14. An Algorithm with (too much?) Randomization 1) Randomly select a lock 2) Randomly select a dial belonging to the lock 3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
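One step of the algorithm above might be sketched like this, reusing the toy lock system. The finite-difference gradient estimate, step size, and objective are illustrative assumptions, not the talk's actual update rule.

```python
import random

# Hypothetical lock system: lock j owns the listed dials.
owns = {0: [0, 1], 1: [1, 2], 2: [2, 0]}
targets = {0: (1.0, 2.0), 1: (2.0, 3.0), 2: (3.0, 1.0)}

def F_j(j, x):
    return -sum((x[d] - t)**2 for d, t in zip(owns[j], targets[j]))

def step(x, stepsize=0.1, eps=1e-6):
    j = random.randrange(len(owns))        # 1) randomly select a lock
    d = random.choice(owns[j])             # 2) randomly select one of its dials
    # 3) adjust that dial using only lock j's information:
    #    a finite-difference estimate of dF_j/dx_d, then an ascent step.
    x_plus = list(x)
    x_plus[d] += eps
    grad = (F_j(j, x_plus) - F_j(j, x)) / eps
    x[d] += stepsize * grad

random.seed(0)
x = [0.0, 0.0, 0.0]
F = lambda x: sum(F_j(j, x) for j in owns)
before = F(x)
for _ in range(500):
    step(x)
print(F(x) > before)  # the objective improves over the run
```

Each step touches a single dial and a single lock, which is exactly what makes the method cheap per iteration and easy to run in parallel.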

  15. Synchronous Parallelization (figure: timeline of jobs J1 to J9 on Processors 1, 2, and 3, with IDLE gaps between jobs while processors wait for each other; WASTEFUL)

  16. Crazy (Lock-Free) Parallelization (figure: the same jobs J1 to J9 run back to back with no idle time; NO WASTE)
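The lock-free scheme can be sketched with unsynchronized Python threads, in the spirit of asynchronous "Hogwild-style" updates on shared state. The toy objective, thread count, and step size are illustrative assumptions, not the talk's implementation.

```python
import random
import threading

# Same hypothetical lock system as in the single-threaded sketch.
owns = {0: [0, 1], 1: [1, 2], 2: [2, 0]}
targets = {0: (1.0, 2.0), 1: (2.0, 3.0), 2: (3.0, 1.0)}

def F_j(j, x):
    return -sum((x[d] - t)**2 for d, t in zip(owns[j], targets[j]))

def F(x):
    return sum(F_j(j, x) for j in owns)

x = [0.0, 0.0, 0.0]  # shared state, updated by all threads without any locking

def worker(iters=300, stepsize=0.05, eps=1e-6):
    for _ in range(iters):
        j = random.randrange(len(owns))
        d = random.choice(owns[j])
        x_plus = list(x)
        x_plus[d] += eps
        grad = (F_j(j, x_plus) - F_j(j, x)) / eps
        x[d] += stepsize * grad  # unsynchronized write: lost updates are tolerated

before = F(x)
threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(F(x) > before)  # progress despite no synchronization
```

Occasionally one thread overwrites another's update, but because each update touches a single dial, the method still makes progress; tolerating this noise is the point of running lock-free.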

  17.–20. Crazy Parallelization (repeated across four animation slides)

  21. Theoretical Result (formula shown in terms of: # processors, # locks, average # dials in a lock, average # dials common between 2 locks)

  22. Computational Insights

  23. Theory vs Reality

  24. Why parallelizing like crazy and being lazy can be good? Randomization: • Effectivity • Tractability • Efficiency • Scalability (big data) Parallelization: • Parallelism • Distribution • Asynchronicity

  25. Optimization Methods for Big Data • Randomized Coordinate Descent: P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, arXiv:1212.0873 [can solve a problem with 1 billion variables in 2 hours using 24 processors] • Stochastic (Sub)Gradient Descent: P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions [can be applied to optimize an unknown function] • Both of the above: M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, arXiv:1303.xxxx

  26. Final 2 Slides

  27. Tools Probability HPC Matrix Theory Machine Learning
