Optimization via (too much?) Randomization

Why parallelizing like crazy and being lazy can be good

  • Peter Richtarik


Optimization as Mountain Climbing


Optimization with Big Data = Extreme* Mountain Climbing

* in a billion-dimensional space on a foggy day


Big Data

BIG Volume BIG Velocity BIG Variety

  • digital images & videos

  • transaction records

  • government records

  • health records

  • defence

  • internet activity (social media, wikipedia, ...)

  • scientific measurements (physics, climate models, ...)


God’s Algorithm = Teleportation


If You Are Not a God...

[Figure: a sequence of iterates x0, x1, x2, x3]


Randomized Parallel Coordinate Descent

[Figure: a path from "start"; the "holy grail" versus the point we "settle for"]
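The slide conveys coordinate descent only as a picture. As a minimal serial sketch (not the parallel method from the papers listed at the end), here is randomized coordinate descent on a small, hypothetical quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$: each step picks one coordinate uniformly at random and minimizes f exactly along it, leaving the others fixed. The function name, the matrix A and the vector b are illustrative assumptions.

```python
import random

def randomized_coordinate_descent(A, b, x0, iters=2000, seed=0):
    """Minimize f(x) = 0.5 * x^T A x - b^T x for a symmetric positive definite A.

    Each step picks one coordinate i uniformly at random and minimizes f
    exactly along that coordinate, keeping all other coordinates fixed.
    """
    rng = random.Random(seed)
    n = len(b)
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(n)
        # Partial derivative of f with respect to x_i: (A x)_i - b_i.
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        # Exact minimization along coordinate i (A[i][i] > 0 for SPD A).
        x[i] -= grad_i / A[i][i]
    return x

# Hypothetical 3-variable test problem.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = randomized_coordinate_descent(A, b, [0.0, 0.0, 0.0])
print([round(v, 4) for v in x])
```

Because A is symmetric positive definite, no exact coordinate step can increase f, and the iterates approach the solution of Ax = b, here (2/9, 1/9, 13/9).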


  • Arup (Truss Topology Design)

  • Western General Hospital (Creutzfeldt-Jakob Disease)

  • Ministry of Defence, dstl lab (Algorithms for Data Simplicity)

  • Royal Observatory (Optimal Planet Growth)


Optimization as Lock Breaking


A Lock with 4 Dials

A function F representing the “quality” of a combination $x = (x_1, x_2, x_3, x_4)$:

$F(x) = F(x_1, x_2, x_3, x_4)$

Setup: the combination maximizing F opens the lock

Optimization problem: find the combination maximizing F
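To make the lock picture concrete, here is a toy sketch with a hypothetical quality function F whose maximizer is a hypothetical secret combination. It compares trying all 10^4 combinations with turning one dial at a time to its best value while the others stay fixed. The names `SECRET`, `quality`, `brute_force` and `dial_by_dial` are illustrative only, and this F is deliberately easy (separable), so dial-by-dial search succeeds here; a real lock need not be so kind.

```python
from itertools import product

SECRET = (3, 1, 4, 1)  # hypothetical combination that opens the lock

def quality(x):
    """Hypothetical quality function F(x1, x2, x3, x4): larger is better,
    maximized exactly at SECRET."""
    return -sum((xi - si) ** 2 for xi, si in zip(x, SECRET))

def brute_force(values=range(10)):
    # Try all 10**4 combinations and keep the best one.
    return max(product(values, repeat=4), key=quality)

def dial_by_dial(x0=(0, 0, 0, 0), values=range(10), sweeps=2):
    # Turn one dial at a time to the value maximizing F, with the others fixed.
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = max(values, key=lambda v: quality(x[:i] + [v] + x[i + 1:]))
    return tuple(x)

print(brute_force(), dial_by_dial())
```

Brute force evaluates F 10,000 times; each dial-by-dial sweep evaluates it only 4 × 10 = 40 times.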


Optimization Algorithm


A System of a Billion Locks with Shared Dials

[Figure: a graph whose nodes are labeled x1, x2, x3, x4, ..., xn; one node is marked "Lock"]

1) Nodes in the graph correspond to dials

2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge

# dials = n = # locks
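A minimal sketch of the data structure the slide describes, assuming an undirected graph on nodes 0..n-1 given as an edge list: every node is both a dial and a lock, and lock j owns the dials joined to node j by an edge. The function name `lock_ownership` and the 5-node cycle are hypothetical examples.

```python
from collections import defaultdict

def lock_ownership(n, edges):
    """Return owned, where owned[j] is the sorted list of dials owned by lock j.

    Nodes 0..n-1 are both dials and locks; lock j owns every dial joined
    to node j by an edge of the (undirected) graph.
    """
    owned = defaultdict(set)
    for u, v in edges:
        owned[u].add(v)
        owned[v].add(u)
    return [sorted(owned[j]) for j in range(n)]

# Hypothetical 5-node example: the cycle 0-1-2-3-4-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
owned = lock_ownership(5, edges)
print(owned)
```

Neighbouring locks share dials (locks 0 and 2 both own dial 1), which is exactly what makes the locks interact. If a lock should also own its own dial, a self-loop edge (j, j) adds it.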


How Do We Measure the Quality of a Combination?

  • Each lock j has its own quality function $F_j$, depending on the dials it owns

  • However, it does NOT open when $F_j$ is maximized

  • The system of locks opens when $F = F_1 + F_2 + \dots + F_n$ is maximized

$F : \mathbb{R}^n \to \mathbb{R}$
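A small sketch of evaluating $F = F_1 + \dots + F_n$, where each $F_j$ reads only the dials lock j owns. The local function `sum_to_one` (lock j is happiest when its dials sum to 1) and the 3-lock system are hypothetical choices made only for illustration.

```python
def total_quality(x, owned, local_quality):
    """F(x) = F_1(x) + ... + F_n(x), where F_j only reads the dials in owned[j]."""
    return sum(local_quality(j, [x[i] for i in owned[j]])
               for j in range(len(owned)))

def sum_to_one(j, dials):
    # Hypothetical local quality F_j: best (= 0) when lock j's dials sum to 1.
    return -(sum(dials) - 1.0) ** 2

owned = [[0, 1], [1, 2], [2, 0]]   # hypothetical 3-lock system with shared dials
print(total_quality([0.5, 0.5, 0.5], owned, sum_to_one))
print(total_quality([1.0, 0.0, 0.0], owned, sum_to_one))
```

The second combination maximizes $F_1$ and $F_3$ individually (their dials sum to 1), yet F is smaller than at the first combination, which is why maximizing a single lock's function does not open the system.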


An Algorithm with (too much?) Randomization

1) Randomly select a lock

2) Randomly select a dial belonging to the lock

3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
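The three steps can be sketched as follows, under assumptions not stated on the slide: each hypothetical $F_j(x) = -(\text{sum of the dials lock } j \text{ owns} - t_j)^2$, and step 3 is a gradient-ascent step on that single dial using only $\partial F_j / \partial x_i$. The function name, step size and targets $t_j$ are illustrative.

```python
import random

def lazy_random_ascent(owned, targets, x0, step=0.1, iters=5000, seed=0):
    """Maximize F(x) = sum_j F_j(x) with hypothetical
    F_j(x) = -(sum of the dials lock j owns - targets[j])**2.

    Each iteration: (1) pick a lock j at random, (2) pick one of its dials i
    at random, (3) move x_i along dF_j/dx_i only, ignoring all other locks.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        j = rng.randrange(len(owned))       # 1) random lock
        i = rng.choice(owned[j])            # 2) random dial of that lock
        residual = sum(x[k] for k in owned[j]) - targets[j]
        x[i] += step * (-2.0 * residual)    # 3) uses dF_j/dx_i only
    return x

owned = [[0, 1], [1, 2], [2, 0]]
targets = [3.0, 5.0, 4.0]   # chosen so that x = (1, 2, 3) maximizes every F_j
x = lazy_random_ascent(owned, targets, [0.0, 0.0, 0.0])
print([round(v, 4) for v in x])
```

The targets were picked so that a single x maximizes every $F_j$ at once. Without that property, a constant step size typically leaves these lazy iterates hovering near, rather than exactly at, the maximizer of F.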


Synchronous Parallelization

[Figure: timeline of jobs J1 to J9 on Processors 1, 2 and 3 (J1 to J3 on Processor 1, J4 to J6 on Processor 2, J7 to J9 on Processor 3), with IDLE gaps between jobs; label: WASTEFUL]


Crazy (Lock-Free) Parallelization

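[Figure: timeline of jobs J1 to J9 on Processors 1, 2 and 3 (J1 to J3 on Processor 1, J4 to J6 on Processor 2, J7 to J9 on Processor 3), run back to back with no idle gaps; label: NO WASTE]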
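The two timelines can be compared with a toy timing model, which is an assumption and not a measurement: job k on each processor belongs to synchronous round k, every round ends with a barrier, and the lock-free schedule has no barriers at all. The job durations below are hypothetical.

```python
def synchronous_makespan(durations):
    """durations[p][k] = running time of the k-th job on processor p.
    With a barrier after every round, round k lasts as long as its slowest job."""
    return sum(max(round_jobs) for round_jobs in zip(*durations))

def barrier_idle_time(durations):
    """Total processor time spent waiting at barriers in the synchronous schedule."""
    return sum(max(r) - d for r in zip(*durations) for d in r)

def lock_free_makespan(durations):
    """No barriers: each processor runs its own jobs back to back."""
    return max(sum(jobs) for jobs in durations)

# Hypothetical durations: Processor 1 runs J1..J3, Processor 2 J4..J6, Processor 3 J7..J9.
durations = [[2, 1, 3],
             [1, 3, 1],
             [3, 2, 2]]

sync = synchronous_makespan(durations)
free = lock_free_makespan(durations)
print("synchronous:", sync, "idle at barriers:", barrier_idle_time(durations))
print("lock-free:  ", free)
```

In this toy model the synchronous schedule takes 9 time units, 9 of which (summed over processors) are spent idle at barriers, while the barrier-free schedule finishes in 7.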


Crazy Parallelization
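Python threads would show little real parallelism, so this sketch only simulates the lock-free pattern under a simple model assumed here: in each round several hypothetical workers read the same copy of the dials, each computes a lazy single-dial update as in the randomized algorithm above, and all updates are added to the shared dials with no locking, so some are based on stale values. The quality functions and targets are the same hypothetical choices as before.

```python
import random

def simulated_lock_free(owned, targets, x0, workers=3, step=0.1, rounds=3000, seed=0):
    """Toy simulation of lock-free parallel updates (no real threads).

    Hypothetical F_j(x) = -(sum of the dials lock j owns - targets[j])**2.
    In every round each worker reads the shared x, picks a random lock and a
    random dial of that lock, and computes its update from what it read.
    All updates are then added to x without locking, so a worker may act on
    values that another worker has just changed.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(rounds):
        snapshot = list(x)                  # all workers read the same, soon stale, x
        updates = []
        for _ in range(workers):
            j = rng.randrange(len(owned))
            i = rng.choice(owned[j])
            residual = sum(snapshot[k] for k in owned[j]) - targets[j]
            updates.append((i, step * (-2.0 * residual)))
        for i, delta in updates:            # unsynchronized writes: plain additions
            x[i] += delta
    return x

owned = [[0, 1], [1, 2], [2, 0]]
targets = [3.0, 5.0, 4.0]   # x = (1, 2, 3) maximizes every F_j
x = simulated_lock_free(owned, targets, [0.0, 0.0, 0.0])
print([round(v, 4) for v in x])
```

Even though workers sometimes overwrite each other's progress, the stale updates still drive this small system to (1, 2, 3).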




Theoretical Result

[Formula image; terms labeled: # processors, average # of dials common between 2 locks, # locks, average # of dials in a lock]
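The formula itself cannot be recovered from the transcript, so it is not reproduced here. As a sketch, the structural quantities it is labeled with can be computed from an ownership list `owned[j]` (dials owned by lock j). Averaging "dials common between 2 locks" over all unordered pairs of distinct locks is an assumption, as is the helper name `structure_stats`.

```python
from itertools import combinations

def structure_stats(owned):
    """Return (# locks, average # dials in a lock,
    average # dials common between 2 locks), where the last average
    is taken over all unordered pairs of distinct locks (an assumption)."""
    n_locks = len(owned)
    avg_dials = sum(len(d) for d in owned) / n_locks
    pairs = list(combinations(range(n_locks), 2))
    avg_common = sum(len(set(owned[a]) & set(owned[b])) for a, b in pairs) / len(pairs)
    return n_locks, avg_dials, avg_common

owned = [[1, 4], [0, 2], [1, 3], [2, 4], [0, 3]]  # hypothetical 5-cycle: lock j owns its neighbours
print(structure_stats(owned))
```

The pairwise loop is quadratic in the number of locks, so it only suits small examples; for a billion locks these averages would have to be computed from per-dial counts instead.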


Computational Insights


Theory vs Reality


Why can parallelizing like crazy and being lazy be good?

Randomization

  • Effectivity

  • Tractability

  • Efficiency

  • Scalability (big data)

  • Parallelism

  • Distribution

  • Asynchronicity

Parallelization


Optimization methods for big data

Optimization Methods for Big Data

  • Randomized Coordinate Descent

    • P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, arXiv:1212.0873

      [can solve a problem with 1 billion variables in 2 hours using 24 processors]

  • Stochastic (Sub)Gradient Descent

    • P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions

      [can be applied to optimize an unknown function]

  • Both of the above

    • M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, arXiv:1303.xxxx


Final 2 Slides


Tools

  • Probability

  • HPC

  • Matrix Theory

  • Machine Learning

