Optimization via (too much?) Randomization

Why parallelizing like crazy and being lazy can be good

  • Peter Richtarik


Optimization as Mountain Climbing


Optimization with Big Data

Optimization with Big Data = Extreme* Mountain Climbing

* in a billion-dimensional space on a foggy day


Big Data

BIG Volume BIG Velocity BIG Variety

  • digital images & videos

  • transaction records

  • government records

  • health records

  • defence

  • internet activity (social media, wikipedia, ...)

  • scientific measurements (physics, climate models, ...)




Randomized Parallel Coordinate Descent

[Figure: iterates of the method on a contour plot, moving from the "start" point toward the "holy grail" optimum; in practice we "settle for this", a good enough approximate solution.]


  • Arup (Truss Topology Design)

  • Western General Hospital (Creutzfeldt-Jakob Disease)

  • Ministry of Defence, dstl lab (Algorithms for Data Simplicity)

  • Royal Observatory (Optimal Planet Growth)


Optimization as Lock Breaking


A Lock with 4 Dials

A function representing the “quality” of a combination

x = (x1, x2, x3, x4)

F(x) = F(x1, x2, x3, x4)

Setup: Combination maximizing F opens the lock

Optimization Problem: Find combination maximizing F
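To make the toy problem concrete, here is a minimal Python sketch (not from the slides) that opens the 4-dial lock by brute force, trying all 10^4 combinations. The dial range 0-9 and the example quality function F are assumptions for illustration; the point of the talk is precisely that this exhaustive approach does not scale.

```python
import itertools

def open_lock(F, dial_values=range(10)):
    """Brute-force search: try every combination of the four dials
    and return the one maximizing the quality function F."""
    best_x, best_val = None, float("-inf")
    for x in itertools.product(dial_values, repeat=4):
        val = F(*x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Hypothetical quality function: highest at the combination (3, 1, 4, 1)
F = lambda x1, x2, x3, x4: -((x1 - 3)**2 + (x2 - 1)**2 + (x3 - 4)**2 + (x4 - 1)**2)
print(open_lock(F))   # -> ((3, 1, 4, 1), 0)
```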



A System of Billion Locks with Shared Dials

[Figure: a graph whose nodes are labelled x1, x2, x3, x4, ..., xn; each node is both a dial and a lock.]

1) Nodes in the graph correspond to dials

2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge

# dials = n = # locks

How do we Measure the Quality of a Combination?

  • Each lock j has its own quality function Fj, depending only on the dials it owns

  • However, lock j does NOT open when Fj is maximized

  • The system of locks opens when the total quality

    F = F1 + F2 + ... + Fn,   F : R^n → R

    is maximized
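To illustrate this partially separable structure, here is a small Python sketch under assumed toy choices (a ring graph, quadratic per-lock qualities, and a hypothetical "winning" combination); none of this comes from the slides themselves.

```python
import random

n = 6                                                      # number of dials = number of locks
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # assumed ring graph
owned = {j: [] for j in range(n)}                          # dials owned by lock j
for a, b in edges:
    owned[a].append(b)
    owned[b].append(a)

target = [random.randrange(10) for _ in range(n)]          # hypothetical combination that opens the locks

def F_j(j, x):
    """Quality of lock j: depends only on the dials it owns."""
    return -sum((x[i] - target[i]) ** 2 for i in owned[j])

def F(x):
    """Global quality of a combination: the sum over all locks."""
    return sum(F_j(j, x) for j in range(n))
```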


An Algorithm with (too much?) Randomization

1) Randomly select a lock

2) Randomly select a dial belonging to the lock

3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
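Below is a minimal, self-contained sketch of these three steps on the same assumed ring-graph toy problem. It illustrates the idea only; the step rule (re-optimizing one dial exactly against one lock's quality) is a simplification, not the actual method analyzed in the cited papers.

```python
import random

n = 6
owned = {j: [(j - 1) % n, (j + 1) % n] for j in range(n)}  # ring graph: lock j owns its two neighbouring dials
target = [random.randrange(10) for _ in range(n)]          # hypothetical combination that opens the locks

def F_j(j, x):
    """Quality of lock j, depending only on the dials it owns."""
    return -sum((x[i] - target[i]) ** 2 for i in owned[j])

x = [random.randrange(10) for _ in range(n)]               # random starting combination
for _ in range(200):
    j = random.randrange(n)                                # 1) randomly select a lock
    i = random.choice(owned[j])                            # 2) randomly select a dial belonging to the lock
    # 3) adjust the selected dial using only the selected lock's information
    x[i] = max(range(10), key=lambda v: F_j(j, x[:i] + [v] + x[i + 1:]))

print(x == target)   # True with high probability once every dial has been visited
```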


Synchronous Parallelization

[Figure: Gantt chart of synchronous parallelization. Processors 1-3 run jobs J1-J9 in synchronized rounds; after finishing a job, each processor sits IDLE waiting for the slowest one, which is WASTEFUL.]


Crazy (Lock-Free) Parallelization

[Figure: Gantt chart of crazy (lock-free) parallelization. Processors 1-3 run jobs J1-J9 back to back with no synchronization barriers, so there is NO WASTE from idle time.]
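A rough Python sketch of the lock-free pattern, in the Hogwild style: several workers update the shared combination concurrently, with no barriers and hence no idle time. Everything here is an assumed illustration, not the implementation from the cited work, and Python threads do not give true CPU parallelism because of the GIL; the snippet only demonstrates the unsynchronized update pattern.

```python
import random
import threading

n = 1000
owned = {j: [(j - 1) % n, (j + 1) % n] for j in range(n)}  # assumed ring graph
target = [random.randrange(10) for _ in range(n)]          # hypothetical winning combination

def F_j(j, x):
    return -sum((x[i] - target[i]) ** 2 for i in owned[j])

x = [random.randrange(10) for _ in range(n)]               # shared state, deliberately unprotected

def worker(num_updates):
    for _ in range(num_updates):
        j = random.randrange(n)                            # pick a lock
        i = random.choice(owned[j])                        # pick one of its dials
        # reads and writes may interleave with other workers: no locks, no waiting
        x[i] = max(range(10), key=lambda v: F_j(j, x[:i] + [v] + x[i + 1:]))

threads = [threading.Thread(target=worker, args=(5000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("dials set correctly:", sum(a == b for a, b in zip(x, target)), "/", n)
```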






Theoretical Result

The result is stated in terms of:

  • # processors

  • # locks (= # dials)

  • average # dials in a lock

  • average # of dials common between 2 locks
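The formula on the slide cannot be recovered from the transcript. For flavour, here is a loose restatement of the kind of result proved in the cited paper (P. R. and M. Takac, ArXiv:1212.0873); treat the exact form as an assumption rather than the slide's statement.

```latex
% tau   = number of processors
% n     = number of locks (= number of dials)
% omega = degree of coupling: roughly, how many dials a single lock owns / shares
\text{speedup} \;\approx\; \frac{\tau}{1 + \frac{(\omega - 1)(\tau - 1)}{n - 1}}
% When the locks share few dials (omega much smaller than n), the denominator
% is close to 1 and the speedup is nearly linear in the number of processors.
```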



Theory vs Reality


Why parallelizing like crazy and being lazy can be good

Randomization

  • Effectivity

  • Tractability

  • Efficiency

  • Scalability (big data)

Parallelization

  • Parallelism

  • Distribution

  • Asynchronicity


Optimization Methods for Big Data

  • Randomized Coordinate Descent

    • P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, ArXiv:1212.0873

      [can solve a problem with 1 billion variables in 2 hours using 24 processors]

  • Stochastic (Sub) Gradient Descent

    • P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions

      [can be applied to optimize an unknown function]

  • Both of the above

    • M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, ArXiv:1303.xxxx



Tools

Probability

HPC

Matrix Theory

Machine Learning

