Optimization via (too much?) Randomization

Why parallelizing like crazy and being lazy can be good

- Peter Richtarik

Optimization with Big Data = Extreme* Mountain Climbing

* in a billion-dimensional space on a foggy day

BIG Volume BIG Velocity BIG Variety

- digital images & videos
- transaction records
- government records
- health records
- defence
- internet activity (social media, wikipedia, ...)
- scientific measurements (physics, climate models, ...)

[Figure: search path through points x0, x1, x2, x3, with labels "start", "holy grail" and "settle for this"]

- Arup (Truss Topology Design)
- Western General Hospital (Creutzfeldt-Jakob Disease)
- Ministry of Defence, dstl lab (Algorithms for Data Simplicity)
- Royal Observatory (Optimal Planet Growth)

A function representing the “quality” of a combination

x = (x1, x2, x3, x4)

F(x) = F(x1, x2, x3, x4)

Setup: Combination maximizing F opens the lock

Optimization Problem: Find combination maximizing F
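
As a minimal sketch of the optimization problem above, assume 4 dials with 10 positions each and a made-up quality function (the slides specify neither). The names `SECRET`, `quality` and `best_combination` are illustrative only. Under those assumptions, exhaustive search looks like this:

```python
import itertools

# Hypothetical "ideal" combination, used only to define an example F.
SECRET = (3, 1, 4, 1)

def quality(x):
    # Hypothetical quality function F: largest (zero) when x equals SECRET.
    return -sum((xi - si) ** 2 for xi, si in zip(x, SECRET))

def best_combination(n_dials=4, n_positions=10):
    # Try every combination of dial positions and keep the one maximizing F.
    combos = itertools.product(range(n_positions), repeat=n_dials)
    return max(combos, key=quality)

best = best_combination()
print(best, quality(best))
```

Exhaustive search needs `n_positions ** n_dials` evaluations of F (10,000 here), so it stops being an option long before the number of dials reaches the billions.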

1) Nodes in the graph correspond to dials

2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge

# dials = n = # locks

[Figure: graph whose nodes are labelled x1, x2, x3, x4, ..., xn, with a "Lock" label]

- Each lock j has its own quality function Fj depending on the dials it owns
- However, it does NOT open when Fj is maximized
- The system of locks opens when F = F1 + F2 + ... + Fn is maximized

F : R^n → R

1) Randomly select a lock

2) Randomly select a dial belonging to the lock

3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
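
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the ring structure (lock j owns dials j and j+1), the quadratic Fj, the step size, and every function name are hypothetical choices made so that the example is small and runnable.

```python
import random

def make_locks(n):
    # Hypothetical structure: lock j owns dials j and (j + 1) % n.
    return [(j, (j + 1) % n) for j in range(n)]

def lock_quality(x, dials, target):
    # Hypothetical Fj: maximal (zero) when the owned dials sum to `target`.
    return -(sum(x[i] for i in dials) - target) ** 2

def total_quality(x, locks, targets):
    # F = F1 + F2 + ... + Fn
    return sum(lock_quality(x, d, t) for d, t in zip(locks, targets))

def randomized_update(x, locks, targets, step, rng):
    j = rng.randrange(len(locks))      # 1) randomly select a lock
    i = rng.choice(locks[j])           # 2) randomly select one of its dials
    residual = sum(x[k] for k in locks[j]) - targets[j]
    # 3) move dial i along the partial derivative of Fj only
    x[i] += step * (-2.0 * residual)

rng = random.Random(0)
n = 5
locks = make_locks(n)
solution = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up combination that opens the system
targets = [sum(solution[i] for i in d) for d in locks]

x = [0.0] * n
for _ in range(5000):
    randomized_update(x, locks, targets, step=0.25, rng=rng)

print(total_quality(x, locks, targets))
```

Each update reads and writes only the dials of one lock, which is what makes it plausible to run many such updates on different processors at once.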

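[Figure: two timelines of jobs J1-J9 on three processors (J1-J3 on Processor 1, J4-J6 on Processor 2, J7-J9 on Processor 3), horizontal axis: time.
First schedule, labelled WASTEFUL: jobs are separated by IDLE periods.
Second schedule, labelled NO WASTE: jobs run back to back with no idle periods.]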
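
The contrast between the two schedules can be illustrated with a small calculation. The job durations below are invented (the slides give none). The example assumes that idle time in the wasteful schedule comes from every processor waiting for the slowest job of a round before starting its next one.

```python
# Hypothetical durations of the three jobs assigned to each processor.
durations = {
    "Processor 1": [1, 2, 1],   # J1, J2, J3
    "Processor 2": [3, 1, 2],   # J4, J5, J6
    "Processor 3": [2, 3, 1],   # J7, J8, J9
}

def makespan_with_barriers(durations):
    # Every round lasts as long as its slowest job; the others sit idle.
    rounds = zip(*durations.values())
    return sum(max(r) for r in rounds)

def makespan_without_barriers(durations):
    # Each processor starts its next job as soon as the previous one ends.
    return max(sum(jobs) for jobs in durations.values())

def idle_time_with_barriers(durations):
    # Total processor time spent waiting inside the rounds.
    total_work = sum(sum(jobs) for jobs in durations.values())
    return len(durations) * makespan_with_barriers(durations) - total_work

print(makespan_with_barriers(durations))     # 8
print(makespan_without_barriers(durations))  # 6
print(idle_time_with_barriers(durations))    # 8
```

With these made-up numbers, a third of the available processor time in the barrier schedule is spent idle, and removing the barriers shortens the total run from 8 to 6 time units.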

[Figure labels: # processors; average # of dials common between 2 locks; # locks; average # dials in a lock]

Randomization

- Effectivity
- Tractability
- Efficiency
- Scalability (big data)
- Parallelism
- Distribution
- Asynchronicity

Parallelization

- Randomized Coordinate Descent
- P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, ArXiv:1212.0873
[can solve a problem with 1 billion variables in 2 hours using 24 processors]

- Stochastic (Sub) Gradient Descent
- P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions
[can be applied to optimize an unknown function]

- Both of the above
M. Takac, A.Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, ArXiv:1303.xxxx

Probability

HPC

Matrix Theory

Machine Learning