
# Optimization via (too much?) Randomization


### Optimization via (too much?) Randomization

Why parallelizing like crazy and being lazy can be good

• Peter Richtarik

### Optimization as Mountain Climbing

Optimization with Big Data = Extreme* Mountain Climbing

\* in a billion-dimensional space on a foggy day

### Big Data

BIG Volume BIG Velocity BIG Variety

• digital images & videos

• transaction records

• government records

• health records

• defence

• internet activity (social media, wikipedia, ...)

• scientific measurements (physics, climate models, ...)

[Figure: iterates x0, x1, x2, x3]

### Randomized Parallel Coordinate Descent

[Figure labels: "start", "settle for this", "holy grail"]

• Arup (Truss Topology Design)

• Western General Hospital (Creutzfeldt-Jakob Disease)

• Ministry of Defence, dstl lab (Algorithms for Data Simplicity)

• Royal Observatory (Optimal Planet Growth)

### A Lock with 4 Dials

A function representing the “quality” of a combination

$x = (x_1, x_2, x_3, x_4)$

$F(x) = F(x_1, x_2, x_3, x_4)$

Setup: Combination maximizing F opens the lock

Optimization Problem: Find combination maximizing F
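As a toy illustration of this optimization problem, the sketch below brute-forces a lock with 4 dials. The slides do not specify F, so the quality function `quality`, the hidden combination `SECRET`, and the choice of 10 positions per dial are all assumptions made for the example.

```python
from itertools import product

POSITIONS = range(10)      # assumed: each dial shows a digit 0-9
SECRET = (3, 1, 4, 1)      # hypothetical combination that opens the lock


def quality(x):
    """Hypothetical F: higher is better, and the maximum (0) is reached only at SECRET."""
    return -sum((xi - si) ** 2 for xi, si in zip(x, SECRET))


def best_combination():
    """Evaluate F on all 10**4 combinations and return the maximizer."""
    return max(product(POSITIONS, repeat=4), key=quality)


if __name__ == "__main__":
    print(best_combination())
```

Exhaustive search needs 10^4 evaluations here, and the count grows exponentially with the number of dials, so it is not an option once the number of dials becomes large.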

### A System of a Billion Locks with Shared Dials

1) Nodes in the graph correspond to dials

2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge

Number of dials = n = number of locks

[Figure: graph with nodes $x_1, x_2, x_3, x_4, \dots, x_n$; one node is labelled "Lock"]

### How do we Measure the Quality of a Combination?

• Each lock j has its own quality function $F_j$ depending on the dials it owns

• However, it does NOT open when $F_j$ is maximized

• The system of locks opens when $F = F_1 + F_2 + \dots + F_n$ is maximized, where $F : \mathbb{R}^n \to \mathbb{R}$
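The structure of F as a sum of per-lock functions can be sketched in a few lines. Everything concrete here is an assumption for illustration: the ring layout in `OWNED` (lock j owns dial j and its two neighbours) and the particular form of `lock_quality` are not taken from the slides.

```python
# Hypothetical system of n = 5 locks on a ring:
# lock j owns dial j and the two neighbouring dials.
N = 5
OWNED = {j: [(j - 1) % N, j, (j + 1) % N] for j in range(N)}


def lock_quality(j, x):
    """Assumed F_j: depends only on the dials owned by lock j."""
    s = sum(x[i] for i in OWNED[j])
    return -(s - 3.0) ** 2


def total_quality(x):
    """F = F_1 + F_2 + ... + F_n, the quantity the whole system maximizes."""
    return sum(lock_quality(j, x) for j in range(N))


if __name__ == "__main__":
    x = [1.0, 1.0, 5.0, 1.0, 1.0]
    # Lock 0 does not own dial 2, so its quality ignores the change there.
    print(lock_quality(0, x), total_quality(x))
```

Because each `lock_quality(j, x)` reads only a few entries of `x`, evaluating or updating one lock is cheap even when the number of dials is very large.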

### An Algorithm with (too much?) Randomization

1) Randomly select a lock

2) Randomly select a dial belonging to the lock

3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
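The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the talk: the ring layout `OWNED`, the per-lock targets `TARGET`, the quadratic form of `lock_quality`, and the step size 1/k (k = number of times the dial has been touched) are all hypothetical choices.

```python
import random

# Hypothetical ring of n locks: lock j owns dial j and its two neighbours.
N = 6
OWNED = {j: [(j - 1) % N, j, (j + 1) % N] for j in range(N)}
# Assumed data: lock j would like dial i to read TARGET[j][i].
TARGET = {j: {i: 10.0 + i + 0.1 * j for i in OWNED[j]} for j in range(N)}


def lock_quality(j, x):
    """Assumed F_j: a concave quadratic in the dials owned by lock j."""
    return -0.5 * sum((x[i] - TARGET[j][i]) ** 2 for i in OWNED[j])


def total_quality(x):
    """F = F_1 + ... + F_n."""
    return sum(lock_quality(j, x) for j in range(N))


def randomized_method(x, iterations, seed=0):
    """Repeat: pick a random lock, pick one of its dials, adjust that dial."""
    rng = random.Random(seed)
    x = list(x)
    touches = [0] * N
    for _ in range(iterations):
        j = rng.randrange(N)          # 1) randomly select a lock
        i = rng.choice(OWNED[j])      # 2) randomly select a dial of that lock
        touches[i] += 1
        # 3) adjust dial i using only lock j's information:
        #    the partial derivative of F_j with respect to x_i
        partial = TARGET[j][i] - x[i]
        x[i] += partial / touches[i]  # assumed step size 1/k
    return x


if __name__ == "__main__":
    start = [0.0] * N
    end = randomized_method(start, iterations=2000)
    print(total_quality(start), total_quality(end))
```

No step ever evaluates the full function F; each update reads one lock's data and writes one dial, which is what makes the scheme cheap per iteration.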

[Figure: jobs J1-J3 on Processor 1, J4-J6 on Processor 2 and J7-J9 on Processor 3, plotted against time, with IDLE gaps between jobs: WASTEFUL]

[Figure: the same jobs J1-J9 on Processors 1-3, plotted against time, with no idle gaps: NO WASTE]
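The contrast between the two schedules can be made concrete with a small calculation. The diagrams do not say why processors sit idle; the sketch assumes the idle time comes from every processor waiting for the slowest one before starting its next job. The job durations in `DURATIONS` are hypothetical.

```python
# Hypothetical job durations (time units) for the three processors.
DURATIONS = {
    "Processor 1": [3, 1, 2],   # J1, J2, J3
    "Processor 2": [1, 2, 2],   # J4, J5, J6
    "Processor 3": [2, 3, 1],   # J7, J8, J9
}


def total_work(durations):
    """Sum of all job durations across all processors."""
    return sum(sum(jobs) for jobs in durations.values())


def with_waiting(durations):
    """Each round lasts as long as its slowest job; the others sit idle."""
    rounds = zip(*durations.values())
    makespan = sum(max(r) for r in rounds)
    idle = makespan * len(durations) - total_work(durations)
    return makespan, idle


def without_waiting(durations):
    """Each processor starts its next job as soon as the previous one ends."""
    makespan = max(sum(jobs) for jobs in durations.values())
    idle = makespan * len(durations) - total_work(durations)
    return makespan, idle


if __name__ == "__main__":
    print(with_waiting(DURATIONS))     # (makespan, total idle time)
    print(without_waiting(DURATIONS))
```

With these made-up durations, waiting for the slowest processor costs 7 idle time units over a makespan of 8, while running the jobs back to back leaves 1 idle unit over a makespan of 6.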

### Theoretical Result

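[Formula not preserved; its annotations name the following quantities]

• Number of processors

• Average number of dials common between 2 locks

• Number of locks

• Average number of dials in a lock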

### Why parallelizing like crazy and being lazy can be good?

Randomization

• Effectivity

• Tractability

• Efficiency

• Scalability (big data)

• Parallelism

• Distribution

• Asynchronicity

Parallelization

### Optimization Methods for Big Data

• Randomized Coordinate Descent

• P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, arXiv:1212.0873

[can solve a problem with 1 billion variables in 2 hours using 24 processors]

• Stochastic (Sub) Gradient Descent

• P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions

[can be applied to optimize an unknown function]

• Both of the above

• M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, arXiv:1303.xxxx

[Figure labels: Probability, HPC, Matrix Theory, Machine Learning]