
# Optimization via (too much?) Randomization - PowerPoint PPT Presentation

Optimization via (too much?) Randomization: why parallelizing like crazy and being lazy can be good. A talk by Peter Richtarik. Optimization is mountain climbing; optimization with big data is extreme mountain climbing, in a billion-dimensional space on a foggy day.

Presentation Transcript
Optimization via (too much?) Randomization

Why parallelizing like crazy and being lazy can be good

• Peter Richtarik

Optimization as Mountain Climbing

### Optimization with Big Data = Extreme* Mountain Climbing

* in a billion-dimensional space on a foggy day

Big Data: BIG Volume, BIG Velocity, BIG Variety

• digital images & videos

• transaction records

• government records

• health records

• defence

• internet activity (social media, wikipedia, ...)

• scientific measurements (physics, climate models, ...)

[Figure: iterates x0, x1, x2, x3]

Randomized Parallel Coordinate Descent

[Figure labels: "start", "settle for this", "holy grail"]

• Truss Topology Design

• Western General Hospital (Creutzfeldt-Jakob Disease)

• Ministry of Defence, dstl lab (Algorithms for Data Simplicity)

• Royal Observatory (Optimal Planet Growth)

Optimization as Lock Breaking

A function representing the “quality” of a combination

$x = (x_1, x_2, x_3, x_4)$

$F(x) = F(x_1, x_2, x_3, x_4)$

Setup: Combination maximizing F opens the lock

Optimization Problem: Find combination maximizing F
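For a single lock with four dials, the optimization problem above can be solved by simply trying every combination. The sketch below illustrates this under assumptions that are not in the slides: each dial has 10 positions, and the quality function `quality` (with its "secret" combination) is a made-up stand-in for F.

```python
from itertools import product


def quality(x):
    # Hypothetical quality function F: it is largest (zero) exactly at the
    # made-up secret combination and decreases as dials move away from it.
    secret = (3, 1, 4, 1)
    return -sum((xi - si) ** 2 for xi, si in zip(x, secret))


def break_lock(F, positions=10, dials=4):
    # Exhaustive search: evaluate F on every combination and keep the best.
    return max(product(range(positions), repeat=dials), key=F)


best = break_lock(quality)
```

With 4 dials this search visits 10^4 combinations. With n dials it visits 10^n, which is why exhaustive search is hopeless once the number of dials reaches the billions.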

A System of Billion Locks with Shared Dials

1) Nodes in the graph correspond to dials

2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge

[Figure: graph with nodes x1, x2, x3, x4, ..., xn; one node is labelled "Lock"]

Number of dials = n = number of locks

• Each lock j has its own quality function $F_j$ depending on the dials it owns

• However, it does NOT open when $F_j$ is maximized

• The system of locks opens when $F = F_1 + F_2 + \dots + F_n$ is maximized, where $F : \mathbb{R}^n \to \mathbb{R}$

1) Randomly select a lock

2) Randomly select a dial belonging to the lock

3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
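The three steps above can be sketched as a serial loop. This is a minimal illustration, not the method from the cited papers. The ring-shaped lock graph, the quadratic form of each F_j, the step size, and all function names are assumptions chosen so that the example is small and checkable.

```python
import random


def make_locks(n, rng):
    # Assumed toy structure: lock j owns dial j and its ring neighbour j+1.
    # Targets come from a hidden combination so that all locks can be
    # satisfied at once (the maximum of F is 0).
    hidden = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    locks = []
    for j in range(n):
        dials = [j, (j + 1) % n]
        target = sum(hidden[i] for i in dials)
        locks.append((dials, target))
    return locks


def total_quality(locks, x):
    # F = F_1 + ... + F_n with the assumed form
    # F_j(x) = -0.5 * (sum of lock j's dials - target_j)^2.
    return sum(-0.5 * (sum(x[i] for i in dials) - target) ** 2
               for dials, target in locks)


def random_lock_dial_ascent(locks, n, steps, step_size, rng):
    x = [0.0] * n
    for _ in range(steps):
        dials, target = rng.choice(locks)   # 1) randomly select a lock
        i = rng.choice(dials)               # 2) randomly select one of its dials
        residual = sum(x[k] for k in dials) - target
        # 3) adjust the dial using only lock j's information:
        #    the partial derivative of F_j with respect to x_i is -residual.
        x[i] -= step_size * residual
    return x


rng = random.Random(0)
n = 5
locks = make_locks(n, rng)
before = total_quality(locks, [0.0] * n)
x = random_lock_dial_ascent(locks, n, steps=20000, step_size=0.1, rng=rng)
after = total_quality(locks, x)
```

Each update touches one dial and reads only the dials of one lock, so updates on locks that share no dials do not interfere with each other. That independence is what makes the scheme a natural candidate for parallel execution.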

[Figure: timeline of jobs J1 to J9 on Processors 1 to 3, with IDLE gaps between jobs; labelled "WASTEFUL"]

[Figure: timeline of the same jobs J1 to J9 on Processors 1 to 3, run back to back with no idle gaps; labelled "NO WASTE"]
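One way to read the contrast between the wasteful and the no-waste timelines is that idle gaps appear when processors wait for each other after every job, and vanish when each processor moves on immediately. The sketch below compares the two under that assumption. The job durations are invented for illustration and do not come from the slides.

```python
def synchronous_makespan(jobs):
    # jobs[p][r] is the duration of the r-th job on processor p.
    # Assumed barrier model: after each round, every processor waits
    # for the slowest one, so a round costs the maximum duration.
    return sum(max(round_durations) for round_durations in zip(*jobs))


def asynchronous_makespan(jobs):
    # No waiting: each processor runs its jobs back to back, and the
    # total time is set by the busiest processor.
    return max(sum(processor_jobs) for processor_jobs in jobs)


# Hypothetical durations for J1..J9 on three processors.
jobs = [
    [1, 3, 2],  # Processor 1: J1, J2, J3
    [2, 1, 3],  # Processor 2: J4, J5, J6
    [3, 2, 1],  # Processor 3: J7, J8, J9
]
sync_time = synchronous_makespan(jobs)
async_time = asynchronous_makespan(jobs)
```

With these durations the barrier version takes 9 time units and the back-to-back version takes 6. The asynchronous time can never exceed the synchronous one, because each processor's total work is bounded by the sum of the per-round maxima.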

[Formula labels: number of processors; average number of dials common between 2 locks; number of locks; average number of dials in a lock]

Theory vs Reality

Why parallelizing like crazy and being lazy can be good?

Randomization

• Effectivity

• Tractability

• Efficiency

• Scalability (big data)

• Parallelism

• Distribution

• Asynchronicity

Parallelization

• Randomized Coordinate Descent

• P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, arXiv:1212.0873

[can solve a problem with 1 billion variables in 2 hours using 24 processors]

• P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions

[can be applied to optimize an unknown function]

• Both of the above

M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, arXiv:1303.xxxx

Probability

HPC

Matrix Theory

Machine Learning