Optimization via (too much?) Randomization

Why parallelizing like crazy and being lazy can be good

  • Peter Richtarik


Optimization as Mountain Climbing


Optimization with Big Data

Optimization with Big Data = Extreme* Mountain Climbing

* in a billion-dimensional space on a foggy day


Big Data

BIG Volume BIG Velocity BIG Variety

  • digital images & videos

  • transaction records

  • government records

  • health records

  • defence

  • internet activity (social media, wikipedia, ...)

  • scientific measurements (physics, climate models, ...)




Randomized Parallel Coordinate Descent

[Figure: iterates of the method on a contour plot, moving from the "start" point toward the "holy grail" optimum; in practice we "settle for this", a good enough approximate solution.]


  • Arup (Truss Topology Design)

  • Western General Hospital (Creutzfeldt-Jakob Disease)

  • Ministry of Defence, dstl lab (Algorithms for Data Simplicity)

  • Royal Observatory (Optimal Planet Growth)


Optimization as Lock Breaking


A Lock with 4 Dials

A function representing the “quality” of a combination

x = (x1, x2, x3, x4)

F(x) = F(x1, x2, x3, x4)

Setup: Combination maximizing F opens the lock

Optimization Problem: Find combination maximizing F
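To make the toy problem concrete, here is a minimal Python sketch (not from the slides) that opens the 4-dial lock by brute force, trying all 10^4 combinations. The dial range 0-9 and the example quality function F are assumptions for illustration; the point of the talk is precisely that this exhaustive approach does not scale.

```python
import itertools

def open_lock(F, dial_values=range(10)):
    """Brute-force search: try every combination of the four dials
    and return the one maximizing the quality function F."""
    best_x, best_val = None, float("-inf")
    for x in itertools.product(dial_values, repeat=4):
        val = F(*x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Hypothetical quality function: highest at the combination (3, 1, 4, 1)
F = lambda x1, x2, x3, x4: -((x1 - 3)**2 + (x2 - 1)**2 + (x3 - 4)**2 + (x4 - 1)**2)
print(open_lock(F))   # -> ((3, 1, 4, 1), 0)
```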



A System of Billion Locks with Shared Dials

[Figure: a graph whose nodes are labelled x1, x2, x3, x4, ..., xn; each node is both a dial and a lock.]

1) Nodes in the graph correspond to dials

2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge

# dials = n = # locks

How do we Measure the Quality of a Combination?

  • Each lock j has its own quality function Fj, depending only on the dials it owns

  • However, lock j does NOT open when Fj is maximized

  • The system of locks opens when the total quality

    F = F1 + F2 + ... + Fn,   F : R^n → R

    is maximized
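To illustrate this partially separable structure, here is a small Python sketch under assumed toy choices (a ring graph, quadratic per-lock qualities, and a hypothetical "winning" combination); none of this comes from the slides themselves.

```python
import random

n = 6                                                      # number of dials = number of locks
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]   # assumed ring graph
owned = {j: [] for j in range(n)}                          # dials owned by lock j
for a, b in edges:
    owned[a].append(b)
    owned[b].append(a)

target = [random.randrange(10) for _ in range(n)]          # hypothetical combination that opens the locks

def F_j(j, x):
    """Quality of lock j: depends only on the dials it owns."""
    return -sum((x[i] - target[i]) ** 2 for i in owned[j])

def F(x):
    """Global quality of a combination: the sum over all locks."""
    return sum(F_j(j, x) for j in range(n))
```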


An Algorithm with (too much?) Randomization

1) Randomly select a lock

2) Randomly select a dial belonging to the lock

3) Adjust the value on the selected dial based only on the info corresponding to the selected lock
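Below is a minimal, self-contained sketch of these three steps on the same assumed ring-graph toy problem. It illustrates the idea only; the step rule (re-optimizing one dial exactly against one lock's quality) is a simplification, not the actual method analyzed in the cited papers.

```python
import random

n = 6
owned = {j: [(j - 1) % n, (j + 1) % n] for j in range(n)}  # ring graph: lock j owns its two neighbouring dials
target = [random.randrange(10) for _ in range(n)]          # hypothetical combination that opens the locks

def F_j(j, x):
    """Quality of lock j, depending only on the dials it owns."""
    return -sum((x[i] - target[i]) ** 2 for i in owned[j])

x = [random.randrange(10) for _ in range(n)]               # random starting combination
for _ in range(200):
    j = random.randrange(n)                                # 1) randomly select a lock
    i = random.choice(owned[j])                            # 2) randomly select a dial belonging to the lock
    # 3) adjust the selected dial using only the selected lock's information
    x[i] = max(range(10), key=lambda v: F_j(j, x[:i] + [v] + x[i + 1:]))

print(x == target)   # True with high probability once every dial has been visited
```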


Synchronous Parallelization

[Figure: Gantt chart of synchronous parallelization. Processors 1-3 run jobs J1-J9 in synchronized rounds; after finishing a job, each processor sits IDLE waiting for the slowest one, which is WASTEFUL.]


Crazy (Lock-Free) Parallelization

[Figure: Gantt chart of crazy (lock-free) parallelization. Processors 1-3 run jobs J1-J9 back to back with no synchronization barriers, so there is NO WASTE from idle time.]
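A rough Python sketch of the lock-free pattern, in the Hogwild style: several workers update the shared combination concurrently, with no barriers and hence no idle time. Everything here is an assumed illustration, not the implementation from the cited work, and Python threads do not give true CPU parallelism because of the GIL; the snippet only demonstrates the unsynchronized update pattern.

```python
import random
import threading

n = 1000
owned = {j: [(j - 1) % n, (j + 1) % n] for j in range(n)}  # assumed ring graph
target = [random.randrange(10) for _ in range(n)]          # hypothetical winning combination

def F_j(j, x):
    return -sum((x[i] - target[i]) ** 2 for i in owned[j])

x = [random.randrange(10) for _ in range(n)]               # shared state, deliberately unprotected

def worker(num_updates):
    for _ in range(num_updates):
        j = random.randrange(n)                            # pick a lock
        i = random.choice(owned[j])                        # pick one of its dials
        # reads and writes may interleave with other workers: no locks, no waiting
        x[i] = max(range(10), key=lambda v: F_j(j, x[:i] + [v] + x[i + 1:]))

threads = [threading.Thread(target=worker, args=(5000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("dials set correctly:", sum(a == b for a, b in zip(x, target)), "/", n)
```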






Theoretical Result

The result is stated in terms of:

  • # processors

  • # locks (= # dials)

  • average # dials in a lock

  • average # of dials common between 2 locks
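The formula on the slide cannot be recovered from the transcript. For flavour, here is a loose restatement of the kind of result proved in the cited paper (P. R. and M. Takac, ArXiv:1212.0873); treat the exact form as an assumption rather than the slide's statement.

```latex
% tau   = number of processors
% n     = number of locks (= number of dials)
% omega = degree of coupling: roughly, how many dials a single lock owns / shares
\text{speedup} \;\approx\; \frac{\tau}{1 + \frac{(\omega - 1)(\tau - 1)}{n - 1}}
% When the locks share few dials (omega much smaller than n), the denominator
% is close to 1 and the speedup is nearly linear in the number of processors.
```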



Theory vs Reality


Why parallelizing like crazy and being lazy can be good

Randomization

  • Effectivity

  • Tractability

  • Efficiency

  • Scalability (big data)

Parallelization

  • Parallelism

  • Distribution

  • Asynchronicity


Optimization Methods for Big Data

  • Randomized Coordinate Descent

    • P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, ArXiv:1212.0873

      [can solve a problem with 1 billion variables in 2 hours using 24 processors]

  • Stochastic (Sub) Gradient Descent

    • P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions

      [can be applied to optimize an unknown function]

  • Both of the above

    • M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for support vector machines, ArXiv:1303.xxxx



Tools

Probability

HPC

Matrix Theory

Machine Learning

