Presentation Transcript
A Framework for Machine Learning and Data Mining in the Cloud

Yucheng Low
Joseph Gonzalez
Aapo Kyrola
Danny Bickson
Carlos Guestrin
Joe Hellerstein


Big Data is Everywhere

900 Million Facebook Users
28 Million Wikipedia Pages
6 Billion Flickr Photos
72 Hours a Minute (YouTube)

"…growing at 50 percent a year…"

"…data a new class of economic asset, like currency or gold."


How will we design and implement Big Learning systems?


Shift Towards Use of Parallelism in ML

  • ML experts repeatedly solve the same parallel design challenges:

    • Race conditions, distributed state, communication…

  • Resulting code is very specialized:

    • difficult to maintain, extend, debug…

Graduate students

GPUs

Multicore

Clusters

Clouds

Supercomputers

Avoid these problems by using high-level abstractions.


MapReduce – Map Phase

[Figure: input records are partitioned across CPU 1, CPU 2, CPU 3, and CPU 4; each CPU independently computes statistics over its own block of records]

Embarrassingly parallel: independent computation, no communication needed.


MapReduce – Reduce Phase

[Figure: image features labeled Attractive (A) or Ugly (U) are processed across CPUs, and the per-CPU partial sums are reduced into Attractive Face Statistics and Ugly Face Statistics]
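The data-parallel pattern on this slide can be phrased as a short map/reduce sketch. The following is a minimal illustration in Python, not Hadoop code: extract_features, labeled_images, and FEATURE_DIM are assumed placeholders, and the "reduce" runs in a single process.

    from collections import defaultdict

    FEATURE_DIM = 3                                    # assumed feature length, for illustration

    def map_phase(label, image):
        # Each record is processed independently: no communication needed.
        features = extract_features(image)             # hypothetical feature extractor
        return label, (features, 1)

    def reduce_phase(partials):
        # Combine the per-record statistics emitted by the map phase.
        sums = defaultdict(lambda: ([0.0] * FEATURE_DIM, 0))
        for label, (features, count) in partials:
            acc, n = sums[label]
            sums[label] = ([a + f for a, f in zip(acc, features)], n + count)
        # Mean feature vector per class ("Attractive" vs "Ugly" statistics).
        return {label: [a / n for a in acc] for label, (acc, n) in sums.items()}

    # Usage: stats = reduce_phase(map_phase(lbl, img) for lbl, img in labeled_images)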


MapReduce for Data-Parallel ML

  • Excellent for large data-parallel tasks!

Is there more to Machine Learning?

Data-Parallel (MapReduce):

  • Feature Extraction
  • Cross Validation
  • Computing Sufficient Statistics

Graph-Parallel:

  • Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
  • Semi-Supervised Learning: Label Propagation, CoEM
  • Collaborative Filtering: Tensor Factorization
  • Graph Analysis: PageRank, Triangle Counting



[Figure: a social network whose users are labeled Hockey, Scuba Diving, or Underwater Hockey, motivating inference of a user's interests from their friends' interests]


Graphs are Everywhere

  • Social Network
  • Collaborative Filtering (Netflix): Users × Movies
  • Probabilistic Analysis
  • Text Analysis (Wiki): Docs × Words


Properties of Computation on Graphs

  • Dependency Graph (e.g. my interests depend on my friends' interests)
  • Local Updates
  • Iterative Computation


ML Tasks Beyond Data-Parallelism

Data-Parallel (MapReduce):

  • Feature Extraction
  • Cross Validation
  • Computing Sufficient Statistics

Graph-Parallel:

  • Graphical Models: Gibbs Sampling, Belief Propagation, Variational Opt.
  • Semi-Supervised Learning: Label Propagation, CoEM
  • Collaborative Filtering: Tensor Factorization
  • Graph Analysis: PageRank, Triangle Counting


Shared Memory (2010)

Alternating Least Squares, SVD, Splash Sampler, CoEM, Bayesian Tensor Factorization, Lasso, Belief Propagation, LDA, SVM, PageRank, Gibbs Sampling, Linear Solvers, Matrix Factorization, …many others…


Limited CPU Power

Limited Memory

Limited Scalability


Distributed Cloud

Unlimited amount of computation resources!

(up to funding limitations)

  • Distributing State

  • Data Consistency

  • Fault Tolerance


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Consistency Model


Data Graph

Data associated with vertices and edges

  • Graph:

  • Social Network

  • Vertex Data:

  • User profile

  • Current interest estimates

  • Edge Data:

  • Relationship

  • (friend, classmate, relative)
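As a rough illustration of the data graph layout described above, here is a sketch keeping vertex data, edge data, and adjacency side by side; the class names and fields (VertexData, EdgeData, DataGraph) are illustrative stand-ins, not GraphLab's C++ types.

    from dataclasses import dataclass, field

    @dataclass
    class VertexData:                        # data stored on each vertex
        profile: dict                        # user profile
        interests: list = field(default_factory=list)   # current interest estimates

    @dataclass
    class EdgeData:                          # data stored on each edge
        relationship: str                    # "friend", "classmate", "relative", ...

    class DataGraph:
        def __init__(self):
            self.vertex_data = {}            # vertex id -> VertexData
            self.edge_data = {}              # (u, v) -> EdgeData
            self.neighbors = {}              # vertex id -> set of adjacent vertex ids

        def add_edge(self, u, v, edge):
            self.edge_data[(u, v)] = edge
            self.neighbors.setdefault(u, set()).add(v)
            self.neighbors.setdefault(v, set()).add(u)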


Distributed Graph

Partition the graph across multiple machines.


Distributed Graph

  • Ghost vertices maintain adjacency structure and replicate remote data.

“ghost” vertices


Distributed Graph

  • Cut efficiently using HPC Graph partitioning tools (ParMetis / Scotch / …)

“ghost” vertices


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Consistency Model


Update Functions

User-defined program: applied to a vertex, it transforms the data in the scope of that vertex.

Update functions are applied (asynchronously) in parallel until convergence.

Many schedulers are available to prioritize computation.

Pagerank(scope) {
  // Update the current vertex data from its neighbors
  vertex.PageRank = alpha
  foreach (neighbor in scope.in_neighbors)
    vertex.PageRank += (1 - alpha) * neighbor.weight * neighbor.PageRank

  // Reschedule neighbors if needed
  if vertex.PageRank changes then
    reschedule_all_neighbors;
}

Why Dynamic?

Dynamic computation
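To make the pseudocode above concrete, here is a minimal Python sketch of a dynamic PageRank update; the rank and neighbors dictionaries and the scheduler deque are assumed stand-ins for GraphLab's vertex data, graph, and scheduler, not its actual API.

    from collections import deque

    ALPHA, TOLERANCE = 0.15, 1e-3

    def pagerank_update(v, rank, neighbors, scheduler):
        # Update the current vertex data from its neighbors' values.
        old = rank[v]
        rank[v] = ALPHA + (1 - ALPHA) * sum(
            rank[u] / len(neighbors[u]) for u in neighbors[v])
        # Dynamic computation: reschedule neighbors only if this vertex changed.
        if abs(rank[v] - old) > TOLERANCE:
            scheduler.extend(neighbors[v])

    # Usage sketch: seed the scheduler with every vertex, then pop and update until empty.
    def run(rank, neighbors):
        scheduler = deque(neighbors)
        while scheduler:
            pagerank_update(scheduler.popleft(), rank, neighbors, scheduler)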


Asynchronous Belief Propagation

Challenge = Boundaries

[Figure: cumulative vertex updates on a graphical model; many updates concentrate in some regions, few elsewhere]

The algorithm identifies and focuses on the hidden sequential structure.


Shared Memory Dynamic Schedule

[Figure: a shared scheduler queue of vertices (a, b, c, d, …); CPU 1 and CPU 2 pull vertices from the queue, run the update function on them, and push any rescheduled neighbors back onto the queue]

Process repeats until scheduler is empty
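A minimal sketch of this shared-memory dynamic schedule, assuming an update callback of the form used in the PageRank sketch above: a queue.Queue plays the role of the scheduler, worker threads pull vertices and push rescheduled neighbors back, and the run ends when the queue drains. Synchronization of the vertex data itself is deliberately ignored here; that is exactly the consistency problem the later slides address.

    import queue, threading

    def worker(scheduler, update, rank, neighbors):
        while True:
            v = scheduler.get()                     # block until a vertex is scheduled
            pending = []
            update(v, rank, neighbors, pending)     # update may reschedule neighbors
            for u in pending:
                scheduler.put(u)
            scheduler.task_done()

    def run_parallel(vertices, update, rank, neighbors, n_threads=2):
        scheduler = queue.Queue()
        for v in vertices:
            scheduler.put(v)
        for _ in range(n_threads):
            threading.Thread(target=worker, daemon=True,
                             args=(scheduler, update, rank, neighbors)).start()
        scheduler.join()                            # returns once the scheduler is empty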


Distributed Scheduling

Each machine maintains a schedule over the vertices it owns.

[Figure: the graph is split across machines, and each machine keeps its own scheduler queue over its local vertices]

Distributed Consensus used to identify completion


Ensuring Race-Free Code

How much can computation overlap?


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Consistency Model


PageRank Revisited

Pagerank(scope) {

}


Racing PageRank

This was actually encountered in user code.


Bugs

Pagerank(scope) {

}


Bugs

tmp

Pagerank(scope) {

}

tmp


Throughput ≠ Performance


Racing Collaborative Filtering


Serializability

For every parallel execution, there exists a sequential execution of update functions which produces the same result.

[Figure: a parallel execution of update functions on CPU 1 and CPU 2 over time, and an equivalent sequential execution on a single CPU]


Serializability Example

Edge Consistency

[Figure: the scope of each update function: Write access on its own region, Read access where scopes overlap]

Overlapping regions are only read, so update functions one vertex apart can be run in parallel.

Stronger / weaker consistency levels are available; user-tunable consistency levels trade off parallelism and consistency.


Distributed Consistency

Solution 1: Graph Coloring
Solution 2: Distributed Locking


Edge Consistency via Graph Coloring

Vertices of the same color are all at least one vertex apart.

Therefore, all vertices of the same color can be run in parallel!
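A small sketch of this idea, assuming the neighbors adjacency dictionary from the earlier sketches: greedily color the graph, then run every vertex of one color in parallel before moving to the next color. The greedy coloring and thread pool below are illustrative, not GraphLab's chromatic engine.

    from concurrent.futures import ThreadPoolExecutor

    def greedy_color(neighbors):
        # Give each vertex the smallest color not used by an already-colored neighbor.
        color = {}
        for v in neighbors:
            used = {color[u] for u in neighbors[v] if u in color}
            color[v] = next(c for c in range(len(neighbors) + 1) if c not in used)
        return color

    def run_by_color(neighbors, update, color):
        for c in sorted(set(color.values())):
            same_color = [v for v, cv in color.items() if cv == c]
            # Same-colored vertices are never adjacent, so their scopes only
            # overlap on reads and can safely be updated in parallel.
            with ThreadPoolExecutor() as pool:
                list(pool.map(update, same_color))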


Chromatic Distributed Engine

[Timeline: each machine executes tasks on all of its vertices of color 0, then waits for ghost synchronization completion + barrier; the same is repeated for color 1, and so on]
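One machine's loop in the chromatic engine might look like the sketch below; sync_ghosts and barrier are assumed helpers standing in for the ghost-synchronization and barrier steps on the timeline, not GraphLab's actual interfaces.

    def chromatic_engine(owned_vertices, color, num_colors, update, sync_ghosts, barrier):
        for c in range(num_colors):
            for v in owned_vertices:
                if color[v] == c:
                    update(v)        # safe: same-colored vertices are never adjacent
            sync_ghosts()            # push updated vertex/edge data to ghost copies
            barrier()                # wait for every machine before starting the next color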


Matrix Factorization

  • Netflix Collaborative Filtering

    • Alternating Least Squares Matrix Factorization

Model: 0.5 million nodes, 99 million edges

[Figure: the sparse Netflix Users × Movies rating matrix is factored into a Users × d factor and a d × Movies factor]
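A sketch of the alternating least squares step on this bipartite graph: every user and movie vertex holds a length-d latent factor, and an update re-solves a small regularized least-squares problem against its neighbors' factors. numpy, the factors/ratings dictionaries, and the regularization constant are assumptions for illustration, not the GraphLab toolkit code.

    import numpy as np

    def als_update(v, factors, neighbors, ratings, d=20, reg=0.1):
        # Recompute v's latent factor from its neighbors (movies for a user, users for a movie).
        A = reg * np.eye(d)
        b = np.zeros(d)
        for u in neighbors[v]:
            x = factors[u]
            A += np.outer(x, x)
            b += ratings[(v, u)] * x       # assumed: the edge rating is stored under (v, u)
        factors[v] = np.linalg.solve(A, b)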


Netflix Collaborative Filtering

[Plot: scaling with # machines for MPI, Hadoop, GraphLab, and Ideal, shown for D=100 and D=20]


The Cost of Hadoop


CoEM (Rosie Jones, 2005)

Named Entity Recognition Task

Is "Cat" an animal?
Is "Istanbul" a place?

Vertices: 2 Million
Edges: 200 Million

[Figure: bipartite graph linking noun phrases ("the cat", "Australia", "Istanbul") to the contexts they appear in ("<X> ran quickly", "travelled to <X>", "<X> is pleasant")]

0.3% of Hadoop time


Problems

  • Requires a graph coloring to be available.

  • Frequent barriers make it extremely inefficient for highly dynamic systems where only a small number of vertices are active in each round.


Distributed Consistency

Solution 1: Graph Coloring
Solution 2: Distributed Locking


Distributed Locking

Edge Consistency can be guaranteed through locking.

[Figure: each vertex carries a reader-writer (RW) lock]


Consistency Through Locking

Acquire write-lock on center vertex, read-lock on adjacent.
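A sketch of that locking discipline in Python; plain threading.Lock objects stand in (conservatively) for per-vertex reader-writer locks, and acquiring the whole scope in sorted vertex order avoids deadlock. The body callback is the update function to run once the scope is held.

    import threading

    locks = {}                                   # vertex id -> lock (stand-in for a RW lock)

    def run_with_edge_consistency(v, neighbors, body):
        # Write-lock the center vertex and (read-)lock its neighbors, in sorted order.
        scope = sorted({v} | set(neighbors[v]))
        for u in scope:
            locks.setdefault(u, threading.Lock()).acquire()
        try:
            body(v)                              # the update runs with its full scope locked
        finally:
            for u in reversed(scope):
                locks[u].release()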


Consistency Through Locking

Multicore Setting: PThread RW-Locks

Distributed Setting: Distributed Locks

[Figure: in the distributed setting the locked scope (vertices A, B, C, D) spans Machine 1 and Machine 2, so lock requests must cross the network]

  • Challenge: Latency

  • Solution: Pipelining


No Pipelining

[Timeline: lock scope 1 → process request 1 → scope 1 acquired → update_function 1 → release scope 1 → process release 1; each step waits for the previous one]


Pipelining / Latency Hiding

Hide latency using pipelining.

[Timeline: lock requests for scopes 1, 2, and 3 are issued back-to-back; while later requests are still being processed, scope 1 is acquired and update_function 1 runs, scope 1 is released, and update_function 2 runs as soon as scope 2 is acquired]


Latency Hiding

Hide latency using request buffering.

Residual BP on a 190K vertex, 560K edge graph, 4 machines: 47x speedup.

[Timeline: the same pipelined lock acquisition as on the previous slide]
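A rough sketch of the pipelining idea: keep several scope (lock) requests in flight at once and run whichever update becomes ready first, instead of blocking on one request at a time. acquire_scope, update, and release_scope are assumed callbacks, and a thread pool stands in for the asynchronous distributed lock service.

    from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

    def pipelined_engine(vertices, acquire_scope, update, release_scope, max_in_flight=3):
        todo = list(vertices)
        in_flight = {}                                   # future -> vertex
        with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
            while todo or in_flight:
                # Keep several lock requests outstanding to hide network latency.
                while todo and len(in_flight) < max_in_flight:
                    v = todo.pop()
                    in_flight[pool.submit(acquire_scope, v)] = v
                done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
                for fut in done:
                    v = in_flight.pop(fut)
                    fut.result()                         # scope for v is now acquired
                    update(v)
                    release_scope(v)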


Video Cosegmentation

Segments mean the same

Probabilistic Inference Task

1740 Frames

Model: 10.5 million nodes, 31 million edges


Video Coseg. Speedups

[Plot: speedup vs. # machines for GraphLab compared to Ideal scaling]


The GraphLab Framework

Graph-Based Data Representation

Update Functions (User Computation)

Consistency Model


What if machines fail?

How do we provide fault tolerance?


Checkpoint

1: Stop the world

2: Write state to disk


Snapshot Performance

Because we have to stop the world, one slow machine slows everything down!

[Plot: progress over time with No Snapshot vs. Snapshot, and with one slow machine; the snapshot time and the slow machine are marked]


How can we do better?

Take advantage of consistency


Checkpointing

1985: Chandy-Lamport invented an asynchronous snapshotting algorithm for distributed systems.

[Figure: a graph in mid-snapshot: some vertices already snapshotted, others not yet]


Checkpointing

Fine-Grained Chandy-Lamport.

Easily implemented within GraphLab as an Update Function!
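A sketch of how a fine-grained Chandy-Lamport style snapshot can be expressed as just another update function: when the snapshot first reaches a vertex it saves the vertex's data and the data on edges to not-yet-snapshotted neighbors, then schedules those neighbors. The snapshotted map, save_vertex/save_edge callbacks, and scheduler are illustrative stand-ins, not GraphLab's code.

    def snapshot_update(v, neighbors, snapshotted, save_vertex, save_edge, scheduler):
        if snapshotted.get(v):
            return                               # this vertex is already in the snapshot
        save_vertex(v)                           # record the vertex data
        snapshotted[v] = True
        for u in neighbors[v]:
            if not snapshotted.get(u):
                save_edge(v, u)                  # record edge data before u can change it
                scheduler.append(u)              # the snapshot spreads like any other update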


Async. Snapshot Performance

No penalty incurred by the slow machine!

[Plot: progress over time with No Snapshot vs. Snapshot, with one slow machine]


Summary

  • Extended the GraphLab abstraction to distributed systems

  • Two different methods of achieving consistency

    • Graph Coloring

    • Distributed Locking with pipelining

  • Efficient implementations

  • Asynchronous fault tolerance with fine-grained Chandy-Lamport

Performance
Efficiency
Scalability
Usability


Release 2.1 + many toolkits

http://graphlab.org

Major Improvements to be published in OSDI 2012

Triangle Counting: 282x faster than Hadoop (400M triangles per second)

PageRank: 40x faster than Hadoop (1B edges per second)

