
Presentation Transcript


  1. Parallel Machine Learning for Large-Scale Natural Graphs Carlos Guestrin The GraphLab Team: Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Joe Hellerstein, Jay Gu, Alex Smola

  2. Parallelism is Difficult • Wide array of different parallel architectures: GPUs, Multicore, Clusters, Clouds, Supercomputers • Different challenges for each architecture • High-level abstractions to make things easier

  3. How will we design and implement parallel learning systems?

  4. ... a popular answer: Map-Reduce / Hadoop. Build learning algorithms on top of high-level parallel abstractions.

  5. Map-Reduce for Data-Parallel ML • Excellent for large data-parallel tasks! • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics • Graph-Parallel: Label Propagation, Lasso, Belief Propagation, Kernel Methods, Tensor Factorization, PageRank, Neural Networks, Deep Belief Networks
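
The left column is the classic data-parallel pattern. As a minimal illustration (a sketch in Python, not code from the talk; the record format and helper names are made up), computing per-key sufficient statistics fits Map-Reduce naturally because every record is processed independently:

    # Hypothetical data-parallel example: per-key sufficient statistics (count, sum).
    from collections import defaultdict

    def map_phase(records):
        # Each record is handled independently, so the map phase is trivially parallel.
        for key, value in records:
            yield key, (1, value)

    def reduce_phase(pairs):
        acc = defaultdict(lambda: (0, 0.0))
        for key, (cnt, val) in pairs:
            c, s = acc[key]
            acc[key] = (c + cnt, s + val)
        return dict(acc)

    records = [("a", 1.0), ("b", 2.0), ("a", 3.0)]
    print(reduce_phase(map_phase(records)))   # {'a': (2, 4.0), 'b': (1, 2.0)}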

  6. Example of Graph Parallelism

  7. PageRank Example • Iterate: R[i] = α + (1 - α) Σ_{j links to i} R[j] / L[j] • Where: α is the random reset probability • L[j] is the number of links on page j [figure: a small six-page link graph]
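
A minimal sketch of this iterate in Python (editorial illustration; the adjacency representation and iteration count are assumptions, not from the slide):

    # PageRank iterate: R[i] = alpha + (1 - alpha) * sum over pages j linking to i of R[j] / L[j]
    ALPHA = 0.15   # random reset probability

    def pagerank(out_links, iters=30):
        pages = list(out_links)
        R = {i: 1.0 for i in pages}
        for _ in range(iters):
            R = {i: ALPHA + (1 - ALPHA) * sum(R[j] / len(out_links[j])
                                              for j in pages if i in out_links[j])
                 for i in pages}
        return R

    # Tiny example graph: page -> set of pages it links to.
    print(pagerank({1: {2, 3}, 2: {3}, 3: {1}, 4: {1, 3}}))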

  8. Properties of Graph-Parallel Algorithms • Dependency Graph • Local Updates • Iterative Computation (e.g., my rank depends on my friends' ranks)

  9. Addressing Graph-Parallel ML • We need alternatives to Map-Reduce • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics • Graph-Parallel (Pregel/Giraph? Map-Reduce?): SVM, Lasso, Belief Propagation, Kernel Methods, Tensor Factorization, PageRank, Neural Networks, Deep Belief Networks

  10. Pregel (Giraph) • Bulk Synchronous Parallel model: Compute, Communicate, Barrier
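
A minimal sketch of the compute / communicate / barrier structure (editorial illustration with Python threads; the workers and their "work" are placeholders, not Pregel's API):

    # Every worker runs the same superstep loop and blocks at the barrier,
    # so each superstep takes as long as the slowest worker.
    import threading

    NUM_WORKERS, SUPERSTEPS = 3, 2
    barrier = threading.Barrier(NUM_WORKERS)

    def worker(wid, partition):
        for step in range(SUPERSTEPS):
            local = sum(partition)                       # compute phase (stand-in work)
            # ... the communicate phase would exchange messages here ...
            barrier.wait()                               # synchronization barrier
            print(f"worker {wid} finished superstep {step} (local sum {local})")

    threads = [threading.Thread(target=worker, args=(w, list(range(w + 1))))
               for w in range(NUM_WORKERS)]
    for t in threads: t.start()
    for t in threads: t.join()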

  11. Problem: Bulk synchronous computation can be highly inefficient

  12. BSP Systems Problem: Curse of the Slow Job • Every iteration ends at a barrier, so all CPUs wait for the slowest worker [figure: CPUs 1-3 processing data blocks across iterations, separated by barriers]

  13. Bulk synchronous computation model provably inefficient for some ML tasks

  14. BSP ML Problem: Data-Parallel Algorithms Can Be Inefficient [figure: Bulk Synchronous (Pregel) vs. Asynchronous Splash BP] • Limitations of the bulk synchronous model can lead to provably inefficient parallel algorithms • But distributed Splash BP was built from scratch… an efficient, parallel implementation was painful, painful, painful to achieve

  15. The Need for a New Abstraction • If not Pregel, then what? • Data-Parallel (Map-Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics • Graph-Parallel (Pregel/Giraph): Belief Propagation, Kernel Methods, SVM, Tensor Factorization, PageRank, Lasso, Neural Networks, Deep Belief Networks

  16. The GraphLab Solution • Designed specifically for ML needs • Express data dependencies • Iterative • Simplifies the design of parallel programs: • Abstract away hardware issues • Automatic data synchronization • Addresses multiple hardware architectures • Multicore • Distributed • Cloud computing • GPU implementation in progress

  17. What is GraphLab?

  18. The GraphLab Framework • Graph-Based Data Representation • Update Functions (User Computation) • Scheduler • Consistency Model

  19. Data Graph A graph with arbitrary data (C++ Objects) associated with each vertex and edge. • Graph: • Social Network • Vertex Data: • User profile text • Current interests estimates • Edge Data: • Similarity weights
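
A minimal sketch of such a data graph (editorial illustration in Python; GraphLab attaches arbitrary C++ objects, and the class and field names here are made up):

    # Arbitrary user data attached to every vertex and edge.
    class DataGraph:
        def __init__(self):
            self.vertex_data = {}      # vertex id -> user data (profile text, interest estimates)
            self.edge_data = {}        # (src, dst) -> user data (similarity weight)
            self.neighbors = {}        # vertex id -> set of adjacent vertex ids

        def add_vertex(self, v, data):
            self.vertex_data[v] = data
            self.neighbors.setdefault(v, set())

        def add_edge(self, u, v, data):
            self.edge_data[(u, v)] = data
            self.neighbors[u].add(v)
            self.neighbors[v].add(u)

    g = DataGraph()
    g.add_vertex("alice", {"profile": "likes ML", "interests": [0.2, 0.8]})
    g.add_vertex("bob", {"profile": "likes graphs", "interests": [0.5, 0.5]})
    g.add_edge("alice", "bob", {"similarity": 0.7})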

  20. Update Functions An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex:

    pagerank(i, scope) {
      // Get neighborhood data
      (R[i], W_ij, R[j]) ← scope;
      // Update the vertex data
      R[i] ← α + (1 - α) Σ_j W_ji R[j];
      // Reschedule neighbors if needed
      if R[i] changes then
        reschedule_neighbors_of(i);
    }

Dynamic computation

  21. The Scheduler The scheduler determines the order in which vertices are updated [figure: a shared queue of scheduled vertices being drained by CPU 1 and CPU 2; updates may push new vertices onto the queue]. The process repeats until the scheduler is empty.
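
A minimal sketch of how the update function and the scheduler interact (editorial illustration in Python, sequential rather than parallel, and not GraphLab's API): vertices are pulled from a queue, updated, and their neighbors are rescheduled only when the value actually changes, which is the dynamic computation mentioned on slide 20.

    # Dynamic PageRank driven by a scheduler queue.
    from collections import deque

    ALPHA, TOL = 0.15, 1e-4

    def run(out_links, in_nbrs, out_degree):
        R = {v: 1.0 for v in out_links}
        scheduler = deque(out_links)                 # initially every vertex is scheduled
        queued = set(out_links)
        while scheduler:                             # repeat until the scheduler is empty
            i = scheduler.popleft()
            queued.discard(i)
            new_rank = ALPHA + (1 - ALPHA) * sum(R[j] / out_degree[j] for j in in_nbrs[i])
            if abs(new_rank - R[i]) > TOL:           # reschedule neighbors only if R[i] changed
                R[i] = new_rank
                for k in out_links[i]:               # pages whose rank depends on R[i]
                    if k not in queued:
                        scheduler.append(k)
                        queued.add(k)
        return R

    out_links = {1: [2], 2: [3], 3: [1, 2]}
    in_nbrs = {1: [3], 2: [1, 3], 3: [2]}
    out_degree = {v: len(outs) for v, outs in out_links.items()}
    print(run(out_links, in_nbrs, out_degree))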

  22. The GraphLab Framework • Graph-Based Data Representation • Update Functions (User Computation) • Scheduler • Consistency Model

  23. Ensuring Race-Free Code • How much can computation overlap?

  24. Need for Consistency?

  25. [figure: ALS on Netflix data, 8 cores: consistent vs. inconsistent execution]

  26. Even Simple PageRank can be Dangerous

    GraphLab_pagerank(scope) {
      ref sum = scope.center_value
      sum = 0
      forall (neighbor in scope.in_neighbors)
        sum = sum + neighbor.value / neighbor.num_out_edges
      sum = ALPHA + (1 - ALPHA) * sum
      …

  27. Inconsistent PageRank

  28. Even Simple PageRank can be Dangerous

    GraphLab_pagerank(scope) {
      ref sum = scope.center_value
      sum = 0
      forall (neighbor in scope.in_neighbors)
        sum = sum + neighbor.value / neighbor.num_out_edges
      sum = ALPHA + (1 - ALPHA) * sum
      …

Read-write race: CPU 1 reads a bad PageRank estimate while CPU 2 computes the value.

  29. Race Condition Can Be Very Subtle

Unstable:

    GraphLab_pagerank(scope) {
      ref sum = scope.center_value
      sum = 0
      forall (neighbor in scope.in_neighbors)
        sum = sum + neighbor.value / neighbor.num_out_edges
      sum = ALPHA + (1 - ALPHA) * sum
      …

Stable:

    GraphLab_pagerank(scope) {
      sum = 0
      forall (neighbor in scope.in_neighbors)
        sum = sum + neighbor.value / neighbor.num_out_edges
      sum = ALPHA + (1 - ALPHA) * sum
      scope.center_value = sum
      …

This was actually encountered in user code.
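
Restating the contrast in Python (editorial illustration with made-up scope objects, not GraphLab's C++ API): the unstable version accumulates directly into the shared vertex value, so a concurrent neighbor update can observe a partial sum; the stable version accumulates locally and publishes the result with a single write.

    from types import SimpleNamespace

    ALPHA = 0.15

    def pagerank_unstable(scope):
        # Intermediate values are visible to other CPUs reading this vertex.
        scope.center_value = 0.0
        for nbr in scope.in_neighbors:
            scope.center_value += nbr.value / nbr.num_out_edges
        scope.center_value = ALPHA + (1 - ALPHA) * scope.center_value

    def pagerank_stable(scope):
        # Accumulate into a local variable, then write the shared value once.
        total = sum(nbr.value / nbr.num_out_edges for nbr in scope.in_neighbors)
        scope.center_value = ALPHA + (1 - ALPHA) * total

    nbrs = [SimpleNamespace(value=1.0, num_out_edges=2),
            SimpleNamespace(value=0.5, num_out_edges=1)]
    scope = SimpleNamespace(center_value=1.0, in_neighbors=nbrs)
    pagerank_stable(scope)
    print(scope.center_value)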

  30. GraphLab Ensures Sequential Consistency For each parallel execution, there exists a sequential execution of update functions which produces the same result. [figure: a parallel timeline on CPU 1 and CPU 2 vs. an equivalent sequential timeline on a single CPU]

  31. Consistency Rules [figure: update-function scopes over the data graph] • Guaranteed sequential consistency for all update functions

  32. Full Consistency [figure: non-overlapping full-consistency scopes]

  33. Obtaining More Parallelism [figure: full consistency vs. edge consistency scopes]

  34. Edge Consistency [figure: CPU 1 and CPU 2 update nearby vertices concurrently; the shared vertex is a safe read]
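
One way to picture how a consistency model can be enforced (editorial sketch only; under edge consistency an update gets write access to the center vertex and adjacent edges and read-only access to neighboring vertices, which this simplification replaces with plain mutexes over the whole scope): lock the scope in a canonical order, run the update, then release.

    import threading

    def make_locks(vertices):
        return {v: threading.Lock() for v in vertices}

    def run_update(center, neighbors, locks, update_fn):
        scope = sorted({center} | set(neighbors))    # canonical order avoids deadlock
        for v in scope:
            locks[v].acquire()
        try:
            update_fn(center, neighbors)             # race-free inside the locked scope
        finally:
            for v in reversed(scope):
                locks[v].release()

    locks = make_locks([1, 2, 3])
    run_update(2, [1, 3], locks, lambda c, nbrs: print("updating", c, "reading", nbrs))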

  35. The GraphLab Framework • Graph-Based Data Representation • Update Functions (User Computation) • Scheduler • Consistency Model

  36. Alternating Least Squares, SVD, Splash Sampler, CoEM, Bayesian Tensor Factorization, Lasso, Belief Propagation, PageRank, LDA, SVM, Gibbs Sampling, Dynamic Block Gibbs Sampling, K-Means, Matrix Factorization, Linear Solvers, …many others…

  37. GraphLab vs. Pregel (BSP) • Multicore PageRank (25M vertices, 355M edges) [figure: Pregel (via GraphLab) vs. GraphLab] • 51% of vertices updated only once

  38. CoEM (Rosie Jones, 2005) • Named Entity Recognition task: Is "Dog" an animal? Is "Catalina" a place? • Vertices: 2 million • Edges: 200 million [figure: bipartite graph linking noun phrases ("the dog", "Catalina Island", "Australia") to contexts ("<X> ran quickly", "travelled to <X>", "<X> is pleasant")]

  39. CoEM (Rosie Jones, 2005) [figure: GraphLab CoEM speedup vs. number of CPUs, compared to optimal] • 6x fewer CPUs! • 15x faster!

  40. GraphLab in the Cloud

  41. CoEM (Rosie Jones, 2005) [figure: speedup in the cloud for the small and large problems, compared to optimal] • 0.3% of Hadoop time

  42. Video Cosegmentation • Segments that mean the same • Gaussian EM clustering + BP on a 3D grid • Model: 10.5 million nodes, 31 million edges

  43. Video Coseg. Speedups [figure: GraphLab speedup vs. ideal]

  44. Video Coseg. Speedups [figure: GraphLab speedup vs. ideal, continued]

  45. Cost-Time Tradeoff • Video co-segmentation results [figure: cost vs. runtime] • A few machines help a lot (faster); then diminishing returns • More machines, higher cost

  46. Netflix Collaborative Filtering • Alternating Least Squares Matrix Factorization • Model: 0.5 million nodes, 99 million edges • Bipartite graph of Users and Movies, latent dimension D = 20 to 100 [figure: runtime of MPI, Hadoop, and GraphLab vs. ideal]
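
For readers unfamiliar with the model behind this experiment, a generic ALS vertex update looks like the following (editorial sketch with NumPy, not the GraphLab implementation; function and variable names are made up). Each user/movie vertex holds a D-dimensional latent factor and refits it by regularized least squares against its neighbors' current factors:

    import numpy as np

    def als_update(ratings, nbr_factors, dim, reg=0.1):
        # Solve (sum_j v_j v_j^T + reg * I) x = sum_j r_j v_j for this vertex's factor.
        A = reg * np.eye(dim)
        b = np.zeros(dim)
        for r, v in zip(ratings, nbr_factors):
            A += np.outer(v, v)
            b += r * v
        return np.linalg.solve(A, b)

    D = 20
    movie_factors = [np.random.rand(D) for _ in range(3)]
    user_factor = als_update([4.0, 3.0, 5.0], movie_factors, D)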

  47. Multicore Abstraction Comparison • Netflix Matrix Factorization Dynamic Computation, Faster Convergence

  48. The Cost of Hadoop

  49. Fault Tolerance

  50. Fault Tolerance • Larger problems → increased chance of machine failure • GraphLab2 introduces two fault-tolerance (checkpointing) mechanisms: • Synchronous snapshots • Chandy-Lamport asynchronous snapshots
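
A minimal sketch of the synchronous variant (editorial illustration: all computation stops at a barrier, state is serialized, then computation resumes; the Chandy-Lamport asynchronous protocol is more involved and is not sketched here; file name and serialization format are made up):

    import pickle

    def checkpoint(graph_state, path="snapshot.pkl"):
        # Called while no update function is running, so the saved state is a consistent cut.
        with open(path, "wb") as f:
            pickle.dump(graph_state, f)

    def restore(path="snapshot.pkl"):
        with open(path, "rb") as f:
            return pickle.load(f)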
