
Pregel: A System for Large-Scale Graph Processing

Pregel: A System for Large-Scale Graph Processing. Presented by Dylan Davis. Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.).


Presentation Transcript


  1. Pregel: A System for Large-Scale Graph Processing. Presented by Dylan Davis. Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski (Google, Inc.)

  2. Overview • What is a graph? • Graph Problems • The Purpose of Pregel • Model of Computation • C++ API • Implementation • Applications • Experiments

  3. What is a graph? G = (V, E) Binary Tree

  4. Graph Problems Social Network Connections Network Routing

  5. The Purpose of Pregel • Google needed to run internet-related graph algorithms, such as PageRank, on very large graphs, so it designed Pregel to perform these tasks efficiently. • Pregel is a scalable, general-purpose system for implementing graph algorithms in a distributed environment. • The focus is on “Thinking Like a Vertex” and on parallelism.

  6. Model of Computation

  7. Model of Computation (Vertex) [Figure: a vertex labeled with a Vertex ID and a Vertex Value; each edge is labeled with an Edge Value and a Vertex ID]

  8. Model of Computation (Superstep) [Figure: Compute() calls running in parallel within Superstep 0, Superstep 1, and Superstep 2, plotted along an Execution Time axis]

  9. Model of Computation (Vertex Actions) A vertex can: • Modify its values • Receive messages sent in the previous superstep • Send messages • Request topology changes [Figure: a vertex with its Vertex ID and Vertex Value]

  10. Model of Computation (State Machine)
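
The superstep, vertex-action, and state-machine slides can be tied together with a small single-process sketch. This is not the Pregel API: `ToyVertex` and `RunMaxValue` are hypothetical names, and the example algorithm (every vertex learns the largest value in the graph) is chosen only because it is short. It assumes the usual state machine: every vertex starts active, becomes inactive when it votes to halt, and is reactivated by an incoming message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy vertex; not the real Pregel Vertex class.
struct ToyVertex {
    int value;                     // vertex value, modified by Compute()
    std::vector<std::size_t> out;  // indices of the target vertices
    bool active;                   // false once the vertex has voted to halt
};

// Runs supersteps until every vertex is inactive and no message is in
// flight. Returns the number of supersteps in which some vertex ran.
int RunMaxValue(std::vector<ToyVertex>& graph) {
    for (ToyVertex& v : graph) v.active = true;  // all start active
    std::vector<std::vector<int>> inbox(graph.size());
    int superstep = 0;
    for (;; ++superstep) {
        // Messages sent now are only visible in the next superstep.
        std::vector<std::vector<int>> outbox(graph.size());
        bool any_ran = false;
        for (std::size_t i = 0; i < graph.size(); ++i) {
            ToyVertex& v = graph[i];
            if (!inbox[i].empty()) v.active = true;  // message reactivates
            if (!v.active) continue;
            any_ran = true;
            // Body of the toy Compute(): adopt the largest value seen.
            bool changed = (superstep == 0);  // announce own value first
            for (int m : inbox[i]) {
                if (m > v.value) { v.value = m; changed = true; }
            }
            if (changed) {
                for (std::size_t target : v.out) {
                    assert(target < graph.size());
                    outbox[target].push_back(v.value);
                }
            }
            v.active = false;  // vote to halt
        }
        if (!any_ran) break;  // all halted, no messages: terminate
        inbox.swap(outbox);   // barrier between supersteps
    }
    return superstep;
}
```

Calling `RunMaxValue` on a four-vertex directed ring with values 3, 6, 2, 1 leaves every vertex holding 6. In a real deployment the inner loop would be spread across workers; the sequential loop here only mimics the barrier between supersteps.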

  11. C++ API

  12. C++ API (Message Passing) [Figure: a message buffer whose entries pair a Destination Vertex ID with a Message Value]

  13. C++ API (Combiners & Aggregators) Combiner Aggregator
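
As a rough illustration of the two ideas on this slide: a combiner merges several messages bound for the same destination vertex into one, and an aggregator reduces one contribution per vertex into a single global value. The sketch below is an assumption-laden stand-in, not Pregel's Combiner or Aggregator classes; `CombineMin`, `AggregateSum`, and the `Message` alias are hypothetical names, and min and sum are just two examples of commutative, associative operations.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using VertexId = int;
// (destination vertex ID, message value); hypothetical representation.
using Message = std::pair<VertexId, int>;

// Min combiner: keeps a single message per destination, holding the
// smallest value addressed to it. Order of arrival does not matter.
std::map<VertexId, int> CombineMin(const std::vector<Message>& buffer) {
    std::map<VertexId, int> combined;
    for (const Message& m : buffer) {
        auto it = combined.find(m.first);
        if (it == combined.end()) {
            combined.emplace(m.first, m.second);
        } else {
            it->second = std::min(it->second, m.second);
        }
    }
    return combined;
}

// Sum aggregator: reduces the values contributed by vertices during one
// superstep into one global value.
int AggregateSum(const std::vector<int>& contributions) {
    int total = 0;
    for (int c : contributions) total += c;
    return total;
}
```

For example, a buffer holding (2, 57), (2, 10), and (1, 4) combines to one message per destination: 10 for vertex 2 and 4 for vertex 1. The benefit is fewer messages sent and buffered; the algorithm must be insensitive to which messages were merged.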

  14. C++ API (Topology Mutations) V Superstep

  15. C++ API (Input and Output) Adjacency matrix of a five-vertex example graph (row and column labels are vertex IDs):

          0 1 2 3 4
        0 0 0 1 1 0
        1 0 0 0 1 1
        2 1 1 0 1 1
        3 0 1 1 0 1
        4 1 1 1 0 0

  16. Implementation

  17. Implementation (Basic Architecture)

  18. Implementation (Program Execution) Flow: • The user program is copied to many machines: one copy acts as the master, the rest as workers • The master assigns graph partitions to workers • The master assigns portions of the user input data to workers, which load the vertex data • Workers run supersteps (call Compute() and send messages) • Workers save the output
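
The partition-assignment step in the flow above can be sketched as follows. The vertex-to-partition rule shown (hash of the vertex ID modulo the number of partitions) matches the default described in the Pregel paper; the round-robin placement of partitions on workers is only an assumption for illustration, and `PartitionOf` and `AssignPartitions` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Maps a vertex ID to a partition: hash(ID) mod N.
std::size_t PartitionOf(long vertex_id, std::size_t num_partitions) {
    assert(num_partitions > 0);
    return std::hash<long>{}(vertex_id) % num_partitions;
}

// Assigns partitions to workers round-robin (assumed policy). The result
// holds, for each worker, the list of partition numbers it owns.
std::vector<std::vector<std::size_t>> AssignPartitions(
        std::size_t num_partitions, std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<std::vector<std::size_t>> owned(num_workers);
    for (std::size_t p = 0; p < num_partitions; ++p) {
        owned[p % num_workers].push_back(p);
    }
    return owned;
}
```

Because the mapping depends only on the vertex ID, any worker can work out which partition a destination vertex belongs to without asking the master. With six partitions and two workers, this sketch gives worker 0 partitions 0, 2, 4 and worker 1 partitions 1, 3, 5.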

  19. Implementation (Fault Tolerance) [Figure: workers call Save() at a checkpoint; after one worker fails (X), the workers recover from the checkpoint and call Recompute()]
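
A minimal sketch of the checkpoint-and-recompute idea, under simplifying assumptions: the whole computation is one in-memory state, `Superstep` is a deterministic stand-in for real vertex computation, and a single simulated failure wipes the state just before a chosen superstep. All names (`State`, `Superstep`, `RunResult`, `RunWithCheckpoints`) are hypothetical and do not reflect Pregel's actual recovery code.

```cpp
#include <cassert>
#include <vector>

using State = std::vector<long>;

// Stand-in for one superstep of deterministic work (hypothetical).
void Superstep(State& s) {
    for (long& x : s) x = x * 2 + 1;
}

struct RunResult {
    State state;
    int supersteps_executed;  // includes recomputed supersteps
};

// Runs `total` supersteps, saving a checkpoint every `interval`
// supersteps. If fail_at >= 0, the state is lost once, just before
// superstep `fail_at`; the run then reloads the most recent checkpoint
// and recomputes the supersteps that followed it.
RunResult RunWithCheckpoints(State initial, int total, int interval,
                             int fail_at) {
    assert(interval > 0);
    State state = initial;
    State checkpoint = initial;
    int checkpoint_step = 0;
    int executed = 0;
    bool failed = false;
    for (int step = 0; step < total;) {
        if (step == fail_at && !failed) {
            failed = true;
            state = checkpoint;      // recover from the checkpoint
            step = checkpoint_step;  // recompute from that superstep
            continue;
        }
        if (step % interval == 0) {  // save at the checkpoint boundary
            checkpoint = state;
            checkpoint_step = step;
        }
        Superstep(state);
        ++executed;
        ++step;
    }
    return {state, executed};
}
```

With 10 supersteps, a checkpoint every 4, and a failure before superstep 6, the run rolls back to superstep 4 and executes 12 supersteps in total, yet ends in the same state as a failure-free run. That trade-off (checkpoint cost against recomputation cost) is what the checkpoint interval tunes.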

  20. Implementation (Worker) Worker Worker

  21. Implementation (Master) List of Workers Master Partitions

  22. Applications

  23. Applications (Shortest Path) [Figure: example graph labeled 1, 2, 5, and 3]
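
Single-source shortest paths fits the vertex-centric model naturally: each vertex keeps its best known distance, takes the minimum of incoming messages, and, if that improves its value, offers `distance + edge weight` to each neighbor before voting to halt. The sketch below simulates this in one process. It assumes non-negative edge weights, and `Edge`, `kInf`, and `ShortestPaths` are hypothetical names rather than Pregel API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr long kInf = std::numeric_limits<long>::max();

struct Edge {
    std::size_t target;  // index of the destination vertex
    long weight;         // assumed non-negative
};

// edges[v] lists the outgoing edges of vertex v. Returns the distance
// from `source` to every vertex, or kInf where no path exists.
std::vector<long> ShortestPaths(const std::vector<std::vector<Edge>>& edges,
                                std::size_t source) {
    assert(source < edges.size());
    const std::size_t n = edges.size();
    std::vector<long> dist(n, kInf);  // vertex values
    std::vector<std::vector<long>> inbox(n);
    bool first_superstep = true;
    while (true) {
        std::vector<std::vector<long>> outbox(n);
        bool sent = false;
        for (std::size_t v = 0; v < n; ++v) {
            // After superstep 0, a vertex runs only if it has messages.
            if (!first_superstep && inbox[v].empty()) continue;
            long mindist = (v == source) ? 0 : kInf;
            for (long m : inbox[v]) mindist = std::min(mindist, m);
            if (mindist < dist[v]) {
                dist[v] = mindist;
                for (const Edge& e : edges[v]) {
                    outbox[e.target].push_back(mindist + e.weight);
                    sent = true;
                }
            }
            // The vertex votes to halt here (implicit in this sketch).
        }
        first_superstep = false;
        if (!sent) break;    // all halted and no messages in flight
        inbox.swap(outbox);  // deliver messages for the next superstep
    }
    return dist;
}
```

On a hypothetical graph with edges 0→1 (weight 1), 0→2 (weight 5), 1→2 (weight 2), and 2→3 (weight 3), the distances from vertex 0 are 0, 1, 3, and 6. A min combiner would apply directly here, since each vertex only cares about the smallest incoming distance.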

  24. Experiments

  25. Experiments (Description) • Measure the execution time of Pregel running the Single-Source Shortest Path algorithm. • Use a cluster of 300 multicore commodity PCs. • Run Pregel on binary-tree graphs and on a more realistic, randomly distributed graph. • Reported times exclude initialization, graph generation, and result verification. • Checkpointing for failure recovery is disabled, which reduces overhead.

  26. Conclusion • Pregel is a model suitable for large-scale graph computing, with a production-quality, scalable, and fault-tolerant implementation. • Programs are expressed as a sequence of iterations; in each iteration a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges. • The model is flexible enough to express a broad set of algorithms.
