
Large-Scale Network Analysis with the Boost Graph Libraries



Presentation Transcript


  1. Large-Scale Network Analysis with the Boost Graph Libraries • Douglas Gregor, Open Systems Lab, Indiana University • dgregor@osl.iu.edu

  2. What are the BGLs? • A collection of libraries for computation on graphs/networks. • Graph data structures • Graph algorithms • Graph input/output • Common design • Flexibility/customizability throughout • Obsessed with performance • Common interfaces throughout the collection • All open source, freely available online

  3. The BGL Family • The Original (sequential) BGL • BGL-Python • The Parallel BGL • Parallel BGL-Python

  4. The Original BGL • The largest and most mature BGL • ~7 years of research and development • Many users, contributors outside of the OSL • Steadily evolving • Written in C++ • Generic • Highly customizable • Efficient (both storage and execution)

  5. BGL: Graph Data Structures • Graphs: • adjacency_list: highly configurable with user-specified containers for vertices and edges • adjacency_matrix • compressed_sparse_row • Adaptors: • subgraphs, filtered graphs, reverse graphs • LEDA and Stanford GraphBase • Or, use your own…

  6. Original BGL: Algorithms • Searches (breadth-first, depth-first, A*) • Single-source shortest paths (Dijkstra, Bellman-Ford, DAG) • All-pairs shortest paths (Johnson, Floyd-Warshall) • Minimum spanning tree (Kruskal, Prim) • Components (connected, strongly connected, biconnected) • Maximum cardinality matching • Max-flow (Edmonds-Karp, push-relabel) • Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree) • Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun) • Betweenness centrality • PageRank • Isomorphism • Vertex coloring • Transitive closure • Dominator tree

  7. Task: Biconnected Components • Input Graph • Output Graph • Articulation points: B G A

  8. Define a Graph Type
     • Determine vertex/edge properties:
         struct Vertex { string name; };
         struct Edge { int bicomponent; };
     • Determine the graph type:
         typedef adjacency_list</*OutEdgeListS=*/ vecS,
                                /*VertexListS=*/ vecS,
                                /*DirectedS=*/ undirectedS,
                                /*VertexProperty=*/ Vertex,
                                /*EdgeProperty=*/ Edge> Graph;

  9. Read in a GraphViz DOT File
     • Build an empty graph:
         Graph g;
     • Map vertex properties:
         dynamic_properties dyn;
         dyn.property("node_id", get(&Vertex::name, g));
     • Read in the GraphViz graph:
         ifstream in("biconnected_components.dot");
         read_graphviz(in, g, dyn);

  10. Run Biconnected Components
     • Keep track of the articulation points:
         vector<Graph::vertex_descriptor> art_points;
     • Compute biconnected components:
         biconnected_components(g, get(&Edge::bicomponent, g),
                                back_inserter(art_points));

  11. Output results
     • Attach bicomponent number to the "label" property of edges:
         dyn.property("label", get(&Edge::bicomponent, g));
     • Write results to another GraphViz file:
         ofstream out("bc_out.dot");
         write_graphviz(out, g, dyn);
     • Show articulation points:
         cout << "Articulation points: ";
         for (size_t i = 0; i < art_points.size(); ++i) {
           cout << g[art_points[i]].name << ' ';
         }

  12. Task: Biconnected Components • Input Graph • Output Graph • Articulation points: B G A

  13. Original BGL Summary • The original BGL is large, stable, efficient • Lots of algorithms, graph types • Peer-reviewed code with many users, nightly regression testing, etc. • Performance comparable to FORTRAN. • Who should use the BGL? • Programmers comfortable with C++ • Users with graph sizes from tens of vertices to millions of vertices

  14. BGL-Python • Python is ideal for rapid prototyping: • It’s a scripting language (no compiler) • Dynamically typed means less typing for you • Easy to use: you already know Python… • BGL-Python provides access to the BGL from within Python • Similar interfaces to C++ BGL • Easier to learn than C++ • Great for scripting, GUI applications • help(bgl.dijkstra_shortest_paths)

  15. Example: Biconnected Components
        import boost.graph as bgl  # Pull in the BGL bindings
        g = bgl.Graph.read_graphviz("biconnected_components.dot")
        # Compute biconnected components and articulation points
        bicomponent = g.edge_property_map('int')
        art_points = bgl.biconnected_components(g, bicomponent)
        # Save results with bicomponent numbers as edge labels
        g.edge_properties['label'] = bicomponent
        g.write_graphviz("biconnected_components_out.dot")
        print "Articulation points: ",
        node_id = g.vertex_properties['node_id']
        for v in art_points:
            print node_id[v], ' ',
        print ""

  16. Wrapping the BGL in Python • BGL-Python is not a… • “port” • reimplementation • BGL-Python wraps the C++ BGL • Python calls translate to C++ calls • C++ can call back into Python • Most of the speed of C++ • Most of the flexibility of Python

  17. Performance: Shortest Paths

  18. BGL-Python Summary • BGL-Python is all about tradeoffs: • More gradual learning curve • Faster time-to-solution • Lower performance • Our typical approach: • Prototype in Python to get your ideas down • Port to C++ when performance matters

  19. The Parallel BGL • A version of the C++ BGL for computational clusters • Distributed memory for huge graphs • Parallel processing for improved performance • An active research project • Closely related to the original BGL • Parallelizing BGL programs should be “easy”

  20. Parallel BGL: Distributed Graphs • A simple, directed graph… distributed across 3 processors.
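In a distributed graph, each vertex is owned by one processor, and an edge whose endpoints have different owners needs communication when an algorithm follows it. The sketch below illustrates that idea with a simple block distribution, using only the C++ standard library. The block scheme is an assumption chosen for clarity and is not a description of how the Parallel BGL places vertices; DirectedEdge, block_owner, and count_remote_edges are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::size_t> DirectedEdge;  // (source, target)

// Block distribution: vertices 0..n-1 are cut into contiguous blocks of
// ceil(n / p) vertices, and block k lives on processor k.
std::size_t block_owner(std::size_t v, std::size_t n, std::size_t p) {
  assert(p > 0 && v < n);
  const std::size_t block_size = (n + p - 1) / p;
  return v / block_size;
}

// Counts edges whose source and target are owned by different processors.
std::size_t count_remote_edges(const std::vector<DirectedEdge>& edges,
                               std::size_t n, std::size_t p) {
  std::size_t remote = 0;
  for (std::size_t i = 0; i < edges.size(); ++i) {
    if (block_owner(edges[i].first, n, p) !=
        block_owner(edges[i].second, n, p)) {
      ++remote;
    }
  }
  return remote;
}
```

With 9 vertices on 3 processors, vertices 0 to 2, 3 to 5, and 6 to 8 form the three blocks, so an edge from 2 to 3 is remote while an edge from 0 to 1 stays local.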

  21. Parallel Graph Algorithms • Breadth-first search • Eager Dijkstra’s single-source shortest paths • Crauser et al. single-source shortest paths • Depth-first search • Minimum spanning tree (Boruvka, Dehne & Götz) • Connected components • Strongly connected components • Biconnected components • PageRank • Graph coloring • Fruchterman-Reingold layout • Max-flow (Dinic’s)

  22. Performance: Sparse graphs

  23. Scalability (~547k vertices/node) • Small-World Graph • Up to 70M Vertices, 1B Edges

  24. Performance vs. CGMgraph • Erdos-Renyi graph: 96k vertices, 10M edges • Chart annotations: 17x, 30x

  25. Parallel BGL Summary • The Parallel BGL is built for huge graphs • Millions to hundreds of millions of nodes • Distributed-memory parallel processing on clusters • Future work will permit larger graphs… • Parallel programming has a learning curve • Parallel graph algorithms much harder to write • Distributed graph manipulation can be tricky • Parallel BGL is an active research library

  26. Distributed Graph Layout

  27. Parallel BGL in Python • Preliminary support for the Parallel BGL in Python • Just import boost.graph.distributed • Similar interface to sequential BGL-Python • Several options for usage with MPI: • Straight MPI: mpirun -np 2 python script.py • pyMPI: allows interactive use of the interpreter • Initially used to prototype our distributed Fruchterman-Reingold implementation.

  28. Porting for Performance

  29. Which BGL is Right for You? • Is any BGL right for you? • Depends on how large your networks are: • Up to 1/2 million vertices, any BGL will do • C++ BGL can push to a couple million vertices • For tens of millions or larger, Parallel BGL only • Other considerations: • You can prototype in Python, port to C++ • Algorithm authors might prefer the original BGL • Parallelism is very hard to manage

  30. Conclusion • The Boost Graph Library family is a collection of full-featured graph libraries • All are flexible, customizable, efficient • Easy to port from Python to C++ • Can port from sequential to parallel • Always growing, improving • Is one of the BGLs right for you? • A typical “build or buy” decision

  31. For More Information… • (Original) Boost Graph Library: http://www.boost.org/libs/graph/doc • Parallel Boost Graph Library: http://www.osl.iu.edu/research/pbgl • Python Bindings for (Parallel) BGL: http://www.osl.iu.edu/~dgregor/bgl-python • Contact us! • Douglas Gregor <dgregor@osl.iu.edu> • Andrew Lumsdaine <lums@osl.iu.edu>

  32. Other BGL Variants • QuickGraph (C#): http://www.codeproject.com/cs/miscctrl/quickgraph.asp • Ruby Graph Library: http://rubyforge.org/projects/rgl/ • Rooster Graph (Scheme): http://savannah.nongnu.org/projects/rgraph/ • RBGL (an R interface to the C++ BGL): http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html • Disclaimer: These are all separate projects. We do not maintain them.

  33. Comparative Performance
