Large-Scale Network Analysis with the Boost Graph Libraries

Douglas Gregor
Open Systems Lab, Indiana University
dgregor@osl.iu.edu

What are the BGLs?
• A collection of libraries for computation on graphs/networks
  • Graph data structures
  • Graph algorithms
  • Graph input/output
• Common design
  • Flexibility/customizability throughout
  • Obsessed with performance
  • Common interfaces throughout the collection
• All open source, freely available online

The BGL Family
• The Original (sequential) BGL
• BGL-Python
• The Parallel BGL
• Parallel BGL-Python

The Original BGL
• The largest and most mature BGL
  • ~7 years of research and development
  • Many users, contributors outside of the OSL
  • Steadily evolving
• Written in C++
  • Generic
  • Highly customizable
  • Efficient (both storage and execution)

BGL: Graph Data Structures
• Graphs:
  • adjacency_list: highly configurable with user-specified containers for vertices and edges
  • adjacency_matrix
  • compressed_sparse_row
• Adaptors:
  • subgraphs, filtered graphs, reverse graphs
  • LEDA and Stanford GraphBase
• Or, use your own…
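To make the compressed_sparse_row entry concrete, here is a minimal sketch of the storage idea in plain C++. It is an illustration only: `CsrGraph` and `build_csr` are hypothetical names, and this is not the BGL's own compressed sparse row graph class or its interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration type, NOT the BGL's compressed sparse row graph.
// Out-edges of vertex v are targets[row_start[v]] .. targets[row_start[v + 1] - 1].
struct CsrGraph {
  std::vector<std::size_t> row_start;  // num_vertices + 1 offsets into targets
  std::vector<std::size_t> targets;    // one entry per edge

  std::size_t num_vertices() const { return row_start.size() - 1; }
  std::size_t out_degree(std::size_t v) const {
    return row_start[v + 1] - row_start[v];
  }
};

typedef std::vector<std::pair<std::size_t, std::size_t> > EdgeList;

// Builds the two arrays from a directed edge list over vertices 0 .. n-1.
CsrGraph build_csr(std::size_t n, const EdgeList& edges) {
  CsrGraph g;
  g.row_start.assign(n + 1, 0);

  // Count the out-degree of each source vertex u in slot u + 1.
  for (std::size_t i = 0; i < edges.size(); ++i) {
    assert(edges[i].first < n && edges[i].second < n);
    ++g.row_start[edges[i].first + 1];
  }

  // A prefix sum turns the counts into starting offsets.
  for (std::size_t v = 0; v < n; ++v) g.row_start[v + 1] += g.row_start[v];

  // Place each target in the next free slot of its source's range.
  g.targets.resize(edges.size());
  std::vector<std::size_t> next(g.row_start.begin(), g.row_start.end() - 1);
  for (std::size_t i = 0; i < edges.size(); ++i) {
    g.targets[next[edges[i].first]++] = edges[i].second;
  }
  return g;
}
```

The whole graph lives in two flat arrays, which is where the compact storage comes from; the tradeoff in this sketch is that adding an edge afterwards means rebuilding the arrays.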

Original BGL: Algorithms
• Searches (breadth-first, depth-first, A*)
• Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
• All-pairs shortest paths (Johnson, Floyd-Warshall)
• Minimum spanning tree (Kruskal, Prim)
• Components (connected, strongly connected, biconnected)
• Maximum cardinality matching
• Max-flow (Edmonds-Karp, push-relabel)
• Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
• Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
• Betweenness centrality
• PageRank
• Isomorphism
• Vertex coloring
• Transitive closure
• Dominator tree

Task: Biconnected Components
[Figure: input graph and output graph; articulation points: B, G, A]

Define a Graph Type
• Determine vertex/edge properties:
    struct Vertex { string name; };
    struct Edge { int bicomponent; };
• Determine the graph type:
    typedef adjacency_list</*OutEdgeListS=*/ vecS,
                           /*VertexListS=*/ vecS,
                           /*DirectedS=*/ undirectedS,
                           /*VertexProperty=*/ Vertex,
                           /*EdgeProperty=*/ Edge> Graph;

Read in a GraphViz DOT File
• Build an empty graph:
    Graph g;
• Map vertex properties:
    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));
• Read in the GraphViz graph:
    ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);

Run Biconnected Components
• Keep track of the articulation points:
    vector<Graph::vertex_descriptor> art_points;
• Compute biconnected components:
    biconnected_components(g, get(&Edge::bicomponent, g),
                           back_inserter(art_points));

Output results
• Attach bicomponent number to the "label" property of edges:
    dyn.property("label", get(&Edge::bicomponent, g));
• Write results to another GraphViz file:
    ofstream out("bc_out.dot");
    write_graphviz(out, g, dyn);  // named write_graphviz_dp in newer Boost releases
• Show articulation points:
    cout << "Articulation points: ";
    for (size_t i = 0; i < art_points.size(); ++i) {
      cout << g[art_points[i]].name << ' ';
    }


Original BGL Summary
• The original BGL is large, stable, efficient
  • Lots of algorithms, graph types
  • Peer-reviewed code with many users, nightly regression testing, etc.
  • Performance comparable to FORTRAN.
• Who should use the BGL?
  • Programmers comfortable with C++
  • Users with graph sizes from tens of vertices to millions of vertices

BGL-Python
• Python is ideal for rapid prototyping:
  • It’s a scripting language (no compiler)
  • Dynamically typed means less typing for you
  • Easy to use: you already know Python…
• BGL-Python provides access to the BGL from within Python
  • Similar interfaces to C++ BGL
  • Easier to learn than C++
  • Great for scripting, GUI applications
  • help(bgl.dijkstra_shortest_paths)

Example: Biconnected Components
    import boost.graph as bgl  # Pull in the BGL bindings

    g = bgl.Graph.read_graphviz("biconnected_components.dot")

    # Compute biconnected components and articulation points
    bicomponent = g.edge_property_map('int')
    art_points = bgl.biconnected_components(g, bicomponent)

    # Save results with bicomponent numbers as edge labels
    g.edge_properties['label'] = bicomponent
    g.write_graphviz("biconnected_components_out.dot")

    print "Articulation points: ",
    node_id = g.vertex_properties['node_id']
    for v in art_points:
        print node_id[v], ' ',
    print ""

Wrapping the BGL in Python
• BGL-Python is not a…
  • “port”
  • reimplementation
• BGL-Python wraps the C++ BGL
  • Python calls translate to C++ calls
  • C++ can call back into Python
• Most of the speed of C++
• Most of the flexibility of Python
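The "call back" direction is easiest to picture as a compiled algorithm that accepts a user-supplied hook. The sketch below shows that shape in plain C++: a breadth-first search that invokes a callback for each vertex it discovers. It illustrates the idea only; the names `AdjacencyLists` and `bfs_with_callback` are hypothetical, and this is not how BGL-Python itself is implemented. In a wrapped library, the hook could be a function defined on the scripting side.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// adj[u] lists the neighbors of vertex u (hypothetical minimal graph type).
typedef std::vector<std::vector<std::size_t> > AdjacencyLists;

// Breadth-first search from `source`; calls on_discover(v) once for every
// reachable vertex, in the order the search first reaches it.
void bfs_with_callback(const AdjacencyLists& adj, std::size_t source,
                       const std::function<void(std::size_t)>& on_discover) {
  assert(source < adj.size());
  std::vector<bool> seen(adj.size(), false);
  std::queue<std::size_t> pending;

  seen[source] = true;
  on_discover(source);
  pending.push(source);

  while (!pending.empty()) {
    std::size_t u = pending.front();
    pending.pop();
    for (std::size_t i = 0; i < adj[u].size(); ++i) {
      std::size_t v = adj[u][i];
      if (!seen[v]) {
        seen[v] = true;
        on_discover(v);  // control passes to user code here
        pending.push(v);
      }
    }
  }
}
```

The traversal loop stays in compiled code while the per-vertex behavior is chosen by the caller, which is the split the slide describes: most of the running time is spent in C++, and the customization happens in the hook.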

Performance: Shortest Paths

BGL-Python Summary
• BGL-Python is all about tradeoffs:
  • More gradual learning curve
  • Faster time-to-solution
  • Lower performance
• Our typical approach:
  • Prototype in Python to get your ideas down
  • Port to C++ when performance matters

The Parallel BGL
• A version of the C++ BGL for computational clusters
  • Distributed memory for huge graphs
  • Parallel processing for improved performance
• An active research project
• Closely related to the original BGL
  • Parallelizing BGL programs should be “easy”

Parallel BGL: Distributed Graphs
• A simple, directed graph…
• …distributed across 3 processors.
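One way to picture a graph "distributed across 3 processors" is to assign each vertex an owning process and a local index on that process. The sketch below assumes a simple block distribution; that scheme, and the names `BlockDistribution` and `count_cut_edges`, are assumptions made for illustration and not necessarily what the Parallel BGL does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical block distribution: vertices 0 .. n-1 are split into
// contiguous blocks, one block per process.
struct BlockDistribution {
  std::size_t num_vertices;
  std::size_t num_processes;

  std::size_t block_size() const {
    assert(num_vertices > 0 && num_processes > 0);
    return (num_vertices + num_processes - 1) / num_processes;  // round up
  }
  // Process that stores vertex v and its out-edges.
  std::size_t owner(std::size_t v) const { return v / block_size(); }
  // Position of v within its owner's block.
  std::size_t local_index(std::size_t v) const { return v % block_size(); }
};

typedef std::vector<std::pair<std::size_t, std::size_t> > EdgeList;

// Counts edges whose endpoints live on different processes; following
// such an edge would need communication between processes.
std::size_t count_cut_edges(const BlockDistribution& dist,
                            const EdgeList& edges) {
  std::size_t cut = 0;
  for (std::size_t i = 0; i < edges.size(); ++i) {
    if (dist.owner(edges[i].first) != dist.owner(edges[i].second)) ++cut;
  }
  return cut;
}
```

With 9 vertices on 3 processes, each process owns 3 vertices: vertex 4 belongs to process 1 at local index 1. An edge from vertex 2 to vertex 3 crosses from process 0 to process 1, which is why distributed graph algorithms have to pay attention to where each edge's endpoints live.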

Parallel Graph Algorithms
• Breadth-first search
• Eager Dijkstra’s single-source shortest paths
• Crauser et al. single-source shortest paths
• Depth-first search
• Minimum spanning tree (Boruvka, Dehne & Götz)
• Connected components
• Strongly connected components
• Biconnected components
• PageRank
• Graph coloring
• Fruchterman-Reingold layout
• Max-flow (Dinic’s)

Performance: Sparse graphs

Scalability (~547k vertices/node)
[Figure: small-world graph; up to 70M vertices, 1B edges]

Performance vs. CGMgraph
[Figure: Erdős-Rényi graph, 96k vertices, 10M edges; chart annotations: 17x, 30x]

Parallel BGL Summary
• The Parallel BGL is built for huge graphs
  • Millions to hundreds of millions of nodes
  • Distributed-memory parallel processing on clusters
  • Future work will permit larger graphs…
• Parallel programming has a learning curve
  • Parallel graph algorithms much harder to write
  • Distributed graph manipulation can be tricky
• Parallel BGL is an active research library

Distributed Graph Layout

Parallel BGL in Python
• Preliminary support for the Parallel BGL in Python
  • Just import boost.graph.distributed
  • Similar interface to sequential BGL-Python
• Several options for usage with MPI:
  • Straight MPI: mpirun -np 2 python script.py
  • pyMPI: allows interactive use of the interpreter
• Initially used to prototype our distributed Fruchterman-Reingold implementation.

Porting for Performance

Which BGL is Right for You?
• Is any BGL right for you?
• Depends on how large your networks are:
  • Up to 1/2 million vertices, any BGL will do
  • C++ BGL can push to a couple million vertices
  • For tens of millions or larger, Parallel BGL only
• Other considerations:
  • You can prototype in Python, port to C++
  • Algorithm authors might prefer the original BGL
  • Parallelism is very hard to manage

Conclusion
• The Boost Graph Library family is a collection of full-featured graph libraries
  • All are flexible, customizable, efficient
  • Easy to port from Python to C++
  • Can port from sequential to parallel
  • Always growing, improving
• Is one of the BGLs right for you?
  • A typical “build or buy” decision

For More Information…
• (Original) Boost Graph Library: http://www.boost.org/libs/graph/doc
• Parallel Boost Graph Library: http://www.osl.iu.edu/research/pbgl
• Python Bindings for (Parallel) BGL: http://www.osl.iu.edu/~dgregor/bgl-python
• Contact us!
  • Douglas Gregor <dgregor@osl.iu.edu>
  • Andrew Lumsdaine <lums@osl.iu.edu>

Other BGL Variants
• QuickGraph (C#): http://www.codeproject.com/cs/miscctrl/quickgraph.asp
• Ruby Graph Library: http://rubyforge.org/projects/rgl/
• Rooster Graph (Scheme): http://savannah.nongnu.org/projects/rgraph/
• RBGL (an R interface to the C++ BGL): http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
• Disclaimer: These are all separate projects. We do not maintain them.

Comparative Performance
