Efficient Triangle Motif Counting in Large Scale Complex Networks with GPUs

Hakan Kardeş, CS 791v

Introduction
• Many systems are modeled as complex networks in order to understand their local and global characteristics.
• Studying network models of these systems offers a new way to better understand biological, chemical, technological, and social systems.

Complex Networks Everywhere
[Figures: Aspirin; yeast protein interaction network; an Internet web; co-author network]

Why Graph Mining and Searching?
• In many cases, the systems under investigation are very large, and the corresponding graphs have a large number of nodes and edges, so graph mining techniques are required to derive information from them.
• Several graph mining techniques have been developed to extract useful information from graph representations and to analyze various features of complex networks.

Why is Triangle Counting Important?
[Figure: a triangle on nodes A, B, C]
• Social network analysis fact: “Friends of friends are friends” [WF94]
• Clustering coefficient, transitivity ratio
• Hidden Thematic Structure of the Web (Eckmann et al., PNAS [EM02])
• Motif detection (e.g., [YPSB05])
• Web spam detection (Becchetti et al., KDD ’08 [BBCG08])
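The clustering coefficient and transitivity ratio named on this slide both reduce to triangle counts. As a minimal sketch, assuming the standard textbook definitions (local clustering coefficient C(v) = 2T(v) / (d(v)(d(v) - 1)), transitivity = 3 x triangles / connected triples) and an adjacency-set dictionary as the graph representation, they could be computed as follows. The function names are illustrative and are not taken from the slides.

```python
from itertools import combinations


def triangles_at(adj, v):
    """Count the triangles that contain node v (pairs of neighbors that are adjacent)."""
    return sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])


def local_clustering(adj, v):
    """Local clustering coefficient: 2*T(v) / (d(v) * (d(v) - 1)); 0.0 if degree < 2."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    return 2.0 * triangles_at(adj, v) / (d * (d - 1))


def transitivity(adj):
    """Transitivity ratio: 3 * (number of triangles) / (number of connected triples)."""
    # Summing T(v) over all nodes counts every triangle three times.
    closed = sum(triangles_at(adj, v) for v in adj)
    triples = sum(len(ns) * (len(ns) - 1) // 2 for ns in adj.values())
    return closed / triples if triples else 0.0


# Triangle A-B-C plus one extra node D attached to A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(local_clustering(adj, "B"))  # 1.0
print(transitivity(adj))           # 0.6
```

In the example, node B has two neighbors and they are connected, so its coefficient is 1.0; the graph has one triangle and five connected triples, giving 3/5.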

Related Work
• Hakan Kardes and M. H. Gunes. Structural Graph Indexing for Mining Complex Networks. IEEE ICDCS 2010 Workshop on Simplifying Complex Networks for Practitioners, Genoa, Italy, June 21, 2010.
  • Our paper, in which we count all star, triangle, complete bipartite, and clique structures.
• Matthieu Latapy. 2008. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1-3 (November 2008), 458-473.
  • Survey paper, focused on space complexity.
• Charalampos Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification. International Conference on Advances in Social Network Analysis and Mining, pp. 66-71, 2009.
  • Relies on the spectral properties of power-law networks; focused on power-law networks.

Related Work
• Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2010. Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data 4, 3, Article 13 (October 2010), 28 pages.
  • They count the number of triangles for a given node.
• Charalampos E. Tsourakakis, U Kang, Gary L. Miller, and Christos Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In Knowledge Discovery and Data Mining (KDD '09).
• Belkacem Serrour, Alex Arenas, and Sergio Gomez. 2010. Detecting communities of triangles in complex networks using spectral optimization.
• Bill Andreopoulos, Christof Winter, Dirk Labudde, and Michael Schroeder. Triangle network motifs predict complexes by complementing high-error interactomes with structural information. BMC Bioinformatics, 2009.

Methodology

Star
• We first index the star structure, in which a node has multiple neighbors, as shown in the figures below.
• All star structures within a graph G = (V, E) are represented as s(vi, nsi), where vi ∈ V and nsi is the set of all neighbors of vi.
• We index the maximal star structure for each node.
[Figures: star structures, a center node vi with neighbors ns1, ns2, ..., nsn]

Star
Algorithm:
• First, build a star structure s(v, ø) for each node v ∈ V, without any neighbors.
• Then, for each edge e(a, b) ∈ E, add b to the neighbor set of a and a to the neighbor set of b.
• Finally, remove the star structures s(v, ns) that have fewer than two neighbors.
Example:
• Nodes: a, b, c, d, e, f
• Edges: (a,b), (a,d), (a,f), (b,e), (c,f), (d,f)
[Figure: the resulting star structures]
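The three steps of the star algorithm can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph given as a node list and an edge list; the function name `build_stars` and the dictionary-of-sets representation are choices made here, not something the slides specify. It is run on the example nodes and edges from the slide.

```python
def build_stars(nodes, edges):
    """Index star structures s(v, ns): each node mapped to its neighbor set."""
    # Step 1: start with an empty star s(v, {}) for every node.
    stars = {v: set() for v in nodes}
    # Step 2: for each edge (a, b), add each endpoint to the other's neighbor set.
    for a, b in edges:
        stars[a].add(b)
        stars[b].add(a)
    # Step 3: drop the stars that have fewer than two neighbors.
    return {v: ns for v, ns in stars.items() if len(ns) >= 2}


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
stars = build_stars(nodes, edges)
for v in sorted(stars):
    print(v, sorted(stars[v]))
# a ['b', 'd', 'f']
# b ['a', 'e']
# d ['a', 'f']
# f ['a', 'c', 'd']
```

Nodes c and e each have a single neighbor, so their stars are removed in the last step.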

Triangle
Algorithm:
• Find the second-hop neighbors of ‘a’ by iterating over the ns set.
• Then, take the intersection of the second-hop neighbors of ‘a’ and the ns set.
• Grow the triangle set for each isi ∈ is, where is denotes this intersection set.
[Figure: node a, its neighbor set ns1 ... nsn, and the second-hop neighbors ls1 ... lsn]
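One possible reading of these steps, as a sequential sketch: for every neighbor n of a, the nodes reachable through n (second-hop neighbors) are intersected with ns, and each node in that intersection closes a triangle with a and n. Taking the intersection per neighbor rather than over the union of all second-hop neighbors is an assumption made here so that the triangle's third node is known; the names `build_adjacency` and `find_triangles` are illustrative. This is a CPU illustration only and says nothing about how the CUDA version is organized.

```python
def build_adjacency(edges):
    """Build an undirected adjacency-set dictionary from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def find_triangles(adj):
    """Return the set of triangles, each as a frozenset of three nodes."""
    triangles = set()
    for a, ns in adj.items():
        for n in ns:
            # Second-hop neighbors of a reached through n, restricted to ns.
            intersection = adj[n] & ns
            for m in intersection:
                # frozenset removes the duplicates found from other corners.
                triangles.add(frozenset((a, n, m)))
    return triangles


edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
for t in find_triangles(build_adjacency(edges)):
    print(sorted(t))  # ['a', 'd', 'f']
```

Each triangle is discovered from all three of its corners, so a set of frozensets is used to keep one copy; ordering the nodes and only reporting a < n < m would be an alternative.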

CUDA
• For the parallel algorithm, I will use CUDA.

CUDA

Experiments

Possible Datasets for Experiments
• Router-level Internet topology (around 2.3M nodes and 4M edges)
  • http://cheleby.cse.unr.edu/data.html
• Routing data on the Internet network (around 124K nodes and 207K edges)
  • http://vlado.fmf.uni-lj.si/pub/networks/data/web/web.zip
• A mobile phone graph (around 2.7M nodes and 6M edges)
  • Will be requested from the authors of “Structure of neighborhoods in a large social network”
• Biological data
  • http://www.biomedcentral.com/1471-2105/10/196
• Wikipedia graph (around 1.6M nodes and 18.5M edges)
  • I haven’t decided how to do it yet.
• I will generate sample graphs with different numbers of triangles
  • I haven’t decided how to do it yet.

Results
• Triangle Counting, CPU vs. GPU
[Plot: execution time vs. number of nodes]

Results
• Triangle Counting, CPU vs. GPU
[Plot: execution time vs. number of edges (while the number of nodes is constant)]

Results
• Triangle Counting with different triangle sizes
[Plot: execution time vs. number of triangles]

Results
• Triangle Counting with different block sizes
[Plot: execution time vs. block size]

Future Work

Structural Graph Indexing (SGI)
• We propose an alternative structural indexing approach to search and process queries efficiently, even in very large graphs.
• As indexing features, we use commonly observed graph structures: star, complete bipartite, triangle, and clique.
• These structures are ubiquitous in biological, chemical, technological, social, and many other complex networks.

Structural Models
[Figures: 3-Star (K1,3); n-Star (K1,n); 2*3-Complete Bipartite (K2,3); 3*4-Complete Bipartite (K3,4); m*n-Complete Bipartite (Km,n); Triangle (K3); 4-Clique (K4); n-Clique (Kn)]

Structural Graph Indexing
• An important feature of these structures is that each one is composed of the previous one: a clique contains complete bipartite structures, and a complete bipartite structure contains star structures.
• So, we can index these structures within the original graph consecutively. We first identify the star structures, and then derive the complete bipartite and clique structures from the preceding ones.

Structural Graph Indexing
• An important difference between our approach and previous studies is that we do not limit the size of the subgraphs considered in indexing. We index all maximal graphs that match the structure formulation. For instance, a maximal clique is a clique that cannot be extended by adding one more vertex from the graph. However, the substructure size in indexing may be limited when needed, since clique search is computationally hard (the clique decision problem is NP-complete).

Complete Bipartite
• The second structure we index is the complete bipartite, shown in the figures below.
• A complete bipartite graph G = (V1 ∪ V2, E) is a bipartite graph such that V1 and V2 are two disjoint sets and, for any two vertices vi ∈ V1 and vj ∈ V2, there is an edge between them (i.e., ∃ e(vi, vj) ∈ E).
[Figures: complete bipartite structures with left nodes v1, v2, ..., vm and right nodes d1, d2, ..., dn]

Complete Bipartite
• The complete bipartite structure is ubiquitous in many complex networks.
  • protein-protein interaction networks (Thomas et al.)
  • the Internet (Fay et al.)
• We index all complete bipartite structures in the graph G using the indexed star structures.
• For each star structure s(a, ns), where a ∈ V and ns is the neighbor set of the node a, we identify the maximal complete bipartite involving the node ‘a’.

Complete Bipartite
Algorithm:
• Find the second-hop neighbors of ‘a’ by iterating over the ns set, and unify them into the Lcan set, which holds the candidates for the left side of the complete bipartite; the ns set is the candidate set for the right-hand side.
• Then, find a K2,n and grow it to a Km,n. To find the K2,n, iterate over each candidate node in Lcan and determine its neighbor intersection with a. If the intersection set is larger than two, then these nodes belong to the right-hand side.
• Grow the K2,n by finding all nodes in the left-hand side (i.e., Lcan) that have all the right-hand-side nodes (i.e., Rnew) as neighbors.
[Figure: node a, the right candidate set ns1 ... nsn, and the left candidate set ls1 ... lsn]
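A rough sketch of these steps for a single star s(a, ns) is given below. It is one interpretation of the slide, not the authors' implementation: the function name `complete_bipartite_for`, the adjacency-set dictionary, and the return format (a set of (left, right) pairs) are assumptions, and the threshold follows the slide's wording "larger than two" literally. The sketch does not attempt to guarantee that every reported structure is maximal.

```python
def complete_bipartite_for(adj, a):
    """Find complete bipartite structures (left, right) that involve node a."""
    ns = adj[a]
    # Lcan: second-hop neighbors of a, the candidates for the left side.
    lcan = set().union(*(adj[n] for n in ns)) - {a}
    found = set()
    for c in lcan:
        # K2,n step: neighbors shared by a and the candidate c form Rnew.
        rnew = adj[c] & ns
        if len(rnew) > 2:  # "larger than two", taken literally from the slide
            # Grow to Km,n: keep every left candidate adjacent to all of Rnew.
            # Node a itself qualifies because Rnew is a subset of ns.
            left = {x for x in lcan | {a} if rnew <= adj[x]}
            found.add((frozenset(left), frozenset(rnew)))
    return found


# K2,3 example: left nodes v1, v2; right nodes d1, d2, d3.
adj = {
    "v1": {"d1", "d2", "d3"},
    "v2": {"d1", "d2", "d3"},
    "d1": {"v1", "v2"},
    "d2": {"v1", "v2"},
    "d3": {"v1", "v2"},
}
for left, right in complete_bipartite_for(adj, "v1"):
    print(sorted(left), sorted(right))  # ['v1', 'v2'] ['d1', 'd2', 'd3']
```

Because the graph has no self-loops, a node can never be adjacent to all of Rnew and also belong to Rnew, so the left and right sides stay disjoint.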

Clique
• Finally, we index the clique structures shown in the figures below.
• A clique in a graph G = (V, E) is a subset of the vertex set (i.e., C ⊆ V) such that there are edges between all node pairs (i.e., ∀ ci, cj ∈ C with i ≠ j, ∃ e(ci, cj) ∈ E).
• We index all maximal clique structures in the graph using the complete bipartite structures.
[Figures: clique structures on nodes v1, v2, v3, v4, ..., vn]

Clique
• This structure has been observed and utilized in many fields.
• computational biology
• protein structure prediction (Samudrala et al.)
• electronic circuits (Cong et al.)
• chemicals in a chemical database (Rhodes et al.)

Clique
Algorithm:
• First, get the set of nodes from each complete bipartite k(m,n) and look for cliques that are formed by those nodes.
• The clique search algorithm works recursively, taking each node from the k(m,n) in turn as the pivot node in the L1 set and considering the other nodes as candidate nodes in the L2 set.
• The function moves each node from the L2 set to the L1 set if it is connected to all nodes in L1, and then recursively tries to grow the structure with the remaining nodes as candidates.
• When there are no more candidates to consider in the L2 set, a clique has been identified.
[Figure: example with Set1 and Set2 over nodes v1, v2, v3 and d1, d2, d3, d4]
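The recursive L1/L2 search can be sketched as follows, assuming the input is an adjacency-set dictionary plus the node set taken from one complete bipartite structure. The name `cliques_in` and the memoization set `seen` (used only to avoid re-exploring the same L1) are additions for this illustration and are not part of the slide. The search is exponential in the worst case, which is consistent with the earlier remark that the structure size may need to be limited.

```python
def cliques_in(adj, nodes):
    """Find the cliques over `nodes` that cannot be extended within `nodes`."""
    nodes = set(nodes)
    found = set()
    seen = set()  # L1 sets already explored (illustrative optimization)

    def grow(l1, l2):
        key = frozenset(l1)
        if key in seen:
            return
        seen.add(key)
        # Candidates in L2 that are connected to every node already in L1.
        candidates = {v for v in l2 if l1 <= adj[v]}
        if not candidates:
            # No candidate left: L1 is a clique that cannot grow further.
            found.add(key)
            return
        for v in candidates:
            # Move v from L2 to L1 and continue with the remaining candidates.
            grow(l1 | {v}, candidates - {v})

    for pivot in nodes:
        grow({pivot}, nodes - {pivot})
    return found


# A 4-clique on v1..v4 plus one extra node v5 attached only to v1.
adj = {
    "v1": {"v2", "v3", "v4", "v5"},
    "v2": {"v1", "v3", "v4"},
    "v3": {"v1", "v2", "v4"},
    "v4": {"v1", "v2", "v3"},
    "v5": {"v1"},
}
for clique in sorted(cliques_in(adj, adj.keys()), key=len):
    print(sorted(clique))
# ['v1', 'v5']
# ['v1', 'v2', 'v3', 'v4']
```

Passing all remaining candidates to the recursive call, rather than only those after v in some fixed order, is what keeps non-maximal cliques such as {v1, v2} out of the result; the price is repeated work, which the `seen` set limits.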

Where to Submit
• Advances in Social Network Analysis and Mining (ASONAM 2011)
  • Full paper submission deadline is March 1, 2011.
  • Full paper manuscripts are limited to 8 pages (IEEE two-column template).
  • Kaohsiung, Taiwan, 7/25-7/27
• Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2011)
• Workshop on Mining and Learning with Graphs (MLG 2011)
• Workshop on Social Network Mining and Analysis (SNAKDD 2011)
  • Full paper submission deadline is May 4-10, 2011.
  • Full paper manuscripts are limited to 10 pages (ACM two-column template).
  • San Diego, CA, 8/21-8/24
• Simplifying Network Science for Practitioners (SIMPLEX 2011)
  • Full paper submission deadline is Jan 31, 2011 to Feb 19, 2011.
  • Full paper manuscripts are limited to 6-10 pages (IEEE two-column template).
  • Minneapolis, Minnesota, USA, 6/20-6/24

Questions

Thank you
