Big Graph Search: Challenges and Techniques. Shuai Ma. Graphs are everywhere, and quite a few are huge graphs! Application scenarios: software plagiarism detection [1]. Traditional plagiarism detection tools may not be applicable to serious software plagiarism problems.
Big Graph Search:
Challenges and Techniques
Shuai Ma
Graphs are everywhere, and quite a few are huge graphs!
Software plagiarism detection [1]
Recommender systems [3]
Transport routing [4]
Biological data analysis [5]
What is Graph Search?
A unified definition [6] (in the name of graph matching):
Remarks:
Different semantics of “match” imply different “types” of graph search, including, but not limited to, the following:
Graph search is a very “general” concept!
Graph Search, Why Bother?
Social Networks
Facebook launched “graph search” on 16th January, 2013
Assault on Google, Yelp, and LinkedIn with new graph search;
Yelp was down more than 7%
World Wide Web
File systems
Databases
Graph search is a new paradigm for social computing!
Query: Find the names of all of Alberto Pepe's friends.
Step 1: The person.name index -> the identifier of Alberto Pepe. [O(log₂ n)]
Step 2: The friend.person index -> k friend identifiers. [O(log₂ x) : x << m]
Step 3: The k friend identifiers -> k friend names. [O(k log₂ n)]
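The three index probes above can be sketched in Python. This is a minimal illustration, not the slides' own code: the tables `person` and `friend`, their contents, and the function names are hypothetical, and sorted lists searched with `bisect` stand in for tree-structured indexes.

```python
from bisect import bisect_left

# Hypothetical tables: person(id, name) and friend(person_id, friend_id).
person = [(1, "Alberto Pepe"), (2, "Ann"), (3, "Bob"), (4, "Cat")]
friend = [(1, 2), (1, 3), (2, 4)]

# Sorted lists play the role of indexes; each probe is a binary search.
name_index = sorted((name, pid) for pid, name in person)  # person.name index
id_index = sorted(person)                                 # person.id index
friend_index = sorted(friend)                             # friend.person index


def lookup_id(name):
    """Step 1: name -> identifier, one binary search over n people."""
    i = bisect_left(name_index, (name,))
    if i == len(name_index) or name_index[i][0] != name:
        raise KeyError(name)
    return name_index[i][1]


def lookup_friend_ids(pid):
    """Step 2: identifier -> k friend identifiers via the friend index."""
    lo = bisect_left(friend_index, (pid,))
    hi = bisect_left(friend_index, (pid + 1,))
    return [fid for _, fid in friend_index[lo:hi]]


def lookup_name(pid):
    """Step 3 (per friend): identifier -> name, one binary search."""
    i = bisect_left(id_index, (pid,))
    return id_index[i][1]


def friend_names(name):
    return [lookup_name(fid) for fid in lookup_friend_ids(lookup_id(name))]


print(friend_names("Alberto Pepe"))  # ['Ann', 'Bob']
```

Every friend name costs a fresh index probe in Step 3, which is where the O(k log₂ n) term comes from.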
Query: Find the names of all of Alberto Pepe's friends.
Step 1: The vertex.name index -> the vertex with the name Alberto Pepe. [O(log₂ n)]
Step 2: The vertex returned -> the k friend names. [O(k + x)]
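The two steps above can be sketched with vertices that hold direct references to their neighbours. This is a hedged illustration: the `Vertex` class, `vertex_by_name`, and the sample data are hypothetical, and a Python dict (a hash index) stands in for the vertex.name index, so Step 1 here is expected constant time rather than a tree lookup.

```python
class Vertex:
    """A vertex that stores its adjacent vertices directly."""

    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references, no index needed to follow them


# Hypothetical vertex.name index (a dict, i.e. a hash index, by assumption).
vertex_by_name = {n: Vertex(n) for n in ["Alberto Pepe", "Ann", "Bob", "Cat"]}


def add_friend(src, dst):
    """Add a directed 'friend' edge from src to dst."""
    vertex_by_name[src].friends.append(vertex_by_name[dst])


add_friend("Alberto Pepe", "Ann")
add_friend("Alberto Pepe", "Bob")
add_friend("Ann", "Cat")


def friend_names(name):
    vertex = vertex_by_name[name]            # Step 1: one index lookup
    return [f.name for f in vertex.friends]  # Step 2: follow k local edges


print(friend_names("Alberto Pepe"))  # ['Ann', 'Bob']
```

After the single lookup, the traversal touches only the edges incident to the start vertex, independent of the total number of people.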
it’s interesting, and over the last 10 years, people have been trained on how to use search engines more effectively.
Keywords & Search In 2013: Interview With A. Goodman & M. Wagner
International Conference on Application of Natural Language to Information Systems (NLDB) started from 1995
Social computing
&
Web 2.0
DB people started working on graphs at around the same time!
Challenges
Facebook:
Graph search with high efficiency, striking a balance between its performance and accuracy.
Consider the dynamic changes and timing characteristics of data.
Solve the data quality problems.
Query Techniques for Big Graph Search
R= Q(G)
Key idea: for a class Q of queries with high computational complexity, find another class Q’ of queries that has lower computational complexity, with a bounded loss of quality in the query answers.
approximation
Q’(D)
Q(D)
Challenge: balancing the expressive power and computational complexity!
approximation
Subgraph Isomorphism
(NP-Complete)
Strong Simulation
(O(n³))
Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, and Tianyu Wo. Strong Simulation: Capturing Topology in Graph Pattern Matching. TODS 2014.
Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, and Tianyu Wo. Capturing Topology in Graph Pattern Matching. VLDB 2012.
Keeps the exact structural topology between Q and Gs
The decision problem is NP-complete
May return exponentially many matched subgraphs
In certain scenarios, too restrictive to find matches
These hinder the usability in emerging applications, e.g., social networks
Subgraph isomorphism (NP-complete) vs. graph simulation (O(n²))!
Set up a team to develop a new software product
Graph simulation returns F3, F4 and F5;
Subgraph isomorphism returns empty!
Subgraph isomorphism is too strict for emerging applications
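A small sketch can make the contrast concrete. The code below is not the slides' team-building example (its figure is not in this transcript); it uses a hypothetical query with a 2-cycle and a hypothetical data graph with a 4-cycle. Graph simulation is computed with a naive fixpoint refinement, which is simpler but slower than the quadratic algorithm of [21], and subgraph isomorphism is checked by brute force over injective mappings.

```python
from itertools import permutations


def graph_simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Maximum simulation: query node -> set of data nodes ({} if no match).

    q_nodes and g_nodes map node -> label; edges are sets of (src, dst).
    """
    succ = {v: set() for v in g_nodes}
    for a, b in g_edges:
        succ[a].add(b)
    # Start from label-compatible candidates, then refine until stable.
    sim = {u: {v for v, lab in g_nodes.items() if lab == q_nodes[u]}
           for u in q_nodes}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # v can match u only if some child of v matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else {}


def subgraph_isomorphisms(q_nodes, q_edges, g_nodes, g_edges):
    """Brute force: all injective, label- and edge-preserving mappings."""
    qs = list(q_nodes)
    found = []
    for image in permutations(g_nodes, len(qs)):
        m = dict(zip(qs, image))
        if (all(q_nodes[u] == g_nodes[m[u]] for u in qs)
                and all((m[u], m[v]) in g_edges for u, v in q_edges)):
            found.append(m)
    return found


# Hypothetical query: A and B point at each other (a 2-cycle).
q_nodes = {"A": "A", "B": "B"}
q_edges = {("A", "B"), ("B", "A")}
# Hypothetical data: a1 -> b1 -> a2 -> b2 -> a1 (a 4-cycle).
g_nodes = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
g_edges = {("a1", "b1"), ("b1", "a2"), ("a2", "b2"), ("b2", "a1")}

sim_result = graph_simulation(q_nodes, q_edges, g_nodes, g_edges)
iso_result = subgraph_isomorphisms(q_nodes, q_edges, g_nodes, g_edges)
```

Here graph simulation matches every data node, while subgraph isomorphism finds nothing because the data graph contains no 2-cycle. The same example also shows the price of the relaxation: a short cycle in the query is matched by a longer cycle in the data.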
“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Osama Bin Laden, 2001)
Balance between complexity and the ability to capture topology!
Disconnected
Tree
Long cycle
Strong simulation: bring duality and locality into graph simulation
From strictest to most relaxed: subgraph isomorphism, strong simulation, dual simulation, graph simulation
Topology preservation and bounded matches
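The duality half of strong simulation can be sketched as follows. This is only dual simulation: it adds a parent check to graph simulation and omits the locality condition (matching inside bounded-radius balls) that strong simulation also requires. The function name, the naive fixpoint loop, and the sample graphs are assumptions made for illustration.

```python
def dual_simulation(q_nodes, q_edges, g_nodes, g_edges):
    """Maximum dual simulation: query node -> set of data nodes ({} if none).

    A data node v matches a query node u only if both the child and the
    parent relationships of u are preserved around v.
    """
    succ = {v: set() for v in g_nodes}
    pred = {v: set() for v in g_nodes}
    for a, b in g_edges:
        succ[a].add(b)
        pred[b].add(a)
    sim = {u: {v for v, lab in g_nodes.items() if lab == q_nodes[u]}
           for u in q_nodes}
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            # Child condition, as in plain graph simulation.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
            # Parent condition, the part that duality adds.
            keep = {v for v in sim[u2] if pred[v] & sim[u]}
            if keep != sim[u2]:
                sim[u2] = keep
                changed = True
    return sim if all(sim.values()) else {}


# Hypothetical query: A -> B.
q_nodes = {"A": "A", "B": "B"}
q_edges = {("A", "B")}
# Hypothetical data: a1 -> b1, plus an isolated B-labelled node b2.
g_nodes = {"a1": "A", "b1": "B", "b2": "B"}
g_edges = {("a1", "b1")}

dual_result = dual_simulation(q_nodes, q_edges, g_nodes, g_edges)
```

Plain graph simulation would accept `b2` as a match for B, since B has no outgoing query edges to check. Dual simulation rejects it because `b2` has no parent matching A.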
Data Techniques for Big Graph Search
R=Q(G)
Q(D1)
Q(Di)
Q(D)
Q(Dn)
It is NOT practical to handle large graphs on single machines
Distributed graph processing is inevitable
Model of Computation [3]:
Complexity measures:
Shuai Ma, Yang Cao, Jinpeng Huai, and Tianyu Wo. Distributed Graph Pattern Matching. WWW 2012.
Incrementation: Q(D + Δ) = Q(D) + Q(Δ), where Q(D) is known
G. Ramalingam, Thomas W. Reps: A Categorized Bibliography on Incremental Computation. POPL 1993: 502-510
Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, Yinghui Wu, and Yunpeng Wu. Graph Pattern Matching: From Intractable to Polynomial Time. VLDB 2010
Google Percolator [20]:
It is a terrible waste to compute everything from scratch!
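A minimal sketch of the incremental idea, under stated assumptions: the query Q is taken to be "number of connected components", the update Δ is a stream of edge insertions, and a union-find structure (a standard technique swapped in here, not something the slides name) maintains the answer. Deletions are not handled.

```python
class IncrementalComponents:
    """Maintains the number of connected components under edge insertions."""

    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}
        self.count = len(self.parent)  # Q(D) for a graph with no edges yet

    def find(self, v):
        # Walk to the root, shortening the path as we go (path halving).
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def insert_edge(self, a, b):
        """Apply one update from Δ and return the new answer."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb
            self.count -= 1
        return self.count


def components_from_scratch(vertices, edges):
    """Baseline: recompute the answer over the whole graph."""
    fresh = IncrementalComponents(vertices)
    for a, b in edges:
        fresh.insert_edge(a, b)
    return fresh.count


vertices = [1, 2, 3, 4, 5]
state = IncrementalComponents(vertices)
for edge in [(1, 2), (3, 4)]:       # the known part D
    state.insert_edge(*edge)
after_delta = state.insert_edge(2, 3)  # a single-edge update Δ
```

The update touches only the two endpoints' trees, whereas the baseline re-reads every edge of D each time the graph changes.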
sampling
Q(D)
Q(Δ)
Michael I. Jordan: Divide-and-conquer and statistical inference for big data. KDD 2012: 4
Wenfei Fan, Floris Geerts, Frank Neven: Making Queries Tractable on Big Data with Preprocessing. VLDB 2013
Weiren Yu, Charu Aggarwal, Shuai Ma, and Haixun Wang. On Anomalous Hotspot Discovery in Graph Streams. ICDM 2013
compression
Q(D)
Q(D’)
Wenfei Fan, Jianzhong Li, Xin Wang, Yinghui Wu: Query preserving graph compression. SIGMOD, 2012
partitioning
Q(D)
Q(D1) + … +Q(Dn)
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SISC, 20(1):359–392, 1998.
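The identity Q(D) = Q(D1) + … + Q(Dn) holds directly only for queries whose partial answers can be merged. A hedged sketch with such a query, out-degree per vertex, follows. The round-robin edge split is a hypothetical stand-in for a real partitioner such as the multilevel scheme cited above, and the function names are made up for illustration.

```python
from collections import Counter


def out_degrees(edges):
    """The query Q: out-degree of every vertex that has outgoing edges."""
    return Counter(src for src, _ in edges)


def partition_edges(edges, n):
    """Hypothetical round-robin split of the edge list into n fragments."""
    return [edges[i::n] for i in range(n)]


edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "a"), ("c", "b"), ("c", "d")]

fragments = partition_edges(edges, 3)
partial_answers = [out_degrees(f) for f in fragments]  # Q(D1), ..., Q(Dn)
merged = sum(partial_answers, Counter())               # Q(D1) + ... + Q(Dn)
```

Each fragment can be evaluated on a different machine and only the small partial answers are shipped. Queries such as pattern matching do not decompose this cleanly, because matches can span fragments.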
We have introduced graph search: a new paradigm for social computing
We have also briefly discussed the challenges of graph search
We have presented some useful techniques towards solving the problems
A long way to go for big graph search!
Homepage: http://mashuai.buaa.edu.cn
Email: [email protected]
Address: Room G1122,
New Main Building,
Beihang University
Beijing, China
[1] Chao Liu, Chen Chen, Jiawei Han and Philip S. Yu, GPLAG: detection of software plagiarism by program dependence graph analysis. KDD 2006.
[2] J. Ferrante, K. J. Ottenstein, and J. D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319–349, 1987.
[3] Shuai Ma, Yang Cao, Jinpeng Huai, and Tianyu Wo, Distributed Graph Pattern Matching, WWW 2012.
[4] Rice, M. and Tsotras, V.J., Graph indexing of road networks for shortest path queries with label restrictions, VLDB 2010.
[5] David A. Bader and Kamesh Madduri, A graph-theoretic analysis of the human protein-interaction network using multicore parallel algorithms. Parallel Computing 2008.
[6] Shuai Ma, Yang Cao, Tianyu Wo, and Jinpeng Huai, Social Networks and Graph Matching. Communications of CCF, 2012 (in Chinese).
[7] C. C. Aggarwal and H. Wang. Managing and Mining Graph Data. Springer, 2010.
[8] Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu, Adding Regular Expressions to Graph Reachability and Pattern Queries. ICDE 2011.
[9] Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu, Graph Pattern Matching: From Intractable to Polynomial Time. VLDB 2010.
[10] Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu, Graph Homomorphism Revisited for Graph Matching. VLDB 2010.
[11] Hossein Maserrat and Jian Pei, Neighbor query friendly compression of social networks. KDD 2010.
[12] Brian Gallagher, Matching structure and semantics: A survey on graph-based pattern matching. AAAI FS. 2006.
[13] Marko A. Rodriguez, Peter Neubauer: The Graph Traversal Pattern. Graph Data Management 2011: 29-46
[14] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[15] Mehdi Kargar, Aijun An: Keyword Search in Graphs: Finding r-cliques. In VLDB Conference, 2011.
[16] Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, and Tianyu Wo, Capturing Topology in Graph Pattern Matching. VLDB 2012.
[17] Wenfei Fan, Graph Pattern Matching Revised for Social Network Analysis. ICDT 2012.
[18] Eytan Adar and Christopher Re, Managing Uncertainty in Social Networks, IEEE Data Eng. Bull., pp.15-22, 30(2), 2007.
[19] Gueorgi Kossinets, Effects of missing data in social networks. Social Networks 28:247-268, 2006.
[20] Daniel Peng, Frank Dabek: Large-scale Incremental Processing Using Distributed Transactions and Notifications. OSDI 2010.
[21] Monika Rauch Henzinger, Thomas A. Henzinger, Peter W. Kopke: Computing Simulations on Finite and Infinite Graphs. FOCS 1995:
Thanks!
Dr. Shuai Ma