Simulation Revised for Graph Pattern Matching


Outline
• Graph Simulation
• label equality, edge-to-edge matching relation
• Bounded Simulation
• node predicates, edge bound, edge-to-path matching relation
• Reachability Queries and Graph Pattern Queries
• query containment and minimization – cubic time
• query evaluation – cubic time
• Conclusion

A first step towards revising simulation for graph pattern matching

Graph Pattern Matching: the problem
• Given a pattern graph P and a data graph G, decide whether G matches P, and if so, find all the matches of P in G.
• Applications
• social queries, social matching
• biology and chemistry network querying
• keyword search, proximity search, …

How to define?

Widely employed in a variety of emerging real life applications

Graph Simulation
• Node label equivalence
• Edge-to-edge relation

Figure: pattern graph P and data graph G over node labels A, B, D and E; data nodes v1 and v2 are marked.

Capable enough?

Identical label matching, edge-to-edge relations
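The two conditions above (label equality, edge-to-edge matching) can be sketched as a fixpoint computation. This is a minimal illustration, not the algorithm from the slides: the function name `graph_simulation`, the dictionary-based graph encoding and the sample graph are all assumptions made for the example.

```python
def graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges):
    """Return the maximum simulation as {pattern node: set of data nodes}.

    Returns {} when some pattern node has no match (G does not match P).
    The encoding (label dicts plus edge lists) is an assumption of this sketch.
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in data_labels}
    for a, b in data_edges:
        succ[a].add(b)

    # Start from label equality: every data node with the same label is a candidate.
    sim = {
        u: {v for v, label in data_labels.items() if label == pattern_labels[u]}
        for u in pattern_labels
    }

    # Edge-to-edge condition: a candidate v of u survives only if, for each
    # pattern edge (u, u2), v has a direct successor that is a candidate of u2.
    changed = True
    while changed:
        changed = False
        for u, u2 in pattern_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    return sim if all(sim.values()) else {}


# Hypothetical example: pattern A -> B; only a1 has a B-labeled child.
example = graph_simulation(
    {"p1": "A", "p2": "B"},
    [("p1", "p2")],
    {"a1": "A", "a2": "A", "b1": "B"},
    [("a1", "b1")],
)
```

Here `a2` is dropped because it has no outgoing edge to a `B` node, which is exactly the restriction the next slides relax.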

An example from real life social matching

Figure: pattern P and data graph G with nodes Alice, biologist and doctors and edge bounds 1 and 3, illustrating edge-to-path mappings.

Graph simulation is too restrictive!

Bounded Simulation
• data graph G = (V, E, fA)
• pattern graph P = (Vp, Ep, fv, fe)
• G matches P via bounded simulation if there is a binary relation from Vp to V such that every matched data node satisfies the predicate of its pattern node and, for every edge of P, there exists a path in G satisfying the constraints of that edge.
• bounded simulation vs. graph simulation
• node matches (predicates) vs. label equality
• edge-to-path matching vs. edge-to-edge matching

Figure: pattern P (Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’; edge bounds 1 and 3) and data graph G (nodes with Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’ and Job = ‘CTO’), with a callout marking a special case.

Enriched model for capturing meaningful matches
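A minimal sketch of the bounded simulation definition, assuming node predicates are plain Python callables and each pattern edge carries either an integer bound or `None` for an unbounded path. The names `distances_from` and `bounded_simulation` and the sample data are hypothetical; the cubic-time algorithm mentioned on the next slide is not reproduced here.

```python
from collections import deque


def distances_from(succ, source):
    """Shortest nonempty-path lengths from `source` to every reachable node."""
    dist = {}
    queue = deque((w, 1) for w in succ[source])
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue  # already reached by a path that is no longer
        dist[node] = d
        queue.extend((w, d + 1) for w in succ[node])
    return dist


def bounded_simulation(predicates, pattern_edges, attrs, data_edges):
    """predicates: {pattern node: callable on an attribute dict}
    pattern_edges: {(u, u2): bound}, bound is an int or None (unbounded)
    attrs: {data node: attribute dict}; data_edges: list of (v, w)
    Returns {pattern node: set of data nodes}, or {} if G does not match P.
    """
    succ = {v: [] for v in attrs}
    for a, b in data_edges:
        succ[a].append(b)
    dist = {v: distances_from(succ, v) for v in attrs}

    # Node condition: candidates must satisfy the pattern node's predicate.
    sim = {u: {v for v in attrs if pred(attrs[v])} for u, pred in predicates.items()}

    # Edge-to-path condition: v keeps matching u only if, for each pattern edge
    # (u, u2), some candidate of u2 is reachable from v within the bound.
    changed = True
    while changed:
        changed = False
        for (u, u2), bound in pattern_edges.items():
            keep = {
                v for v in sim[u]
                if any(w in sim[u2] and (bound is None or d <= bound)
                       for w, d in dist[v].items())
            }
            if keep != sim[u]:
                sim[u] = keep
                changed = True

    return sim if all(sim.values()) else {}


# Hypothetical data: alice -> bob (CTO) -> carol (biologist); dave is isolated.
people = {
    "alice": {"id": "Alice"},
    "bob": {"job": "CTO"},
    "carol": {"job": "biologist"},
    "dave": {"job": "biologist"},
}
preds = {
    "A": lambda a: a.get("id") == "Alice",
    "B": lambda a: a.get("job") == "biologist",
}
within_two = bounded_simulation(preds, {("A", "B"): 2}, people,
                                [("alice", "bob"), ("bob", "carol")])
```

With a bound of 2 Alice matches through the path alice, bob, carol; with a bound of 1 the same call returns an empty result, which mirrors the edge-to-edge restriction of plain graph simulation.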

Basic results for the bounded simulation
• For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.
• The graph pattern matching problem via bounded simulation can be solved in cubic time.
• The incremental bounded simulation problem

extension for multiple edge colors?

Efficient approaches for graph pattern matching

Considering edge types…

Edge types in the Essembly Network:
• friends-allies (fa)
• friends-nemeses (fn)
• strangers-allies (sa)
• strangers-nemeses (sn)

Real life graphs have multiple edge types

Querying Essembly network: an example

Figure: pattern P over the Essembly Network. Pattern nodes: Alice, Doctors, Biologists supporting cloning, Against cloning. Pattern edge constraints: fa+, fa<=2 sa<=2, fa<=2 sn, fn. Network edge types: fa, fn, sa, sn.

Pattern queries with multiple edge types

Graph reachability and pattern queries
• Real life graphs usually bear different edge types…
• data graph G = (V, E, fA, fC), where fC assigns a color (type) to each edge
• Reachability query (RQ): Qr = (u1, u2, fu1, fu2, fe), where fe is drawn from a subclass of regular expressions:
• F ::= c | c≤k | c+ | FF
• Qr(G): the set of node pairs (v1, v2) such that v1 satisfies fu1, v2 satisfies fu2, and there is a nonempty path from v1 to v2 whose edge colors match the pattern specified by fe.

Figure: an example RQ between a node with Job=‘biologist’, sp=‘cloning’ and a node with Job=‘doctors’, with edge constraint fa<=2 fn.
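A minimal sketch of evaluating an RQ over a small edge-colored graph. It assumes fe is given as a list of atoms `(color, max_hops)`, where `c` is `(c, 1)`, `c≤k` is `(c, k)` and `c+` is `(c, None)`, and that every atom consumes at least one edge. The names `step` and `evaluate_rq`, the sample graph, and the choice of the biologist as the source node are all assumptions of the example.

```python
def step(frontier, color, max_hops, edges_by_color):
    """Nodes reachable from `frontier` via 1..max_hops edges of `color`.

    max_hops=None means unbounded (the c+ case).
    """
    reached = set()
    current = set(frontier)
    hops = 0
    while current and (max_hops is None or hops < max_hops):
        nxt = {w for v in current for w in edges_by_color.get((v, color), ())}
        hops += 1
        # Only newly reached nodes need expanding: anything reachable through
        # an already-reached node was found within the same hop budget.
        current = nxt - reached
        reached |= nxt
    return reached


def evaluate_rq(nodes, edges, source_pred, target_pred, atoms):
    """nodes: {node: attribute dict}; edges: list of (src, color, dst).
    atoms: list of (color, max_hops) encoding fe as a concatenation.
    Returns the set of (v1, v2) pairs matching the query.
    """
    edges_by_color = {}
    for src, color, dst in edges:
        edges_by_color.setdefault((src, color), set()).add(dst)

    result = set()
    for v1, attrs in nodes.items():
        if not source_pred(attrs):
            continue
        frontier = {v1}
        for color, max_hops in atoms:  # concatenation FF: chain the atoms
            frontier = step(frontier, color, max_hops, edges_by_color)
        result |= {(v1, v2) for v2 in frontier if target_pred(nodes[v2])}
    return result


# Hypothetical graph for the constraint fa<=2 fn.
g_nodes = {
    "b1": {"job": "biologist", "sp": "cloning"},
    "x": {"job": "teacher"},
    "y": {"job": "teacher"},
    "d1": {"job": "doctors"},
    "d2": {"job": "doctors"},
}
g_edges = [("b1", "fa", "x"), ("x", "fa", "y"), ("y", "fn", "d1"), ("b1", "fn", "d2")]
pairs = evaluate_rq(
    g_nodes, g_edges,
    lambda a: a.get("job") == "biologist" and a.get("sp") == "cloning",
    lambda a: a.get("job") == "doctors",
    [("fa", 2), ("fn", 1)],
)
```

In this sample `d2` is not returned: the path from `b1` to `d2` has no `fa` edge, so it does not match `fa<=2 fn` under the at-least-one-edge assumption.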

Graph pattern queries
• graph pattern query (PQ): Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u’), Qe = (u, u’, fv(u), fv(u’), fe(e)) is an RQ.
• Qp(G) is the maximum set of pairs (e, Se), with Se a subset of Qe(G), such that
• for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;
• for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.
• PQ vs. simulation and bounded simulation
• search condition on query nodes
• mapping edges to paths
• constrain the edges on the path with a regular expression

RQ and bounded simulation are special cases of PQ
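The two closure conditions on the sets Se can be sketched as a naive fixpoint that starts from the per-edge RQ answers and deletes pairs until both conditions hold. This is an illustration of the semantics only, not JoinMatch or SplitMatch; the function name `refine_pattern_matches` and the input encoding are assumptions.

```python
def refine_pattern_matches(pattern_edges, candidates):
    """pattern_edges: {edge name: (u, u2)} for each pattern edge.
    candidates: {edge name: set of (v, v2)}, e.g. the RQ answers Qe(G).
    Returns the largest subsets Se satisfying both conditions on the slide.
    """
    match = {e: set(pairs) for e, pairs in candidates.items()}
    changed = True
    while changed:
        changed = False
        for e1, (u1, u2) in pattern_edges.items():
            keep = match[e1]
            for e2, (x, _) in pattern_edges.items():
                sources2 = {a for a, _ in match[e2]}
                if x == u2:
                    # e2 continues from the target of e1: v2 must start a pair in Se2.
                    keep = {(a, b) for a, b in keep if b in sources2}
                if x == u1:
                    # e2 shares its source with e1: v1 must start a pair in Se2.
                    keep = {(a, b) for a, b in keep if a in sources2}
            if keep != match[e1]:
                match[e1] = keep
                changed = True
    return match


# Hypothetical pattern A -e1-> B -e2-> C with precomputed RQ answers.
refined = refine_pattern_matches(
    {"e1": ("A", "B"), "e2": ("B", "C")},
    {"e1": {("a1", "b1"), ("a2", "b2")}, "e2": {("b1", "c1")}},
)
```

The pair `("a2", "b2")` is removed because `b2` has no continuation along `e2`. Whether an empty Se should make the whole answer empty is not stated on the slide, so the sketch returns the sets as computed.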

Reachability and graph pattern query: examples

Figure: an RQ and a PQ over the Essembly Network (edge types fa, fn, sa, sn). Node predicates shown: Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’; dsp=‘cloning’. Edge constraints shown: fa+, fa<=2 sa<=2, fa<=2 sn, fa<=2 fn, fn.

Fundamental problems: query containment
• PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that, for any data graph G and any e in E1, Se is a subset of Sλ(e); i.e., λ is a renaming function under which Q1(G) is mapped into Q2(G).
• Query containment and equivalence can both be determined in cubic time
• Query similarity is based on a revision of graph simulation
• Query similarity can be determined in cubic time

Query containment and equivalence for PQs can be solved efficiently

query containment: example

Figure: three pattern queries Q1, Q2 and Q3 over nodes B1 to B3 and C1 to C6, with edge bounds h<=1, h<=2 and h<=3.

Fundamental problems: query minimization
• Query minimization problem
• input: a PQ Qp
• output: a minimized PQ Qm equivalent to Qp
• Query minimization problem can be solved in cubic time.
• compute the maximum node equivalence classes based on a revision of graph simulation;
• determine the number of redundant nodes and edges based on the equivalence classes;
• remove redundant and isolated nodes and edges.

Query minimization for PQs can be solved efficiently

query minimization: example

Figure: three pattern queries Q1, Q2 and Q3 with nodes labeled R, B and C and edge labels f, g, g<=3 and h<=2.

Evaluating graph pattern queries
• PQ can be answered in cubic time.
• Join-based Algorithm JoinMatch
• Matrix index vs distance cache
• join operation for each edge in the PQ until a fixpoint is reached (w.r.t. a reversed topological order)
• Split-based Algorithm SplitMatch
• blocks: treating pattern node and data node uniformly
• partition-relation pair

Graph pattern matching can be solved in polynomial time

Example of JoinMatch

Figure (repeated across four consecutive slides): a PQ evaluated over the Essembly Network (edge types fa, fn, sa, sn). Node predicates shown: Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’; dsp=‘cloning’. Edge constraints shown: fa+, fa<=2 sa<=2, fa<=2 sn, fn.

Experimental results – effectiveness of PQs

Effectiveness of PQs: edge to path relations

Experimental results – querying real life graphs

Figure panels: varying |Vp|, varying |Ep|

Evaluation algorithms are sensitive to pattern edges

Experimental results – querying real life graphs

Figure panels: varying |pred|, varying b

The algorithms are sensitive to the number of predicates

Experimental results – querying synthetic graphs

Figure panels: varying b, varying |V| (×10^5)

The algorithms scale well over large synthetic graphs

Experimental results – querying synthetic graphs

Figure panels: varying α, varying cr

The algorithms scale well over large synthetic graphs

Conclusion
• Simulation revised for graph pattern matching
• Bounded Simulation
• node predicates, edge bound, edge-to-path matching relation
• Reachability Queries and Graph Pattern Queries
• query containment and minimization – cubic time
• query evaluation – cubic time
• Future work
• extending RQs and PQs by supporting general regular expressions
• incremental evaluation of RQs and PQs

Simulation revised for graph pattern matching

Thank you!

Terrorist Collaboration Network (1970 - 2010)

“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)