Simulation Revised for Graph Pattern Matching


- Graph Simulation
- label equality, edge-to-edge matching relation

- Bounded Simulation
- node predicates, edge bound, edge-to-path matching relation

- Reachability Queries and Graph Pattern Queries
- query containment and minimization – cubic time
- query evaluation – cubic time

- Conclusion

A first step towards revising simulation for graph pattern matching

- Given a pattern graph P and a data graph G, decide whether G matches P, and if so, find all the matches of P in G.
- Applications
- social queries, social matching
- biology and chemistry network querying
- keyword search, proximity search, …

How to define?

Widely employed in a variety of emerging real-life applications

- Node label equivalence
- Edge-to-edge relation
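The two conditions above (node label equivalence plus an edge-to-edge relation) can be illustrated with a minimal Python sketch that computes the maximum simulation relation by naive refinement. The graph encoding (a node-to-label dictionary plus a set of directed edges) and the name `max_simulation` are our own assumptions; this is the simple fixpoint, not an optimized simulation algorithm.

```python
from typing import Dict, Set, Tuple

# A graph is (node -> label, set of directed edges); this encoding is illustrative.
Graph = Tuple[Dict[str, str], Set[Tuple[str, str]]]


def max_simulation(pattern: Graph, data: Graph) -> Dict[str, Set[str]]:
    """Maximum graph simulation of `pattern` in `data`; {} if some pattern node has no match."""
    p_labels, p_edges = pattern
    g_labels, g_edges = data
    succ: Dict[str, Set[str]] = {v: set() for v in g_labels}
    for v, w in g_edges:
        succ[v].add(w)
    # Node label equivalence: start from all data nodes carrying the same label.
    sim = {u: {v for v, lab in g_labels.items() if lab == p_labels[u]} for u in p_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # Edge-to-edge relation: v may match u only if one of v's successors matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim if all(sim.values()) else {}


P = ({"a": "A", "b": "B"}, {("a", "b")})
G = ({"v1": "A", "v2": "B", "v3": "A"}, {("v1", "v2")})
print(max_simulation(P, G))  # v3 is dropped: it has no B-labelled successor
```

Because every refinement step only removes candidates, the loop terminates, and what remains is the largest relation satisfying both conditions.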

[Figure: pattern P and data graph G with nodes labeled A, B, D, E (data nodes v1, v2), matched by identical label matching and edge-to-edge relations.]

Capable enough?

[Figure: pattern P (biologist, Alice, doctors, with edge bounds 3 and 1) and data graph G, matched through edge-to-path mappings.]

Graph simulation is too restrictive!

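- data graph G = (V, E, fA), where fA assigns attributes to each node
- pattern graph P = (Vp, Ep, fv, fe), where fv assigns a predicate (search condition) to each pattern node and fe assigns each pattern edge a bound: a positive integer k or * (unbounded)
- G matches P via bounded simulation if there is a binary relation S ⊆ Vp × V such that every pattern node u has some match (u, v) in S, every such v satisfies fv(u), and for every pattern edge (u, u') and every (u, v) in S there is a (u', v') in S with a nonempty path from v to v' in G whose length is within fe(u, u').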
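A minimal Python sketch of the definition, assuming node predicates are Python callables on an attribute dictionary and that `None` stands for the unbounded edge *. It precomputes shortest nonempty path lengths by BFS and then refines candidate sets until a fixpoint. The function names are hypothetical, and this naive version is only meant to make the definition concrete; it is not the cubic-time algorithm from the talk.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def nonempty_distances(nodes: Iterable[str], edges: Set[Edge]) -> Dict[str, Dict[str, int]]:
    """dist[s][t] = length of a shortest nonempty path s -> t; t is absent if unreachable."""
    nodes = list(nodes)
    succ: Dict[str, List[str]] = {v: [] for v in nodes}
    for v, w in edges:
        succ[v].append(w)
    dist: Dict[str, Dict[str, int]] = {}
    for s in nodes:
        d: Dict[str, int] = {}
        queue: deque = deque()
        for w in succ[s]:  # length-1 paths; s itself is reached only through a cycle
            if w not in d:
                d[w] = 1
                queue.append(w)
        while queue:  # breadth-first search yields shortest lengths
            x = queue.popleft()
            for y in succ[x]:
                if y not in d:
                    d[y] = d[x] + 1
                    queue.append(y)
        dist[s] = d
    return dist


def bounded_simulation(
    fv: Dict[str, Callable[[dict], bool]],  # pattern node -> predicate on node attributes
    fe: Dict[Edge, Optional[int]],          # pattern edge -> bound k, or None for '*'
    g_attrs: Dict[str, dict],               # data node -> attributes (fA)
    g_edges: Set[Edge],
) -> Dict[str, Set[str]]:
    """Maximum bounded-simulation match; {} if some pattern node has no match."""
    dist = nonempty_distances(g_attrs, g_edges)
    sim = {u: {v for v, a in g_attrs.items() if pred(a)} for u, pred in fv.items()}
    changed = True
    while changed:
        changed = False
        for (u, u2), k in fe.items():
            # Edge-to-path: keep v only if some match w of u2 is within k hops of v.
            keep = {
                v for v in sim[u]
                if any(w in dist[v] and (k is None or dist[v][w] <= k) for w in sim[u2])
            }
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim if all(sim.values()) else {}


preds = {"b": lambda a: a.get("job") == "biologist", "d": lambda a: a.get("job") == "doctors"}
attrs = {"n1": {"job": "biologist"}, "n2": {"job": "CTO"}, "n3": {"job": "doctors"},
         "n4": {"job": "biologist"}, "n5": {}, "n6": {}}
edges = {("n1", "n2"), ("n2", "n3"), ("n4", "n5"), ("n5", "n6"), ("n6", "n3")}
print(bounded_simulation(preds, {("b", "d"): 2}, attrs, edges))  # n4 is 3 hops from n3
```

With label-equality predicates and every bound set to 1, the same loop reduces to plain graph simulation.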

- node matching via predicates vs. label equality
- edge-to-path matching vs. edge-to-edge matching

[Figure: pattern P with node predicates Job = ‘biologist’, Id = ‘Alice’ and Job = ‘doctors’ and edge bounds 3 and 1, matched against data graph G, which also contains a node with Job = ‘CTO’. Graph simulation is a special case of bounded simulation.]

Enriched model for capturing meaningful matches

- For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.
- The graph pattern matching problem via bounded simulation can be solved in cubic time.
- The incremental bounded simulation problem

extension for multiple edge colors?

Efficient approaches for graph pattern matching

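[Figure: the Essembly network, whose edges have four types: friends-allies (fa), friends-nemeses (fn), strangers-allies (sa), and strangers-nemeses (sn).]

Real-life graphs have multiple edge types

[Figure: pattern P over the Essembly network, with nodes Biologists supporting cloning, Alice, and Doctors against cloning, and edges constrained by fa+, fa<=2 sa<=2, fa<=2 sn, fa, and fn.]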

Pattern queries with multiple edge types

- Real-life graphs usually carry different edge types…
- data graph G = (V, E, fA, fC), where fA assigns attributes to nodes and fC assigns a color (edge type) to each edge

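- Reachability query (RQ): (u1, u2, fu1, fu2, fe), where fu1 and fu2 are search conditions on the source and target nodes, and fe is a regular expression from the subclass:
- F ::= c | c≤k | c+ | FF, where c is an edge color, c≤k denotes a path of at most k edges of color c, c+ one or more edges of color c, and FF concatenation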
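To make the edge constraint concrete, here is a minimal Python sketch that checks whether a colored path from one data node to another matches an expression in this subclass, written as text such as `fa<=2 fn`. It runs BFS over states (node, current atom, edges used in that atom), which is the usual product of the graph with the expression. Assumptions: the textual syntax and function names are ours, c≤k is read as 1 to k edges of color c (nonempty), and the node conditions fu1, fu2 are assumed to be checked separately.

```python
import re
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Optional[int]]  # (edge color, max repetitions); None encodes c+


def parse_fe(expr: str) -> List[Atom]:
    """Parse a constraint such as 'fa<=2 fn' into atoms of F ::= c | c<=k | c+ | FF."""
    atoms: List[Atom] = []
    for tok in expr.split():
        m = re.fullmatch(r"([A-Za-z]+)(?:(\+)|<=(\d+))?", tok)
        if m is None:
            raise ValueError(f"unsupported token: {tok!r}")
        color, plus, k = m.groups()
        atoms.append((color, None if plus else (int(k) if k else 1)))
    if not atoms:
        raise ValueError("empty edge constraint")
    return atoms


def rq_holds(edges: Set[Tuple[str, str, str]], src: str, dst: str, expr: str) -> bool:
    """True if some nonempty path src -> dst has an edge-color sequence matching expr."""
    atoms = parse_fe(expr)
    out: Dict[str, List[Tuple[str, str]]] = {}
    for v, w, c in edges:
        out.setdefault(v, []).append((w, c))
    last = len(atoms) - 1
    start = (src, 0, 0)  # (node, index of current atom, edges consumed in that atom)
    seen = {start}
    queue = deque([start])
    while queue:
        v, i, cnt = queue.popleft()
        if v == dst and i == last and cnt >= 1:
            return True
        color, hi = atoms[i]
        for w, c in out.get(v, []):
            steps = []
            if c == color and (hi is None or cnt < hi):
                # Stay in atom i; for c+ every count >= 1 behaves the same, so cap it at 1.
                steps.append((w, i, 1 if hi is None else cnt + 1))
            if cnt >= 1 and i < last and c == atoms[i + 1][0]:
                steps.append((w, i + 1, 1))  # atom i is satisfied; start atom i + 1
            for state in steps:
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False


edges = {("alice", "bob", "fa"), ("bob", "carol", "fa"),
         ("carol", "dan", "fn"), ("alice", "eve", "fn")}
print(rq_holds(edges, "alice", "dan", "fa<=2 fn"))  # fa, fa, then fn
print(rq_holds(edges, "alice", "dan", "fa fn"))     # needs exactly one fa edge first
```

The state space is bounded by the number of nodes times the total of the bounds k, so the search always terminates even with c+ atoms and cycles.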

[Figure: an RQ from a node with Job=‘biologist’, sp=‘cloning’ to a node with Job=‘doctors’, with edge constraint fa<=2 fn.]

- graph pattern queries (PQs) Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u’), Qe = (u, u’, fv(u), fv(u’), fe(e)) is an RQ.
- Qp(G) is the maximum set of pairs (e, Se), where Se is a set of node pairs of G matching Qe, such that:
- for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;
- for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.
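The two conditions can be read as a greatest fixpoint: start each Se from the answer of its RQ and remove pairs that violate a condition until nothing changes. The Python sketch below does exactly that. It is a naive illustration of the semantics, not JoinMatch or SplitMatch. The `rq` argument is a hypothetical oracle for a single RQ; in the example it is a stand-in that only checks for one edge of the given color.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]


def eval_pq(
    fv: Dict[str, Callable[[dict], bool]],  # pattern node -> search condition
    fe: Dict[Pair, str],                    # pattern edge (u, u') -> edge constraint
    g_attrs: Dict[str, dict],               # data node -> attributes
    rq: Callable[[str, str, str], bool],    # hypothetical oracle: does (v, v') satisfy the constraint?
) -> Dict[Pair, Set[Pair]]:
    """Largest family {e: S_e} satisfying the two conditions on adjacent pattern edges."""
    mat = {u: {v for v, a in g_attrs.items() if cond(a)} for u, cond in fv.items()}
    # Start S_e from the answer of the RQ Q_e = (u, u', fv(u), fv(u'), fe(e)).
    S = {(u, u2): {(v, v2) for v in mat[u] for v2 in mat[u2] if rq(v, v2, c)}
         for (u, u2), c in fe.items()}
    out_edges: Dict[str, List[Pair]] = {u: [e for e in fe if e[0] == u] for u in fv}

    def sources(e: Pair) -> Set[str]:
        return {v for v, _ in S[e]}

    changed = True
    while changed:  # remove violating pairs until a fixpoint is reached
        changed = False
        for u, u2 in fe:
            keep = {
                (v, v2) for v, v2 in S[(u, u2)]
                # edges e2 = (u, u3): v must also start a pair in S_e2
                if all(v in sources(e2) for e2 in out_edges[u])
                # edges e2 = (u2, u3): v2 must start a pair in S_e2
                and all(v2 in sources(e2) for e2 in out_edges[u2])
            }
            if keep != S[(u, u2)]:
                S[(u, u2)], changed = keep, True
    return S


colored = {("n1", "n2", "fa"), ("n2", "n3", "fn"), ("n4", "n5", "fa")}
preds = {"b": lambda a: a["job"] == "biologist", "c": lambda a: a["job"] == "CTO",
         "d": lambda a: a["job"] == "doctors"}
attrs = {"n1": {"job": "biologist"}, "n2": {"job": "CTO"}, "n3": {"job": "doctors"},
         "n4": {"job": "biologist"}, "n5": {"job": "CTO"}}
one_edge = lambda v, w, c: (v, w, c) in colored  # stand-in RQ oracle: a single c-colored edge
print(eval_pq(preds, {("b", "c"): "fa", ("c", "d"): "fn"}, attrs, one_edge))
```

Here (n4, n5) is pruned because n5 has no fn edge to a doctor, so it cannot continue along the pattern edge (c, d).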

- PQ vs. simulation and bounded simulation
- search condition on query nodes
- mapping edges to paths
- constrain the edges on the path with a regular expression

RQ and bounded simulation are special cases of PQ

[Figure: the PQ from the Essembly example, with nodes Job=‘biologist’, sp=‘cloning’; Id=‘Alice’; and Job=‘doctors’, dsp=‘cloning’; and edges constrained by fa+, fa<=2 sa<=2, fa<=2 sn, fa<=2 fn, and fn over the edge colors fa, fn, sa, sn.]

- PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that for any data graph G and any edge e in E1, Se (in Q1(G)) is a subset of Sλ(e) (in Q2(G)); i.e., λ is a renaming function under which Q1(G) is mapped into Q2(G).
- Query containment and equivalence problems can both be decided in cubic time
- Query similarity based on a revision of graph simulation
- Determine the query similarity in cubic time

Query containment and equivalence for PQs can be solved efficiently

[Figure: example PQs Q1, Q2, Q3 over nodes B1, B2, B3 and C1 to C6, with edge constraints h<=1, h<=2, and h<=3.]

- Query minimization problem
- input: a PQ Qp
- output: a minimized PQ Qm equivalent to Qp

- The query minimization problem can be solved in cubic time:
- compute the maximum node equivalence classes based on a revision of graph simulation;
- determine the number of redundant nodes and edges based on the equivalence classes;
- remove redundant and isolated nodes and edges.

Query minimization for PQs can be solved efficiently

[Figure: query minimization example with PQs Q1, Q2, Q3 over nodes R, B, C and edge constraints f, g, g<=3, and h<=2.]

- PQs can be answered in cubic time.
- Join-based Algorithm JoinMatch
- Matrix index vs. distance cache
- join operations for each edge in the PQ until a fixpoint is reached (w.r.t. a reverse topological order)

- Split-based Algorithm SplitMatch
- blocks: treating pattern nodes and data nodes uniformly
- partition-relation pair
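A join-based evaluation repeatedly asks whether a node w lies within k hops of v along edges of one color. One way to answer that quickly, sketched below under our own assumptions, is a per-color distance matrix built by BFS: each atom c, c≤k, or c+ then becomes a relation over node pairs, and concatenation FF becomes a join on the shared middle node. The names are hypothetical, c≤k is again read as 1 to k edges, and the sketch does not claim to reproduce the index that JoinMatch actually uses.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Dist = Dict[str, Dict[str, Dict[str, int]]]  # color -> source -> target -> hops


def color_distances(edges: Set[Tuple[str, str, str]]) -> Dist:
    """For each color c: shortest nonempty path lengths that use only c-colored edges."""
    succ: Dict[str, Dict[str, List[str]]] = {}
    for v, w, c in edges:
        succ.setdefault(c, {}).setdefault(v, []).append(w)
    dist: Dist = {}
    for c, adj in succ.items():
        dist[c] = {}
        for s in adj:
            d: Dict[str, int] = {}
            queue: deque = deque()
            for w in adj[s]:
                if w not in d:
                    d[w] = 1
                    queue.append(w)
            while queue:  # BFS restricted to color c
                x = queue.popleft()
                for y in adj.get(x, []):
                    if y not in d:
                        d[y] = d[x] + 1
                        queue.append(y)
            dist[c][s] = d
    return dist


def atom_relation(dist: Dist, color: str, k: Optional[int]) -> Set[Tuple[str, str]]:
    """Pairs joined by 1..k edges of `color` (k=1: the atom c; k=None: the atom c+)."""
    return {(v, w) for v, row in dist.get(color, {}).items()
            for w, hops in row.items() if k is None or hops <= k}


def compose(r1: Set[Tuple[str, str]], r2: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Concatenation F1 F2 as a join of r1 and r2 on the shared middle node."""
    by_src: Dict[str, Set[str]] = {}
    for x, w in r2:
        by_src.setdefault(x, set()).add(w)
    return {(v, w) for v, x in r1 for w in by_src.get(x, ())}


edges = {("alice", "bob", "fa"), ("bob", "carol", "fa"),
         ("carol", "dan", "fn"), ("alice", "eve", "fn")}
dist = color_distances(edges)
# Answer set of the constraint "fa<=2 fn" over all node pairs.
print(compose(atom_relation(dist, "fa", 2), atom_relation(dist, "fn", 1)))
```

The matrix costs one BFS per node and color up front; a distance cache would instead compute rows lazily and keep only the ones the query touches.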


Graph pattern matching can be solved in polynomial time

[Figure: step-by-step evaluation of the Essembly PQ, with nodes Job=‘biologist’, sp=‘cloning’; Id=‘Alice’; and Job=‘doctors’, dsp=‘cloning’; and edges constrained by fa+, fa<=2 sa<=2, fa<=2 sn, and fn over the edge colors fa, fn, sa, sn.]

Effectiveness of PQs: edge-to-path relations

[Figure: varying |Vp| and |Ep|.]

Evaluation algorithms are sensitive to pattern edges

[Figure: varying |pred| and b.]

The algorithms are sensitive to the number of predicates

[Figure: varying b and |V| (×10^5).]

The algorithms scale well over large synthetic graphs

[Figure: varying α and cr.]

The algorithms scale well over large synthetic graphs

- Simulation revised for graph pattern matching
- Bounded Simulation
- node predicates, edge bound, edge-to-path matching relation

- Reachability Queries and Graph Pattern Queries
- query containment and minimization – cubic time
- query evaluation – cubic time

- Future work
- extending RQs and PQs by supporting general regular expressions
- incremental evaluation of RQs and PQs


Thank you!

Terrorist Collaboration Network (1970 - 2010)

“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)