## Simulation Revised for Graph Pattern Matching


Presentation Transcript


Outline

- Graph Simulation
  - label equality, edge-to-edge matching relation
- Bounded Simulation
  - node predicates, edge bound, edge-to-path matching relation
- Reachability Queries and Graph Pattern Queries
  - query containment and minimization – cubic time
  - query evaluation – cubic time
- Conclusion

A first step towards revising simulation for graph pattern matching

Graph Pattern Matching: the problem

- Given a pattern graph P and a data graph G, decide whether G matches P, and if so, find all the matches of P in G.
- Applications
  - social queries, social matching
  - biology and chemistry network querying
  - keyword search, proximity search, …

How to define?

Widely employed in a variety of emerging real life applications

Graph Simulation

- Node label equivalence
- Edge-to-edge relation
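
The two conditions above (equal labels, and every pattern edge mirrored by a data edge) can be sketched as a naive fixpoint computation. This is only an illustration: the function name `graph_simulation`, the dictionary-based graph encoding, and the tiny example graphs are assumptions made here, and the loop is the straightforward refinement, not an optimized algorithm from the talk.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_edges):
    """Return the maximum simulation as {pattern node: set of data nodes}.

    p_labels / g_labels map node id -> label (hypothetical encoding);
    p_edges / g_edges are iterables of (source, target) pairs.
    """
    g_succ = {v: set() for v in g_labels}
    for src, dst in g_edges:
        g_succ[src].add(dst)

    # Start from label equality, then prune until nothing changes.
    sim = {u: {v for v, lab in g_labels.items() if lab == p_labels[u]}
           for u in p_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v stays a match of u only if some successor of v matches u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Illustrative data: pattern A -> B; only a1 has an edge to a B-labeled node.
p_labels = {"pa": "A", "pb": "B"}
p_edges = [("pa", "pb")]
g_labels = {"a1": "A", "a2": "A", "b1": "B"}
g_edges = [("a1", "b1")]
sim = graph_simulation(p_labels, p_edges, g_labels, g_edges)
print(sim)
```

In this sketch the data graph is taken to match the pattern when every pattern node ends up with a non-empty match set; here `a2` is pruned because it has no outgoing edge to a `B` node.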

[Figure: pattern graph P and data graph G, with node labels A, B, D, E and nodes v1, v2]

Capable enough?

Identical label matching, edge-to-edge relations

An example from real life social matching

[Figure: pattern P and data graph G for a social matching example with nodes Alice, biologist, and doctors, edge labels 1 and 3, and edge-to-path mappings]

Graph simulation is too restrictive!

Bounded Simulation

- data graph G = (V, E, fA)
- pattern graph P = (Vp, Ep, fv, fe)
- G matches P via bounded simulation if there is a binary relation from Vp to V such that, for every edge of P, there exists a path in G satisfying the constraints of that edge.
- bounded simulation vs. graph simulation
  - node matches vs. label equality
  - edge-to-path matching vs. edge-to-edge matching
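
A minimal sketch of the definition above, assuming node predicates are given as Python callables over attribute dictionaries and each pattern edge carries an integer bound (or `None` for "any length"). The names `bounded_simulation` and `path_lengths`, the encoding, and the sample graph are illustrative assumptions, not the cubic-time algorithm mentioned in the talk.

```python
from collections import deque


def path_lengths(succ, src):
    """Shortest nonempty-path length from src to every reachable node (BFS)."""
    dist = {}
    queue = deque((w, 1) for w in succ[src])
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue
        dist[node] = d
        queue.extend((w, d + 1) for w in succ[node])
    return dist


def bounded_simulation(p_preds, p_bounds, g_attrs, g_edges):
    """p_preds: {pattern node: predicate}; p_bounds: {(u, u2): k or None}."""
    succ = {v: [] for v in g_attrs}
    for src, dst in g_edges:
        succ[src].append(dst)
    dist = {v: path_lengths(succ, v) for v in g_attrs}

    # Candidates satisfy the node predicate; then prune by the edge bounds.
    sim = {u: {v for v, a in g_attrs.items() if pred(a)}
           for u, pred in p_preds.items()}
    changed = True
    while changed:
        changed = False
        for (u, u2), k in p_bounds.items():
            keep = {v for v in sim[u]
                    if any(w in sim[u2] and (k is None or d <= k)
                           for w, d in dist[v].items())}
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim


# Illustrative data: Alice reaches a doctor in 1 hop and a biologist in 2 hops.
g_attrs = {"alice": {"Id": "Alice"}, "cto": {"Job": "CTO"},
           "bio": {"Job": "biologist"}, "doc": {"Job": "doctors"}}
g_edges = [("alice", "cto"), ("cto", "bio"), ("alice", "doc")]
p_preds = {"A": lambda a: a.get("Id") == "Alice",
           "B": lambda a: a.get("Job") == "biologist",
           "D": lambda a: a.get("Job") == "doctors"}
sim = bounded_simulation(p_preds, {("A", "B"): 3, ("A", "D"): 1}, g_attrs, g_edges)
tight = bounded_simulation(p_preds, {("A", "B"): 1, ("A", "D"): 1}, g_attrs, g_edges)
print(sim, tight)
```

With the bound of 3 the biologist two hops away is accepted even though no direct edge exists, which is what edge-to-path matching allows; tightening the bound to 1 leaves Alice without a match.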

[Figure: pattern P and data graph G with node predicates Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’, Job = ‘CTO’ and edge bounds 1 and 3; one annotation reads "special case"]

Enriched model for capturing meaningful matches

Basic results for the bounded simulation

- For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.
- The graph pattern matching problem via bounded simulation can be solved in cubic time.
- The incremental bounded simulation problem

extension for multiple edge colors?

Efficient approaches for graph pattern matching

Considering edge types…

[Figure: Essembly Network with four edge types: friends-allies, friends-nemeses, strangers-allies, strangers-nemeses]

Real life graphs have multiple edge types

Querying Essembly network: an example

[Figure: pattern P over the Essembly Network. Edge types: fa, fn, sa, sn. Pattern nodes: Alice, Doctors, Biologists supporting cloning, Against cloning. Edge constraints include fa+, fa<=2 sa<=2, fa<=2 sn, fn]

Pattern queries with multiple edge types

Graph reachability and pattern queries

- Real life graphs usually bear different edge types…
- data graph G = (V, E, fA, fC)
- Reachability query (RQ): (u1, u2, fu1, fu2, fe), where fe is a regular expression from the subclass
  - F ::= c | c≤k | c+ | FF
- Qr(G): the set of node pairs (v1, v2) such that v1 satisfies fu1, v2 satisfies fu2, and there is a nonempty path from v1 to v2 whose edge colors match the pattern specified by fe.
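
Because the grammar above only concatenates atoms of the form c, c≤k, and c+, an RQ can be evaluated by pushing a set of reached nodes through one atom at a time. The sketch below assumes fe is encoded as a list of `(color, bound)` pairs, with `bound = 1` for a plain color c and `None` for c+; this encoding, the function names, and the sample graph are illustrative assumptions.

```python
def expand(succ, start, color, bound):
    """Nodes reachable from `start` via 1..bound edges of `color` (None = c+)."""
    reached, frontier, hops = set(), set(start), 0
    while frontier and (bound is None or hops < bound):
        step = {w for v in frontier for c, w in succ.get(v, ()) if c == color}
        frontier = step - reached  # only follow nodes not seen before
        reached |= frontier
        hops += 1
    return reached


def reachability_query(g_attrs, g_edges, pred1, pred2, fe):
    """Return the pairs (v1, v2) connected by a path whose colors match fe."""
    succ = {}
    for src, color, dst in g_edges:
        succ.setdefault(src, []).append((color, dst))
    result = set()
    for v1, attrs in g_attrs.items():
        if not pred1(attrs):
            continue
        current = {v1}
        for color, bound in fe:  # concatenation FF: apply atoms left to right
            current = expand(succ, current, color, bound)
        result |= {(v1, v2) for v2 in current if pred2(g_attrs[v2])}
    return result


# Illustrative data for the constraint fa<=2 fn between biologists and doctors.
g_attrs = {"b1": {"Job": "biologist"}, "x": {}, "y": {},
           "d1": {"Job": "doctors"}, "d2": {"Job": "doctors"}}
g_edges = [("b1", "fa", "x"), ("x", "fa", "y"),
           ("y", "fn", "d1"), ("b1", "fn", "d2")]
is_bio = lambda a: a.get("Job") == "biologist"
is_doc = lambda a: a.get("Job") == "doctors"
pairs = reachability_query(g_attrs, g_edges, is_bio, is_doc, [("fa", 2), ("fn", 1)])
print(pairs)
```

Here `d2` is not returned: it is adjacent to `b1` through an `fn` edge, but the constraint requires at least one `fa` edge first.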

[Figure: example RQ between a node with Job=‘biologist’, sp=‘cloning’ and a node with Job=‘doctors’, with edge constraint fa<=2 fn]

Graph pattern queries

- A graph pattern query (PQ) is Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u’), Qe = (u, u’, fv(u), fv(u’), fe(e)) is an RQ.
- Qp(G) is the maximum set of pairs (e, Se), with Se a set of node pairs matching Qe, such that
  - for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;
  - for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.
- PQ vs. simulation and bounded simulation
  - search condition on query nodes
  - mapping edges to paths
  - constrain the edges on the path with a regular expression
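
The two consistency conditions on the sets Se can be read as a pruning loop: start from candidate pairs for each pattern edge (for instance, the answer of the edge's RQ) and delete pairs that violate a condition until nothing changes. The sketch below assumes the candidate sets are already computed and passed in as a dictionary; the name `refine` and the example data are hypothetical, and only the two conditions listed above are enforced.

```python
def refine(p_edges, candidates):
    """Prune candidate pairs per pattern edge until both conditions hold.

    p_edges: list of pattern edges (u, u2).
    candidates: {pattern edge: set of data node pairs (v, v2)}.
    """
    S = {e: set(pairs) for e, pairs in candidates.items()}
    changed = True
    while changed:
        changed = False
        for e1 in p_edges:
            u1, u2 = e1
            for e2 in p_edges:
                sources2 = {a for a, _ in S[e2]}  # data nodes that start a pair of e2
                keep = S[e1]
                if e2[0] == u2:
                    # e2 leaves the target of e1: v2 must start a pair of e2.
                    keep = {(v1, v2) for v1, v2 in keep if v2 in sources2}
                if e2[0] == u1:
                    # e2 leaves the source of e1: v1 must start a pair of e2.
                    keep = {(v1, v2) for v1, v2 in keep if v1 in sources2}
                if keep != S[e1]:
                    S[e1] = keep
                    changed = True
    return S


# Illustrative data: pattern A -> B -> C; b2 has no continuation to a C match.
p_edges = [("A", "B"), ("B", "C")]
candidates = {("A", "B"): {("a1", "b1"), ("a2", "b2")},
              ("B", "C"): {("b1", "c1")}}
result = refine(p_edges, candidates)
print(result)
```

Since pairs are only ever removed, and a pair is removed only when it cannot belong to any set satisfying the conditions, the loop ends at the largest consistent subset of the candidates.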

RQ and bounded simulation are special cases of PQ

Reachability and graph pattern query: examples

[Figure: example RQ and PQ over edge types fa, fn, sa, sn. Node predicates: Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’; dsp=‘cloning’. Edge constraints: fa+, fa<=2 sa<=2, fa<=2 sn, fa<=2 fn, fn]

Fundamental problems: query containment

- PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that, for any data graph G and any edge e in E1, Se is a subset of Sλ(e); i.e., λ is a renaming function that maps Q1(G) into Q2(G).
- Query containment and equivalence can both be determined in cubic time
- Query similarity based on a revision of graph simulation
- Determine the query similarity in cubic time

Query containment and equivalence for PQs can be solved efficiently

Fundamental problems: query minimization

- Query minimization problem
  - input: a PQ Qp
  - output: a minimized PQ Qm equivalent to Qp
- The query minimization problem can be solved in cubic time:
  - compute the maximum node equivalence classes based on a revision of graph simulation;
  - determine the number of redundant nodes and edges based on the equivalence classes;
  - remove redundant and isolated nodes and edges.

Query minimization for PQs can be solved efficiently

query minimization: example

[Figure: query minimization example with patterns Q1, Q2, Q3; node labels R, B, C; edge constraints g<=3 and h<=2; additional labels f and g]

Evaluating graph pattern queries

- PQs can be answered in cubic time.
- Join-based Algorithm JoinMatch
  - Matrix index vs. distance cache
  - join operation for each edge in PQ until a fixpoint is reached (w.r.t. a reversed topological order)
- Split-based Algorithm SplitMatch
  - blocks: treating pattern node and data node uniformly
  - partition-relation pair

Graph pattern matching can be solved in polynomial time

Example of JoinMatch

[Figure sequence: run of JoinMatch on the example pattern query, shown over several slides. Edge types: fa, fn, sa, sn. Node predicates: Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’; dsp=‘cloning’. Edge constraints: fa+, fa<=2 sa<=2, fa<=2 sn, fn]

Experimental results – effectiveness of PQs

Effectiveness of PQs: edge to path relations

Experimental results – querying real life graphs

Varying |Vp|

Varying |Ep|

Evaluation algorithms are sensitive to pattern edges

Experimental results – querying real life graphs

Varying |pred|

Varying b

The algorithms are sensitive to the number of predicates

Experimental results – querying synthetic graphs

Varying b

Varying |V| (×10^5)

The algorithms scale well over large synthetic graphs

Experimental results – querying synthetic graphs

Varying α

Varying cr

The algorithms scale well over large synthetic graphs

Conclusion

- Simulation revised for graph pattern matching
  - Bounded Simulation
    - node predicates, edge bound, edge-to-path matching relation
  - Reachability Queries and Graph Pattern Queries
    - query containment and minimization – cubic time
    - query evaluation – cubic time
- Future work
  - extending RQs and PQs by supporting general regular expressions
  - incremental evaluation of RQs and PQs

Simulation revised for graph pattern matching

Terrorist Collaboration Network (1970 - 2010)

“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)
