Simulation revised for graph pattern matching





Outline

  • Graph Simulation

    • label equality, edge-to-edge matching relation

  • Bounded Simulation

    • node predicates, edge bound, edge-to-path matching relation

  • Reachability Queries and Graph Pattern Queries

    • query containment and minimization – cubic time

    • query evaluation – cubic time

  • Conclusion

A first step towards revising simulation for graph pattern matching


Graph Pattern Matching: the problem

  • Given a pattern graph P and a data graph G, decide whether G matches P and, if so, find all the matches of P in G.

  • Applications

    • social queries, social matching

    • biology and chemistry network querying

    • keyword search, proximity search, …

How to define?

Widely employed in a variety of emerging real life applications


Graph Simulation

  • Node label equivalence

  • Edge-to-edge relation

[Figure: pattern graph P and data graph G, with node labels A, B, D and E and data nodes v1 and v2]

Capable enough?

Identical label matching, edge-to-edge relations
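The two ingredients above (label equality and edge-to-edge matching) can be sketched as a fixpoint computation. This is a minimal illustration that assumes the usual formulation of graph simulation: start from all label-equal pairs, then repeatedly drop a data node if it has no child matching a pattern child. The function name graph_simulation, the dict-based graph encoding and the toy graphs are assumptions made for this example and do not come from the slides.

```python
def graph_simulation(p_nodes, p_edges, g_nodes, g_edges):
    """Return the maximum simulation as {pattern node: set of data nodes}.

    p_nodes / g_nodes: dict mapping node id -> label.
    p_edges / g_edges: list of directed (source, target) pairs.
    Returns {} when some pattern node ends up with no match.
    """
    g_succ = {v: set() for v in g_nodes}
    for s, t in g_edges:
        g_succ[s].add(t)

    # Start from label equality, then prune until nothing changes.
    sim = {u: {v for v, lab in g_nodes.items() if lab == p_nodes[u]}
           for u in p_nodes}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v stays matched to u only if some child of v matches u2.
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else {}


# Hypothetical example: pattern A -> B -> D.
p_nodes = {"a": "A", "b": "B", "d": "D"}
p_edges = [("a", "b"), ("b", "d")]
g_nodes = {"x": "A", "v1": "B", "v2": "B", "y": "D", "z": "E"}
g_edges = [("x", "v1"), ("x", "v2"), ("v1", "y"), ("v2", "z")]

# v2 is dropped: its only child is labeled E, not D.
match = graph_simulation(p_nodes, p_edges, g_nodes, g_edges)
print(match)
```

The sketch favors readability over speed; it rescans every pattern edge on each pass instead of maintaining counters.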


An example from real life social matching

[Figure: pattern P and data graph G with edge-to-path mappings; nodes shown include Alice, biologist and doctors; edge bounds shown are 1 and 3]

Graph simulation is too restrictive!


Bounded Simulation

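  • data graph G = (V, E, fA), where fA assigns attributes to each node

  • pattern graph P = (Vp, Ep, fv, fe), where fv assigns a predicate to each pattern node and fe assigns a bound to each pattern edge

  • G matches P via bounded simulation if there is a binary relation S from Vp to V such that every pattern node has a match in S, every matched data node satisfies the predicate of its pattern node, and for every edge (u, u’) of P and every match v of u there is a nonempty path in G from v to some match v’ of u’ that satisfies the bound of the edge.

  • bounded simulation vs. graph simulation

    • node predicates vs. label equality

    • edge-to-path matching vs. edge-to-edge matching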

[Figure: pattern P and data graph G for the social matching example. Node predicates shown: Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’, Job = ‘CTO’; edge bounds shown: 1 and 3; graph simulation is annotated as a special case]

Enriched model for capturing meaningful matches
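Bounded simulation as defined on this slide can be sketched in the same fixpoint style as plain simulation. The example below is a rough illustration under stated assumptions: pattern nodes carry Python predicates, each pattern edge carries a bound that is a positive integer or None (meaning unbounded), and shortest nonempty-path lengths are precomputed by breadth-first search. The names distances_from and bounded_simulation and the toy data are hypothetical; this is not the algorithm from the slides, only a direct reading of the definition.

```python
from collections import deque


def distances_from(succ, source):
    """Shortest nonempty-path length from source to every reachable node."""
    dist = {}
    queue = deque((t, 1) for t in succ[source])
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue  # already reached by a path that is no longer
        dist[node] = d
        queue.extend((t, d + 1) for t in succ[node])
    return dist


def bounded_simulation(p_preds, p_edges, g_attrs, g_edges):
    """p_preds: {pattern node: predicate(attrs) -> bool}
    p_edges: {(u, u2): bound}, bound a positive int or None (unbounded)
    g_attrs: {data node: attribute dict}; g_edges: list of (source, target)
    Returns {pattern node: set of data nodes}, or {} if G does not match P.
    """
    succ = {v: [] for v in g_attrs}
    for s, t in g_edges:
        succ[s].append(t)
    dist = {v: distances_from(succ, v) for v in g_attrs}

    # Candidates: data nodes whose attributes satisfy the node predicate.
    sim = {u: {v for v, a in g_attrs.items() if pred(a)}
           for u, pred in p_preds.items()}
    changed = True
    while changed:
        changed = False
        for (u, u2), bound in p_edges.items():
            # Keep v only if a match of u2 lies within the edge bound.
            keep = {v for v in sim[u]
                    if any(w in sim[u2] and (bound is None or d <= bound)
                           for w, d in dist[v].items())}
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim if all(sim.values()) else {}


# Hypothetical data: Alice reaches a biologist in two hops via a CTO.
g_attrs = {"alice": {"id": "Alice"}, "cto": {"job": "CTO"},
           "bio1": {"job": "biologist"}, "bio2": {"job": "biologist"}}
g_edges = [("alice", "cto"), ("cto", "bio1")]
p_preds = {"A": lambda a: a.get("id") == "Alice",
           "B": lambda a: a.get("job") == "biologist"}

within_two = bounded_simulation(p_preds, {("A", "B"): 2}, g_attrs, g_edges)
within_one = bounded_simulation(p_preds, {("A", "B"): 1}, g_attrs, g_edges)
```

With bound 2 the pattern edge maps to the two-edge path through the CTO; with bound 1 no match exists, which is the case where plain edge-to-edge simulation would also fail.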


Basic results for the bounded simulation

  • For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.

  • The graph pattern matching problem via bounded simulation can be solved in cubic time.

  • The incremental bounded simulation problem

Extension for multiple edge colors?

Efficient approaches for graph pattern matching


Considering edge types…

[Figure: the Essembly network, with edge types friends-allies, friends-nemeses, strangers-allies and strangers-nemeses]

Real life graphs have multiple edge types


Querying Essembly network: an example

[Figure: pattern P over the Essembly network. Node annotations shown: Alice; Biologists supporting Cloning; Doctors; Against cloning. Edge constraints shown: fa+, fa≤2 sa≤2, fa≤2 sn and fn. Data edge types shown: fa, fn, sa and sn]

Pattern queries with multiple edge types


Graph reachability and pattern queries

  • Real life graphs usually bear different edge types…

    • data graph G = (V, E, fA, fC), where fC assigns a color (edge type) to each edge

  • Reachability query (RQ): Qr = (u1, u2, fu1, fu2, fe), where fu1 and fu2 are predicates on the source and target nodes and fe is a regular expression from the following subclass:

    • F ::= c | c≤k | c+ | FF

  • Qr(G): the set of node pairs (v1, v2) such that v1 satisfies fu1, v2 satisfies fu2, and there is a nonempty path from v1 to v2 whose edge colors match the pattern specified by fe.

[Figure: example RQ from a node with Job=‘biologist’, sp=‘cloning’ to a node with Job=‘doctors’, with edge constraint fa≤2 fn]
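A reachability query over the restricted expressions F ::= c | c≤k | c+ | FF can be evaluated by expanding a frontier one atom at a time. The sketch below is illustrative only. It assumes that c≤k means between one and k consecutive c-colored edges, and it encodes an expression as a list of (color, bound) atoms, with bound 1 for a plain c and None for c+. The names expand and eval_rq, the triple-based edge encoding and the sample graph are hypothetical.

```python
def expand(starts, color, bound, edges):
    """Nodes reachable from `starts` via 1..bound edges of `color`.

    bound=None stands for c+ (one or more edges).
    """
    reached, frontier, hops = set(), set(starts), 0
    while frontier and (bound is None or hops < bound):
        # One more hop along edges of the requested color; skip known nodes.
        frontier = {t for s, c, t in edges
                    if c == color and s in frontier} - reached
        reached |= frontier
        hops += 1
    return reached


def eval_rq(nodes, edges, pred_from, pred_to, regex):
    """Return all (v1, v2) with pred_from(v1), pred_to(v2) and a nonempty
    path from v1 to v2 whose edge colors match `regex`.

    nodes: {node id: attribute dict}; edges: list of (source, color, target)
    regex: non-empty list of (color, bound) atoms, read as concatenation.
    """
    result = set()
    for v1, attrs in nodes.items():
        if not pred_from(attrs):
            continue
        current = {v1}
        for color, bound in regex:
            current = expand(current, color, bound, edges)
        result |= {(v1, v2) for v2 in current if pred_to(nodes[v2])}
    return result


# Hypothetical data in the spirit of the slide's example "fa<=2 fn".
nodes = {"b1": {"job": "biologist", "sp": "cloning"},
         "b2": {"job": "biologist", "sp": "cloning"},
         "d1": {"job": "doctors"},
         "d2": {"job": "doctors"}}
edges = [("b1", "fa", "b2"), ("b2", "fn", "d1"), ("b1", "fn", "d2")]


def is_bio(a):
    return a.get("job") == "biologist" and a.get("sp") == "cloning"


def is_doc(a):
    return a.get("job") == "doctors"


pairs = eval_rq(nodes, edges, is_bio, is_doc, [("fa", 2), ("fn", 1)])
print(pairs)
```

Here d2 is not paired with b1, because the path b1 to d2 has no fa edge and therefore does not match fa≤2 fn under the stated assumption.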


Graph pattern queries

  • Graph pattern query (PQ): Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u’), Qe = (u, u’, fv(u), fv(u’), fe(e)) is an RQ.

  • Qp(G) is the maximum set of pairs (e, Se), where e is a pattern edge and Se is a set of node pairs from Qe(G), such that

    • for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;

    • for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.

  • PQ vs. simulation and bounded simulation

    • search condition on query nodes

    • mapping edges to paths

    • constrain the edges on the path with a regular expression

RQ and bounded simulation are special cases of PQ


Reachability and graph pattern query: examples

[Figure: an example RQ and an example PQ over a data graph with edge colors fa, fn, sa and sn. Node predicates shown: Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’. Edge constraints shown: fa+, fa≤2 sa≤2, fa≤2 sn, fa≤2 fn and fn]


Fundamental problems: query containment

  • A PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that for any data graph G and any e in E1, Se is a subset of Sλ(e), where (e, Se) is in Q1(G) and (λ(e), Sλ(e)) is in Q2(G); i.e., λ is a renaming function that maps Q1(G) into Q2(G).

  • The query containment and equivalence problems can both be decided in cubic time

    • Query similarity based on a revision of graph simulation

    • Determine the query similarity in cubic time

Query containment and equivalence for PQs can be solved efficiently


Query containment: example

[Figure: three queries Q1, Q2 and Q3 over nodes B1 to B3 and C1 to C6, with edge constraints h≤1, h≤2 and h≤3]


Fundamental problems: query minimization

  • Query minimization problem

    • input: a PQ Qp

    • output: a minimized PQ Qm equivalent to Qp

  • The query minimization problem can be solved in cubic time.

    • compute the maximum node equivalence classes based on a revision of graph simulation;

    • determine the number of redundant nodes and edges based on the equivalence classes;

    • remove redundant and isolated nodes and edges.

Query minimization for PQs can be solved efficiently


Query minimization: example

[Figure: three queries Q1, Q2 and Q3 with nodes labeled R, B and C and edge labels g, f, g≤3 and h≤2]


Evaluating graph pattern queries

  • PQs can be evaluated in cubic time.

    • Join-based algorithm JoinMatch

      • Matrix index vs. distance cache

      • performs a join operation for each edge of the PQ, following a reversed topological order, until a fixpoint is reached

    • Split-based algorithm SplitMatch

      • blocks: treating pattern nodes and data nodes uniformly

      • partition-relation pair

Graph pattern matching can be solved in polynomial time
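To make the fixpoint idea concrete, here is a naive evaluation of a PQ that follows the definition of Qp(G) directly. It is neither JoinMatch nor SplitMatch: it uses no matrix index, no distance cache and no topological ordering, and it makes no claim about matching their running time. It assumes each pattern edge carries a list of (color, bound) atoms, where bound None stands for c+ and c≤k means one to k edges of color c. The names reach and eval_pq and the sample graph are hypothetical.

```python
def reach(start, regex, edges):
    """Nodes reachable from `start` along a path whose colors match `regex`."""
    current = {start}
    for color, bound in regex:  # bound None stands for c+
        reached, frontier, hops = set(), current, 0
        while frontier and (bound is None or hops < bound):
            frontier = {t for s, c, t in edges
                        if c == color and s in frontier} - reached
            reached |= frontier
            hops += 1
        current = reached
    return current


def eval_pq(p_preds, p_edges, nodes, edges):
    """p_preds: {pattern node: predicate}; p_edges: {(u, u2): regex}
    nodes: {node id: attrs}; edges: list of (source, color, target)
    Returns {(u, u2): set of (v, v2)}, or {} if the pattern has no match.
    """
    sim = {u: {v for v, a in nodes.items() if pred(a)}
           for u, pred in p_preds.items()}
    # Path targets per (pattern edge, candidate source), computed once.
    targets = {(e, v): reach(v, rx, edges)
               for e, rx in p_edges.items() for v in sim[e[0]]}
    changed = True
    while changed:
        changed = False
        for e in p_edges:
            u, u2 = e
            # v survives only if its path targets still include a match of u2.
            keep = {v for v in sim[u] if targets[(e, v)] & sim[u2]}
            if keep != sim[u]:
                sim[u], changed = keep, True
    if not all(sim.values()):
        return {}
    return {e: {(v, w) for v in sim[e[0]] for w in targets[(e, v)] & sim[e[1]]}
            for e in p_edges}


# Hypothetical data: only bio1 has an fn edge to a doctor.
nodes = {"alice": {"id": "Alice"}, "x": {"job": "CTO"},
         "bio1": {"job": "biologist"}, "bio2": {"job": "biologist"},
         "doc1": {"job": "doctors"}}
edges = [("alice", "fa", "x"), ("x", "fa", "bio1"),
         ("alice", "fa", "bio2"), ("bio1", "fn", "doc1")]
p_preds = {"A": lambda a: a.get("id") == "Alice",
           "B": lambda a: a.get("job") == "biologist",
           "D": lambda a: a.get("job") == "doctors"}
p_edges = {("B", "D"): [("fn", 1)], ("A", "B"): [("fa", 2)]}

answer = eval_pq(p_preds, p_edges, nodes, edges)
print(answer)
```

In this run bio2 is pruned first because it has no fn edge to a doctor, and the pair (alice, bio2) then disappears from the result for edge (A, B), which is the propagation the two closure conditions on Se require.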


Example of JoinMatch

[Figure, repeated over four slides: JoinMatch on the example PQ. Edge colors shown: fa, fn, sa and sn. Node predicates shown: Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’. Edge constraints shown: fa+, fa≤2 sa≤2, fa≤2 sn and fn]


Experimental results – effectiveness of PQs

Effectiveness of PQs: edge-to-path relations


Experimental results – querying real life graphs

[Figure: two panels, varying |Vp| and varying |Ep|]

Evaluation algorithms are sensitive to pattern edges


Experimental results – querying real life graphs

[Figure: two panels, varying |pred| and varying b]

The algorithms are sensitive to the number of predicates


Experimental results – querying synthetic graphs

[Figure: two panels, varying b and varying |V| (×10^5)]

The algorithms scale well over large synthetic graphs


Experimental results – querying synthetic graphs

[Figure: two panels, varying α and varying cr]

The algorithms scale well over large synthetic graphs


Conclusion

  • Simulation revised for graph pattern matching

    • Bounded Simulation

      • node predicates, edge bound, edge-to-path matching relation

    • Reachability Queries and Graph Pattern Queries

      • query containment and minimization – cubic time

      • query evaluation – cubic time

  • Future work

    • extending RQs and PQs by supporting general regular expressions

    • incremental evaluation of RQs and PQs


Thank you!

Terrorist Collaboration Network (1970 - 2010)

“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)

