
Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs

Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad


Presentation Transcript


  1. LLNL • Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs • Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad • KDD 2007, San Jose

  2. Input: Query Graph, Attributed Data Graph • Output: Matching Subgraph

  3. Terminology: “Conform” • The Matching Subgraph conforms to the Query Graph

  4. Terminology: “Interception” • [Figure: Query Graph and Matching Subgraph, showing four matching nodes and one intermediate node] • Path 12-13-4 is an Interception

  5. Terminology: “Instantiate” • Matching Subgraph Ht • Query Graph Hq • Node 11 instantiates the SEC node • Ht instantiates Hq

  6. Roadmap • Introduction • Problem Definition • Motivations • How to: Graph X-Ray • Experimental Results • Conclusion

  7. Motivation: Why Not SQL? • Case 1: Exact match does not exist • Q: How to find approximate answer? • Case 2: Too many exact matches • Q: How to rank them?

  8. Motivation: Why Not SQL? (Cont.) • Case 3: Exact match might not be the best answer • “Find CEO who has heavy contact with Accountant” • Q: How to find the right one? • Exact match: 1 direct connection • Inexact match: many indirect connections

  9. Motivation: Efficiency • Why Not Subgraph Isomorphism? • Polynomial for a fixed pattern-query size • Q1: How to scale up linearly? • Q2: … and with a small slope?

  10. Wish List • Effectiveness • Both exact match & inexact match • Ranking among multiple results • “Best” answer (proximity-based) • Efficiency • Scale linearly • Scale with small slope • G-Ray meets all!

  11. Roadmap • Introduction • Problem Definition • Motivations • How to: Graph X-Ray • Experimental Results • Conclusion

  12. Preliminary: Center-Piece Subgraph (CePS) [Tong+] • [Figure: original graph with query nodes Q; black: query nodes] • CePS is meta opt. in G-Ray!

  13. Preliminary: Augmented Graph • Data nodes: 1, …, 13 • Attribute nodes: a • Footnote: Aug. Graph is crucial for computation!
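
The augmented graph on this slide can be illustrated with a minimal sketch. It assumes the usual construction in which every attribute value becomes one extra node linked to all data nodes carrying that attribute; the slide itself does not spell this out. The function name `build_augmented_graph`, the `("attr", label)` key convention and the toy data are hypothetical.

```python
from collections import defaultdict


def build_augmented_graph(edges, attributes):
    """Return an undirected adjacency map over data nodes plus attribute nodes.

    edges:      iterable of (u, v) pairs between data nodes.
    attributes: dict mapping each data node to its attribute label.
    Attribute nodes are keyed as ("attr", label) so they cannot collide
    with data-node ids (an assumption of this sketch).
    """
    adj = defaultdict(set)
    # Keep the original data-graph edges.
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Link every data node to the node that represents its attribute.
    for node, label in attributes.items():
        attr_node = ("attr", label)
        adj[node].add(attr_node)
        adj[attr_node].add(node)
    return dict(adj)


# Toy data, not taken from the slides.
edges = [(1, 2), (2, 3), (3, 4)]
attributes = {1: "CEO", 2: "SEC", 3: "SEC", 4: "Manager"}
aug = build_augmented_graph(edges, attributes)
```

With this layout, "which data nodes are SEC?" is a neighbor lookup on `("attr", "SEC")`, and proximity from a data node to an attribute can be measured on the same graph.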

  14. G-Ray: quick overview (for loop) • Step 1: SF • Step 2: NE • Step 3: BR • Step 4: NE • Step 5: BR • Step 6: NE • Step 7: BR • Step 8: BR • SF: Seed-Finder • NE: Neighborhood-Expander • BR: Bridge

  15. Seed-Finder • Q: How to instantiate the SEC node? • A: • Footnote: node 11 is close to some unknown data nodes for “CEO”, “Account.” and “Manager”
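
The footnote suggests that the seed is the SEC candidate closest to the attributes of its query neighbors. A minimal sketch of that idea follows, assuming the score is the product of a candidate's proximities to the neighboring attribute nodes. The slide's own formula was not preserved, so this scoring rule, the function name `find_seed`, candidate node 9 and all proximity values are assumptions for illustration only.

```python
from math import prod


def find_seed(candidates, neighbour_attrs, prox):
    """Return the candidate with the largest product of proximities.

    candidates:      data nodes that carry the wanted attribute (e.g. SEC).
    neighbour_attrs: attributes adjacent to that node in the query graph.
    prox:            dict mapping (data node, attribute) to a proximity score.
    """
    return max(
        candidates,
        key=lambda c: prod(prox[(c, a)] for a in neighbour_attrs),
    )


# Hypothetical proximity scores; node 9 is an invented second candidate.
prox = {
    (11, "CEO"): 0.30, (11, "Accountant"): 0.20, (11, "Manager"): 0.25,
    (9, "CEO"): 0.40, (9, "Accountant"): 0.05, (9, "Manager"): 0.10,
}
seed = find_seed([9, 11], ["CEO", "Accountant", "Manager"], prox)
```

A product rewards candidates that are reasonably close to all required attributes, so node 9 loses here despite its higher CEO score.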

  16. Neighborhood-Expander • Q: How to instantiate the CEO node? • Step 1 → Step 2? • A: • Footnote: • Step 3 → Step 4? • Step 5 → Step 6?

  17. Bridge • Step 6: NE → Step 7: BR • Q: • A: Prim-like Alg. • To maximize • Should block nodes 11 and 7 • Footnote • Connection subgraph, or one single path?

  18. Roadmap • Introduction • Problem Definition • Motivation • How to: Graph X-Ray • Experimental Results • Conclusion

  19. Experimental Results • Datasets • DBLP • Node: author (315k) • Edge: co-authorship (1,800k) • Attribute: conference & year (13k) • KDD-2001, SIGMOD…

  20. Effectiveness: star-query • [Figure: query and result]

  21. Effectiveness: line-query • [Figure: query and result]

  22. Effectiveness: loop-query • [Figure: query and result]

  23. Efficiency • [Plot: response time vs. # of edges] • Scale linearly • Small slope • 3-5 seconds for ~2 M edges

  24. Roadmap • Introduction • Problem Definition • Motivation • How to: Graph X-Ray • Experimental Results • Conclusion

  25. Conclusion • Graph X-Ray (G-Ray) • Best effort pattern match • in large attributed graphs • Scale linearly • with small slope • More details in Poster Session • Monday (tonight) • board number 8

  26. Thank you! www.cs.cmu.edu/~htong G-Ray X-Ray

  27. Backup-slides

  28. Proximity on Graph (a.k.a. relevance, closeness) • [Figure: example graph with nodes 1-12] • Multi-faceted • Punish long paths • Edge weight • How to: random walk with restart

  29. Random walk with restart • [Figure: example graph with nodes 1-12 colored by score, next to the ranking vector; scores range from 0.02 to 0.13] • Nearby nodes, higher scores • More red, more relevant
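
Random walk with restart can be sketched with plain power iteration, as below. This is an illustrative version for an undirected, unweighted graph, not necessarily how G-Ray computes it; the restart probability of 0.15, the function name and the 4-node path graph are assumptions.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-12, max_iter=10000):
    """Return a dict of steady-state visiting probabilities.

    adj:     dict mapping each node to a list of its neighbors.
    seed:    the node the walker jumps back to.
    restart: probability of jumping back to the seed at each step.
    """
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        # Spread (1 - restart) of each node's score evenly over its neighbors.
        for u in nodes:
            if adj[u]:
                share = (1.0 - restart) * scores[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        # The remaining probability mass returns to the seed.
        nxt[seed] += restart
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Toy path graph 0 - 1 - 2 - 3, walker restarts at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
r = random_walk_with_restart(path, seed=0)
```

The resulting dictionary plays the role of the ranking vector on the slide: scores fall off with distance from the seed, which is how long paths get punished.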

  30. How to rank the results • Our goodness function • Measure the proximity between any two matching nodes if they are required to be connected (two-way) • Multiply them together • In G-Ray, we approximately optimize this goodness function • If we have multiple matching subgraphs, we can rank them according to this goodness function

  31. How to rank the results • [Figure: matching subgraph with four matching nodes] • Goodness = Prox(12, 4) × Prox(4, 12) × Prox(7, 4) × Prox(4, 7) × Prox(11, 7) × Prox(7, 11) × Prox(12, 11) × Prox(11, 12)
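
The goodness product on this slide can be written as a short function. The sketch below takes the four required connections from the slide; the proximity lookup `toy_prox`, its constant value of 0.5 and the function names are placeholders, since the deck gives no numeric proximities for these pairs.

```python
from math import prod


def goodness(required_pairs, prox):
    """Multiply the two-way proximities of every required connection.

    required_pairs: (i, j) pairs of matching nodes that the query connects.
    prox:           callable prox(i, j) returning the proximity from i to j.
    """
    return prod(prox(i, j) * prox(j, i) for i, j in required_pairs)


def rank_matches(matches, prox):
    """Sort candidate matching subgraphs by goodness, best first."""
    return sorted(matches, key=lambda pairs: goodness(pairs, prox), reverse=True)


# Required connections from the slide: 12-4, 7-4, 11-7, 12-11.
slide_pairs = [(12, 4), (7, 4), (11, 7), (12, 11)]


def toy_prox(i, j):
    # Placeholder: every proximity is 0.5 (not a value from the paper).
    return 0.5


score = goodness(slide_pairs, toy_prox)
```

Because each connection contributes both Prox(i, j) and Prox(j, i), a match scores well only when its nodes are close in both directions.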
