
Triangle Finding: How Graph Theory can Help the Semantic Web

Edward Jimenez, Eric Goodman


Presentation Transcript


  1. Triangle Finding: How Graph Theory can Help the Semantic Web Edward Jimenez, Eric Goodman

  2. The Semantic Web as a Graph

  3. The Semantic Web as a Graph

  4. Optimizing Queries with Graph Theory

     Query 9:
       SELECT ?X, ?Y, ?Z
       WHERE {
         ?X rdf:type ub:Student .
         ?Y rdf:type ub:Faculty .
         ?Z rdf:type ub:Course .
         ?X ub:advisor ?Y .
         ?Y ub:teacherOf ?Z .
         ?X ub:takesCourse ?Z
       }

     Query 2:
       SELECT ?X, ?Y, ?Z
       WHERE {
         ?X rdf:type ub:GraduateStudent .
         ?Y rdf:type ub:University .
         ?Z rdf:type ub:Department .
         ?X ub:memberOf ?Z .
         ?Z ub:subOrganizationOf ?Y .
         ?X ub:undergraduateDegreeFrom ?Y
       }

     • Graph theory has a lot to offer the Semantic Web
     • One example: triangle finding
       • O(|E|^1.5)
       • Much more efficient than what a typical database would do
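     The three relationship patterns in Query 9 (advisor, teacherOf, takesCourse) close a triangle over ?X, ?Y, and ?Z. The following is a minimal Python sketch of that shape, assuming the data is held as plain (subject, predicate, object) tuples. The function name `query9` and the tuple layout are illustrative assumptions, not part of the presentation or of any triple store's API.

```python
def query9(triples):
    """Return (student, faculty, course) bindings matching the Query 9 triangle."""
    # Index the triples by predicate so each triangle side is a set of (s, o) pairs.
    by_pred = {}
    for s, p, o in triples:
        by_pred.setdefault(p, set()).add((s, o))
    types = by_pred.get("rdf:type", set())
    advisor = by_pred.get("ub:advisor", set())
    teacher_of = by_pred.get("ub:teacherOf", set())
    takes = by_pred.get("ub:takesCourse", set())

    results = []
    for x, y in advisor:                      # ?X ub:advisor ?Y
        for y2, z in teacher_of:              # ?Y ub:teacherOf ?Z
            if y2 != y or (x, z) not in takes:  # ?X ub:takesCourse ?Z closes the triangle
                continue
            if ((x, "ub:Student") in types and (y, "ub:Faculty") in types
                    and (z, "ub:Course") in types):
                results.append((x, y, z))
    return sorted(results)
```

     The nested loop is the naive join; the point of the slide is that a dedicated triangle-finding algorithm can replace it.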

  5. Experiment
     • Compare three approaches to finding all triangles in a graph:
       • Sesame
       • Jena
       • MultiThreaded Graph Library (MTGL)
     • MTGL
       • Open-source library of graph algorithms, targeted at shared-memory supercomputers
       • Used MTGL's implementation of J. Cohen's triangle-finding algorithm
       • Had to modify it slightly to allow for multiple edges between vertices

  6. Data
     [Figure: matrix quadrants labeled a, b, c, d]
     • Data: a Recursive MATrix (R-MAT) graph
     • Specify:
       • |V|
       • Edge factor (average number of edges per vertex)
       • Probabilities a, b, c, d, where a + b + c + d = 1
     • Has properties similar to real-world graphs, such as short diameters and small-world structure
     • Used as the basis of the Graph500 benchmark
     • Nodes are given a unique IRI and edges are given a random value
     • |V|: 2^5 to 2^19
     • Edge factor: {16, 32, 64}
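     A minimal sketch of how an R-MAT edge list with the parameters on slide 6 could be generated: each edge is placed by repeatedly choosing one of four quadrants with probabilities a, b, c, d. The function name `rmat_edges`, the default probabilities, and the seed are assumptions for illustration; the presentation does not state which values were used.

```python
import random

def rmat_edges(scale, edge_factor, a=0.57, b=0.19, c=0.19, d=0.05, seed=0):
    """Generate edge_factor * 2**scale directed edges over 2**scale vertices.

    The default probabilities are assumed values, not taken from the slides.
    """
    assert abs(a + b + c + d - 1.0) < 1e-9, "probabilities must sum to 1"
    rng = random.Random(seed)
    num_vertices = 1 << scale
    edges = []
    for _ in range(edge_factor * num_vertices):
        src = dst = 0
        # One quadrant choice per bit of the vertex id.
        for _ in range(scale):
            src <<= 1
            dst <<= 1
            r = rng.random()
            if r < a:
                pass                # quadrant a: both bits stay 0
            elif r < a + b:
                dst |= 1            # quadrant b
            elif r < a + b + c:
                src |= 1            # quadrant c
            else:
                src |= 1            # quadrant d
                dst |= 1
        edges.append((src, dst))
    return edges
```

     Because edges are drawn independently, the output can contain self-loops and repeated edges, which is consistent with the note on slide 5 about handling multiple edges between vertices.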

  7. Possible Triangles

  8. Trying to Find Triangles via SPARQL

     SELECT ?X ?Y ?Z
     WHERE {
       { ?X ?a ?Y . ?Y ?b ?Z . ?Z ?c ?X }
       UNION { ?Y ?a ?X . ?Z ?b ?Y . ?X ?c ?Z }
       UNION { ?X ?a ?Y . ?Y ?b ?Z . ?X ?c ?Z }
       UNION { ?X ?a ?Y . ?Z ?b ?Y . ?X ?c ?Z }
       UNION { ?Y ?a ?X . ?Y ?b ?Z . ?X ?c ?Z }
       UNION { ?Y ?a ?X . ?Z ?b ?Y . ?Z ?c ?X }
       UNION { ?X ?a ?Y . ?Z ?b ?Y . ?Z ?c ?X }
       UNION { ?Y ?a ?X . ?Y ?b ?Z . ?Z ?c ?X }
     }

     Slide annotation: Redundant Solutions

  9. The Problem: Graph Isomorphism
     [Figure: query patterns iii and iv over ?X, ?Y, ?Z, each matched against a triangle on Alice, Bob, and Charlie]
     Bindings shown:
     • ?X = Alice, ?Y = Charlie, ?Z = Bob
     • ?X = Alice, ?Y = Bob, ?Z = Charlie

  10. The Other Problem: Automorphism
      [Figure: query pattern i over ?X, ?Y, ?Z, matched twice against the triangle on Alice, Bob, and Charlie]
      Bindings shown:
      • ?X = Alice, ?Y = Bob, ?Z = Charlie
      • ?X = Charlie, ?Y = Alice, ?Z = Bob
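      The redundancy on slide 10 can be reproduced outside SPARQL. The sketch below matches a directed-cycle pattern (?X to ?Y, ?Y to ?Z, ?Z to ?X) against one triangle and returns every binding. It assumes edges are plain (subject, object) pairs with predicates ignored; `match_cycle` is a hypothetical helper name.

```python
def match_cycle(edges):
    """Return every (x, y, z) binding of the pattern x -> y, y -> z, z -> x."""
    edge_set = set(edges)
    bindings = []
    for x, y in edge_set:
        for y2, z in edge_set:
            if y2 == y and (z, x) in edge_set:
                bindings.append((x, y, z))
    return sorted(bindings)


# A single directed triangle: Alice -> Bob -> Charlie -> Alice.
one_triangle = [("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice")]
solutions = match_cycle(one_triangle)
```

      Here `solutions` holds three bindings, one per rotation of the cycle, including the two bindings listed on the slide, even though the data contains only one triangle.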

  11. Possible Triangles

  12. The SPARQL Query

      SELECT ?X ?Y ?Z
      WHERE {
        {
          ?X ?a ?Y .
          ?Y ?b ?Z .
          ?Z ?c ?X
          FILTER (STR(?X) < STR(?Y))
          FILTER (STR(?Y) < STR(?Z))
        }
        UNION
        {
          ?X ?a ?Y .
          ?Y ?b ?Z .
          ?Z ?c ?X
          FILTER (STR(?Y) > STR(?Z))
          FILTER (STR(?Z) > STR(?X))
        }
        UNION
        {
          ?X ?a ?Y .
          ?Y ?b ?Z .
          ?X ?c ?Z
        }
      }
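      To see why the two FILTER pairs in the slide 12 query remove the rotated duplicates, the three UNION branches can be emulated in Python. This is a sketch under simplifying assumptions: edges are (subject, object) string pairs, predicates are ignored (so parallel edges collapse), and Python string comparison stands in for comparing STR() values. The name `triangles_via_filters` is hypothetical.

```python
def triangles_via_filters(edges):
    """Emulate the three UNION branches of the slide 12 query."""
    edge_set = set(edges)
    successors = {}
    for s, o in edge_set:
        successors.setdefault(s, set()).add(o)

    results = []
    for x, y in edge_set:                     # ?X ?a ?Y
        for z in successors.get(y, ()):       # ?Y ?b ?Z
            if (z, x) in edge_set:            # ?Z ?c ?X (directed cycle)
                if x < y < z:                 # branch 1 filters
                    results.append((x, y, z))
                if y > z > x:                 # branch 2 filters
                    results.append((x, y, z))
            if (x, z) in edge_set:            # ?X ?c ?Z (branch 3, no filter)
                results.append((x, y, z))
    return results
```

      Each directed cycle has three rotations, and exactly one of them satisfies either filter pair, so a cycle is reported once regardless of its orientation. The third branch needs no filter because the pattern X to Y, Y to Z, X to Z has only one way to bind a given triangle.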

  13. Cohen's Triangle Algorithm
      • Assumptions
        • Simplified graph
        • Completely connected
      • Map 1: O(m)
        • Use v1 < v2 < ··· < vn for tie-breaking

  14. Cohen's Triangle Algorithm
      <v1,v2>, <v1,v3>
      <v1,v2>, <v1,v4>
      …
      <v1,v2>, <v1,vn>
      • Reduce: O(m^(3/2))

  15. Cohen's Triangle Algorithm
      [Figure: example of the <v8, v20> bin]
      • Map 2: O(m^(3/2))
        • Identity mapping of previous reduce step
        • Map edges
      • Reduce 2: O(m^(3/2))
        • Emit triangles for the contents of each <vi, vj> bin when the edge exists between vi and vj
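      The map and reduce steps on slides 13 to 15 can be sketched in a single process. The version below is an illustrative Python rendering, not MTGL's implementation: it assumes an undirected graph, simplifies it first (dropping self-loops and duplicate edges rather than supporting multiple edges as the modified MTGL code did), and bins each edge under its lower-degree endpoint with vertex order as the tie-breaker. The name `cohen_triangles` is hypothetical, and vertex ids are assumed to be mutually comparable.

```python
from collections import defaultdict
from itertools import combinations

def cohen_triangles(edges):
    """Return each triangle of the simplified undirected graph exactly once."""
    # Simplify: undirected, no self-loops, no duplicate edges.
    simple = {tuple(sorted(e)) for e in edges if e[0] != e[1]}

    degree = defaultdict(int)
    for u, v in simple:
        degree[u] += 1
        degree[v] += 1

    def rank(v):
        # Lower degree first; vertex order breaks ties.
        return (degree[v], v)

    # Map 1: bin each edge under its lower-ranked endpoint.
    bins = defaultdict(list)
    for u, v in simple:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        bins[low].append(high)

    # Reduce 1: every pair of edges in a bin is an open triad,
    # keyed by its two outer vertices <vi, vj>.
    triads = defaultdict(list)
    for apex, others in bins.items():
        for vi, vj in combinations(sorted(others), 2):
            triads[(vi, vj)].append(apex)

    # Map 2 is the identity on triads plus the edges themselves;
    # here the edge set `simple` plays the role of the mapped edges.
    # Reduce 2: emit a triangle when the closing edge <vi, vj> exists.
    triangles = []
    for (vi, vj), apexes in triads.items():
        if (vi, vj) in simple:
            for apex in apexes:
                triangles.append(tuple(sorted((apex, vi, vj))))
    return sorted(triangles)
```

      Binning by the lower-ranked endpoint is what keeps the work bounded: a triangle is discovered only from its lowest-ranked vertex, so it is emitted once and high-degree vertices do not generate quadratic numbers of triads.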

  16. Results: Growth of Triangles

  17. Results

  18. Comparison at Larger Scales
      • With 1 billion edges, assuming the same constant:
        • An O(x^1.39) implementation is about 50x faster than an O(x^1.58) one
        • An O(x^1.39) implementation is about 9000x faster than an O(x^1.83) one
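      The speedup figures on slide 18 follow from dividing the two growth rates: with equal constants, x^p / x^q = x^(p - q). A short check of that arithmetic, where `speedup` is a hypothetical helper name:

```python
def speedup(num_edges, slow_exponent, fast_exponent):
    """Ratio of running times x**slow / x**fast, assuming equal constants."""
    return num_edges ** (slow_exponent - fast_exponent)


ratio_vs_158 = speedup(1e9, 1.58, 1.39)   # roughly 50
ratio_vs_183 = speedup(1e9, 1.83, 1.39)   # roughly 9000
```

      At x = 10^9 the exponent gaps of 0.19 and 0.44 give factors of about 10^1.71 and 10^3.96, which round to the 50x and 9000x quoted on the slide.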

  19. Conclusions
      • The Semantic Web is a graph
      • Graph theory can add a lot in terms of speeding up queries
      • It also has other approaches for analyzing the data
      • SPARQL has unexpected issues when graph isomorphisms or automorphisms arise
