I98
Uploaded by
52 SLIDES
0 VIEWS
0LIKES

Session 09

DESCRIPTION

Graph Database

1 / 52

Download Presentation

Session 09

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Course : COMP8025 – Big Data Analytics Introduction to Graphs Session 09 Dr. Sani M. Isa This presentation adopted from Ilkay Altintas and Amarnath Gupta, UC San Diego

  2. In this module you will learn a number of basic graph analytic techniques. After this module you will be able to identify the right class of techniques to apply for a graph analytic problem

  3. Module Plan Focus on Analytic Techniques • Big Data computing in Modules 3 and 4 Basic definitions Analytics • Path Analytics • Connectivity Analytics • Community Analytics • Centrality Analytics • • •

  4. Our First Definition of Graphs • • V: a set of vertices E: a set of edges

  5. Graph of a Tweet node edge A Real Graph Has More Information Content

  6. Media node Tweet node URL node User node Node Types (aka Labels)

  7. Graphs with Node Types v6 v1 v2 v5 v3 v4 • • • • V: a set of vertices E: a set of edges TN: a set of node types f (TN? ?V): type assignment to nodes

  8. Media node Tweet node URL node Attribute User node text:We’ve just posted a sneak preview of … Value • Node Schema = Properties (Attributes) with Values

  9. interactionType: Biochemical Activity Edge Label Attribute Value Edge Schema interactionType: physical, genetic, biochemical, … Domain of interactionType

  10. Extended Graph Model v6 v1 v2 v5 • • • • • • • • • • V: set of vertices E: set of edges TN: set of node types f (TN? ?V): type assignment to nodes TE: a set of edge types g (TE? ?E): type assignment to edges AN: set of node attributes AE: set of edge attributes dom(AN[?]): domain of the ?-th node attribute dom(AE[?]): domain of the ?-th edge attribute v3 v4

  11. “Weight” – an Edge Property No weights • Why weights? • “Distance” in a road network • “Strength of Connection” in a personal network • “Likelihood of interaction” in a biological network • “Certainty of information” in a knowledge network With weights

  12. Loop: the head and tail of an edge are on the same node Structural Property of a Graph

  13. Multi- graph 1 A Graph with more than one edge between two nodes 2 3 4 5 Why multiple edges? Each edge has a different information content

  14. Path Analytics

  15. A Walk in the Graph D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 Edge weight 15 5 5 15 E I 20 10 H

  16. A Walk in the Graph • Walk • an alternating sequence of vertices and edges over a graph D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E I 20 10 H

  17. A Walk in the Graph • Constraining a Walk • Path • A walk with no repeating node except possibly for the first and last D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E I 20 10 H

  18. A Walk in the Graph • Constraining a Walk • Cycle • A path of length n ≥ 3 whose start and end vertices are the same • Acyclic • Graph with no cycles D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E I 20 10 H

  19. A Walk in the Graph • Constraining a Walk • Trail • A walk with no repeating edge – H?F?G?C?F?E?C?F?J not a trail D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E I 20 10 H

  20. A Walk in the Graph • Reachability • node ? is reachable from node ? if there is a path from ? to ? • A is reachable from I • I is not reachable from A D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E I 20 10 H

  21. Diameter of a Graph Maximum pairwise distance between nodes To A B C D E F G H I J A 0 ∞∞∞∞∞∞∞∞∞ B 1 0 ∞∞∞∞∞∞∞∞ C 1 1 0 1 2 1 2 ∞∞ 2 D 1 ∞∞ 0 ∞∞∞∞∞∞ E 2 1 1 2 0 2 3 ∞∞ 3 F 3 3 2 2 1 0 1 ∞∞ 1 G 2 2 1 1 3 2 0 ∞∞ 3 D 5 10 G A 20 5 20 5 From 10 C 10 5 10 15 J F 15 B 5 20 15 H 3 2 2 3 1 1 2 0 1 2 I 4 3 3 3 2 1 2 ∞ 0 1 J 3 3 2 2 2 1 1 ∞∞ 0 5 5 15 E I 20 10 H (Shortest-hop) Distance Matrix

  22. Diameter of a Graph Maximum pairwise distance between nodes To A B C D E F G H I J A 0 ∞∞∞∞∞∞∞∞∞ B 1 0 ∞∞∞∞∞∞∞∞ C 1 1 0 1 2 1 2 ∞∞ 2 D 1 ∞∞ 0 ∞∞∞∞∞∞ E 2 1 1 2 0 2 3 ∞∞ 3 F 3 3 2 2 1 0 1 ∞∞ 1 G 2 2 1 1 3 2 0 ∞∞ 3 D 5 10 G A 20 5 20 5 From 10 C 10 5 10 15 J F 15 B 5 20 15 H 3 2 2 3 1 1 2 0 1 2 I 4 3 3 3 2 1 2 ∞ 0 1 J 3 3 2 2 2 1 1 ∞∞ 0 5 5 15 E I 20 10 H (Shortest-hop) Distance Matrix

  23. The Basic Path Analytics Question • What is the “best” path to go from node 1 to node 2? • Specification of “best” may include • Function to optimize • Nodes/edges to traverse • Nodes/edges to avoid • Preferences to satisfy

  24. Find least weight path from I to B • Avoid paths through E • Must go through J Optimization function D 5 10 G Constraints A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  25. Find least weight path from I to B Simpler Problem Optimization function D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  26. Routes on Google Maps • “Shortest” route would change based on weather, traffic, road condition…

  27. Find least weight path from I to B A Standard Method: Dijkstra’s algorithm Many good online tutorials Simpler Problem Optimization function D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  28. Dijkstra’s Algorithm ? I A B C D E 0 ?????? F G ? H J ?? ?[?] ????[?] ????[?] N N N N N N N D 5 10 G A 20 5 20 N N N 5 10 C ? 10 I A B C D E 0 ?????? F G ? H J ?? 5 10 15 J ?[?] F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  29. Step 1 ? I A B C D E 0 ????? F G ? H J ? 15 ?[?] ????[?] ????[?] Y N N N N N N D 5 5 10 G I I A 20 5 20 N N N 5 10 C ? 10 I A B C D E 0 ????? F G ? H J ? 15 5 10 15 J ?[?] 5 F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  30. Step 2 5+5 ? I A B C D E 0 ???? 25 5 F G 15 ? 10 H J ?[?] ????[?] ????[?] Y N N N N N D 5 10 G F I F F A 20 5 20 Y N N N 5 10 C ? 10 A B C D E ???? 25 5 F G 15 ? 10 H J 5 10 15 J ?[?] F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  31. Step 3 ? I A B C D E 0 ???? 25 5 F G 15 ? 10 H J ?[?] ????[?] ????[?] Y N N N N N D 5 10 G F I F F A 20 5 20 Y N N Y 5 10 C ? 10 A B C D E ???? 25 15 ? 10 G H J 5 10 15 J ?[?] F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  32. Step 4 ? I A B C D E 0 ?? 35 25 25 5 F G 15 ? 10 H J ?[?] ????[?] ????[?] Y N N N N N D 5 10 G G F I F F G A Y Y N Y 20 5 20 5 ? A B C D E ?? 35 25 25 15 ? ?D or F? ?E? G H 10 C 10 5 10 ?[?] G? 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  33. Step 5 (with E) I A B C D E 0 ? 30 30 25 25 5 E E G F ????[?] Y N N Y N Y ? F G 15 ? 10 F Y N Y H J ?[?] ????[?] D 5 10 I Y F G A 20 5 20 5 10 ? C A B C D E H ? 30 25 25 25 ? Why is H still ? ?? H is a root and unreachable 10 Reached B! 5 10 15 J ?[?] F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  34. Construct Path ? I A B C D E F G H J D 5 ????[?] ????[?] Y N Y 10 E E G F I F F G Y N Y Y Y N Y A 20 5 20 5 Follow predecessors from target! 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  35. So how well does the algorithm work for big graphs?

  36. Dijkstra and Big Graphs The worst-case complexity of Dijkstra is proportional to the number of edges times log(number of nodes) • For1 Million nodes and 10 Million edges, the worst case complexity is proportional to ~14 Million!! • That’s really high!!

  37. We have two optional videos that you can go through. These videos present two modifications to improve Dijkstra’s algorithm practically

  38. Find least weight path from I to B • Avoid paths through E • Must go through J Optimization function D 5 10 G Constraints A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  39. D 5 10 5 G J Target1 A F 20 5 15 20 10 C 15 10 5 10 15 Source2 J F 15 15 Operating over a relevant subgraph B Source1 I H Target2 Splitting the task

  40. Optional 1

  41. Find least weight path from I to A Can we do better? Optimization function D 5 10 Bi-directional Dijkstra algorithm G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E Source I 20 10 H

  42. Step 1 Start with the Source Node like before Reach the first frontier Length(I?F) = 5 D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E Source I 20 10 H

  43. Step 2 Start from Target Node Pretend all edges are reversed Reach the first reverse frontier Length(A?D) = 5 D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E Source I 20 10 H

  44. Step 3 Alternate between forward frontier and reverse frontier Length(I?F?G) = 15 (ignored J) Length(A?D) = 5 D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E Source I 20 10 H

  45. Step 4 Alternate between forward frontier and reverse frontier Length(I?F?G) = 15 Length(A?D?G) = 15 D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E Source I 20 10 H

  46. Step 5 Until a front touches a node of the other front Length(I?F?G) = 15 Length(A?D?G) = 15 Ensure that the sum of the weighted length is minimum D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 5 15 E Source I 20 10 H

  47. Least Weight, not Least Hop The 2-step path F?C?A Has more weight than the 3-step paths F?G?D?A Count the sum of the shortest weighted path from both frontiers D 5 5 G A 10 5 C 10 Source F

  48. Optional 2

  49. Unresolved Issue Path taken by algorithm: F?G IF we could take F?E instead, it would terminate sooner • How can we change weights? D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

  50. Goal-Directed Dijkstra • The target is B • Can we make a node closer to B less costly to get to? • Use potential function ? • Modified weight • ?’ ?,? = ? ?,? − ? ? + ? ? • Should not be negative D 5 10 G A 20 5 20 5 10 C 10 5 10 15 J F 15 B 5 20 15 5 Target 5 15 E Source I 20 10 H

More Related