Chapter 10 Link Analysis - PowerPoint PPT Presentation

jael
Presentation Transcript

  1. Chapter 10 Link Analysis

  2. Data Mining Techniques So Far… • Chapter 5 – Statistics • Chapter 6 – Decision Trees • Chapter 7 – Neural Networks • Chapter 8 – Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering • Chapter 9 – Market Basket Analysis and Association Rules

  3. Introduction • Airline Route Maps are useful • Hyperlinks were revolutionary • Apple’s HyperCard (Bill Atkinson) • Claim that there are no more than 6 degrees of separation between any two people on the planet • Link Analysis is the data mining technique that addresses relationships and connections • Link Analysis is based on Graph Theory

  4. Introduction • As you would expect, Link Analysis also has its limitations as a data mining technique • However, it is quite effective in situations such as: • Identifying authoritative sources of information on the WWW by analyzing page links • Understanding physician referral patterns • Analyzing telephone call patterns

  5. Basic Graph Theory • Graphs are an abstraction used to represent relationships • Graphs consist of • Nodes (vertices), which are the things in the graph that have relationships • Edges, which are pairs of nodes connected by a relationship • Visualization is a key characteristic of a graph
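
The node and edge definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the city names, the `build_adjacency` helper, and the choice of a dictionary of sets are all assumptions made for the example.

```python
# Hypothetical example data: cities are nodes, direct connections are edges.
nodes = {"LA", "Denver", "Boston", "Chicago"}
edges = {("LA", "Denver"), ("Denver", "Boston"), ("Denver", "Chicago")}


def build_adjacency(nodes, edges):
    """Map each node to the set of nodes it shares an edge with (undirected)."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(nodes, edges)
print(sorted(adjacency["Denver"]))  # ['Boston', 'Chicago', 'LA']
```

An adjacency mapping like this makes "which nodes are related to X?" a single lookup, which is the question most link analysis starts from.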

  6. Basic Graph Theory • A path is an ordered sequence of nodes connected by edges • Example: Flight segments (legs) such as LA – Denver – Boston • A weighted graph is one in which the edges have weights associated with them • Example: A weight can measure how strongly two products are associated by being purchased together
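
As a rough sketch of paths and weighted edges, the following Python checks whether a sequence of nodes is a path and sums the weights along it. The weights are made-up illustrative numbers (not real distances), and the `edge_weight` and `path_weight` helpers are hypothetical names introduced for this example.

```python
# Hypothetical edge weights; the numbers are illustrative only.
weights = {
    ("LA", "Denver"): 860,
    ("Denver", "Boston"): 1750,
}


def edge_weight(weights, a, b):
    """Look up the weight of an undirected edge, or None if there is no edge."""
    if (a, b) in weights:
        return weights[(a, b)]
    return weights.get((b, a))


def path_weight(weights, path):
    """Sum the edge weights along a path; None if some step has no edge."""
    total = 0
    for a, b in zip(path, path[1:]):
        w = edge_weight(weights, a, b)
        if w is None:
            return None
        total += w
    return total


print(path_weight(weights, ["LA", "Denver", "Boston"]))  # 2610
print(path_weight(weights, ["LA", "Boston"]))            # None (no direct edge)
```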

  7. Graph Theory Classic Problems • Finding a path in the graph that traverses every edge exactly one time (Seven Bridges – edges are bridges and nodes are land) • Finding the shortest path that visits every node in the graph exactly one time (Traveling Salesman) • A completely connected graph with n nodes has n! (n factorial) unique paths that contain all nodes, one for each ordering of the nodes (5! = 5 * 4 * 3 * 2 * 1 = 120)
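
Two of the claims above can be checked with a short Python sketch. The bridge list is the usual textbook encoding of the Seven Bridges problem with land masses labeled A to D (the labels are an assumption of this example). The test used is the standard result that a connected graph has a path crossing every edge exactly once only if zero or two nodes have an odd number of edges.

```python
from collections import Counter
from itertools import permutations
from math import factorial

# Seven Bridges: four land masses (nodes), seven bridges (edges).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

# Count how many bridges touch each land mass.
degree = Counter()
for a, b in bridges:
    degree[a] += 1
    degree[b] += 1

odd_nodes = sorted(n for n, d in degree.items() if d % 2 == 1)
has_euler_path = len(odd_nodes) in (0, 2)  # valid because this graph is connected
print(dict(degree), has_euler_path)  # all four nodes are odd, so no such path

# Counting orderings of 5 nodes in a completely connected graph.
orderings = list(permutations(["A", "B", "C", "D", "E"]))
print(len(orderings), factorial(5))  # 120 120
```

The factorial growth is why brute-force search for the Traveling Salesman problem becomes impractical beyond a small number of nodes.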

  8. Directed vs Undirected Graphs • Undirected graphs – edges between nodes go in both directions (A to B; B to A) • Directed graphs – edges between nodes go in only one direction (A to B is different from B to A) • Example: the WWW
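
The difference between the two kinds of graph comes down to how an edge lookup treats order. A minimal sketch, with a hypothetical `has_edge` helper and a single illustrative link:

```python
def has_edge(edges, a, b, directed):
    """Return True if there is an edge from a to b.

    In an undirected graph the pair (b, a) counts as the same edge;
    in a directed graph it does not.
    """
    if (a, b) in edges:
        return True
    return (not directed) and (b, a) in edges


links = {("A", "B")}  # one edge, stored as an ordered pair

print(has_edge(links, "B", "A", directed=False))  # True
print(has_edge(links, "B", "A", directed=True))   # False
```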

  9. Google – Directed Graph Example • Web pages = nodes • Hyperlinks = edges • Spiders & Web crawlers updating • Kleinberg’s Algorithm • Hub – a page that links to many authorities • Authority – a page that is linked to by many hubs
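
The mutual definition of hubs and authorities can be illustrated with a simplified iterative scoring loop in the spirit of Kleinberg's algorithm. This is a sketch under stated assumptions, not the full method: it skips the step of first selecting a subject-related set of pages, and the page names, the `hits` and `normalize` helpers, and the iteration count are all hypothetical.

```python
from math import sqrt


def normalize(scores):
    """Scale scores so their squares sum to 1 (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) scores for a set of directed (source, target) links."""
    pages = {page for link in links for page in link}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {page: 0.0 for page in pages}
        for src, dst in links:
            new_auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hub = {page: 0.0 for page in pages}
        for src, dst in links:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical link structure: h1..h3 are link pages, a1 and a2 are content pages.
links = {("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # a1, the page pointed to by the most hubs
```

Here `h1` and `h2` end up with higher hub scores than `h3` because they point to both authorities, while `a1` outranks `a2` because every hub links to it.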

  10. Google – example continued • Authority versus mere popularity • Ranking a site by the number of unrelated sites linking to it yields popularity • Ranking a site by the number of subject-related hubs that point to it yields authority • This helps to overcome a situation that often arises with popularity ranking, where the real authority (e.g. a home page) is ranked lower because few popular pages link to it

  11. Examples of Link Analysis • Recent Int’l Data Mining Conference • http://www.siam.org/meetings/sdm04/ • Chapter10-Example1.pdf • Chapter10-Example2.pdf • Chapter10-Example3.pdf • Megaputer (PolyAnalyst vendor) page: • http://www.megaputer.com/products/pa/algorithms/la.php3

  12. End of Chapter 10