## Chapter 10 Link Analysis

**Data Mining Techniques So Far…**

• Chapter 5 – Statistics
• Chapter 6 – Decision Trees
• Chapter 7 – Neural Networks
• Chapter 8 – Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering
• Chapter 9 – Market Basket Analysis and Association Rules

**Introduction**

• Airline route maps are useful
• Hyperlinks were revolutionary
• Apple’s HyperCard (Bill Atkinson)
• Claim: there are no more than six degrees of separation between any two people on the planet
• Link Analysis is the data mining technique that addresses relationships and connections
• Link Analysis is based on Graph Theory

**Introduction**

• As you would expect, Link Analysis also has its limitations as a data mining (DM) technique
• However, it is quite effective in these and similar situations:
  • Identifying authoritative sources of information on the WWW by analyzing page links
  • Understanding physician referral patterns
  • Analyzing telephone call patterns

**Basic Graph Theory**

• Graphs are an abstraction used to represent relationships
• Graphs consist of
  • Nodes (vertices): the things in the graph that have relationships
  • Edges: pairs of nodes connected by a relationship
• Visualization is a key characteristic of a graph

**Basic Graph Theory**

• A path is an ordered sequence of nodes connected by edges
  • Example: flight segments (legs) such as LA – Denver – Boston
• A weighted graph is one in which the edges have weights associated with them
  • Example: edge weights can represent the strength of the association between two products purchased together

**Graph Theory Classic Problems**

• Finding a path in the graph that visits every edge exactly one time (Seven Bridges of Königsberg: edges are bridges and nodes are land masses)
• Finding the shortest path that visits every node in the graph exactly one time (Traveling Salesman)
• A completely connected graph with n nodes has n! (n factorial) unique paths that contain all nodes (5! = 5 × 4 × 3 × 2 × 1 = 120)

**Directed vs Undirected Graphs**

• Undirected graphs – edges between nodes go in both directions (A to B; B to A)
• Directed graphs – edges between nodes go in only one direction (A to B is different from B to A)
• Example: the WWW

**Google – Directed Graph Example**

• Web pages = nodes
• Hyperlinks = edges
• Spiders & Web crawlers keep the link graph updated
• Kleinberg’s Algorithm
  • Hub – a page that links to many authorities
  • Authority – a page that is linked to by many hubs

**Google – example continued**

• Authority versus mere popularity
• Ranking by the number of unrelated sites linking to a site yields popularity
• Ranking by the number of subject-related hubs that point to a site yields authority
• This helps overcome a situation that often arises with popularity ranking, where the real authority (e.g. a home page) is ranked lower because relatively few sites link to it

**Examples of Link Analysis**

• Recent International Data Mining Conference
  • http://www.siam.org/meetings/sdm04/
• Chapter10-Example1.pdf
• Chapter10-Example2.pdf
• Chapter10-Example3.pdf
• Megaputer (PolyAnalyst vendor) page:
  • http://www.megaputer.com/products/pa/algorithms/la.php3
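
The graph theory basics above (paths, weighted edges, and the n! count for a completely connected graph) can be illustrated with a small Python sketch. The mileage figures and the names `FLIGHTS`, `path_weight` and `count_full_paths` are made up for illustration; they do not come from the chapter.

```python
from itertools import permutations
from math import factorial

# Hypothetical weighted graph of flight segments.
# Keys are edges (pairs of nodes); values are invented distances in miles.
FLIGHTS = {
    ("LA", "Denver"): 860,
    ("Denver", "Boston"): 1750,
    ("LA", "Boston"): 2600,
}


def path_weight(path, edges):
    """Sum the edge weights along an ordered sequence of nodes.

    The graph is treated as undirected, so (A, B) and (B, A) are the same edge.
    """
    total = 0
    for a, b in zip(path, path[1:]):
        if (a, b) in edges:
            total += edges[(a, b)]
        elif (b, a) in edges:
            total += edges[(b, a)]
        else:
            raise ValueError(f"no edge between {a} and {b}")
    return total


def count_full_paths(n):
    """Count the orderings of n nodes by brute force.

    In a completely connected graph every ordering is a valid path
    that contains all nodes, so the count equals n!.
    """
    return sum(1 for _ in permutations(range(n)))


print(path_weight(["LA", "Denver", "Boston"], FLIGHTS))  # 2610
print(count_full_paths(5), factorial(5))                 # 120 120
```

The brute-force count is only practical for small n, which is the point of the n! bullet: the number of candidate paths grows too fast to enumerate for graphs of realistic size.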
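
The hub and authority definitions from Kleinberg's algorithm can be sketched as an iterative score update on a directed graph. This is a simplified illustration, not the full algorithm: it assumes the set of subject-related pages has already been collected, and the page names, the `hits` function name and the iteration count are all hypothetical.

```python
def hits(links, iterations=50):
    """Compute hub and authority scores for a directed link graph.

    links maps each page to the list of pages it links to.
    Returns two dicts: (hub_scores, authority_scores).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages that link to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]

        # Hub score: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}

        # Normalize so the scores stay bounded across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}

    return hub, auth


# Hypothetical toy web: three hub pages pointing at two candidate authorities.
toy_links = {
    "hub1": ["authA", "authB"],
    "hub2": ["authA", "authB"],
    "hub3": ["authA"],
    "authA": [],
    "authB": [],
}

hub_scores, auth_scores = hits(toy_links)
print(max(auth_scores, key=auth_scores.get))  # authA
```

In this toy graph `authA` scores highest as an authority because all three hubs point to it, and `hub1` and `hub2` outscore `hub3` because they link to both authorities.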