Information Retrieval PowerPoint Presentation
Presentation Transcript

  1. Information Retrieval
    For the MSc Computer Science Programme
    Lecture 7
    Introduction to Information Retrieval (Manning et al. 2007), Chapter 17
    Dell Zhang, Birkbeck, University of London

  2. Yahoo! Hierarchy (http://dir.yahoo.com/science)
    [Figure: topic tree; top level "… (30)" with agriculture, biology, physics, CS, space, …; lower levels include dairy, crops, agronomy, forestry, botany, cell, evolution, magnetism, relativity, AI, HCI, courses, craft, missions.]

  3. Hierarchical Clustering
    • Build a tree-like hierarchical taxonomy (dendrogram) from a set of unlabeled documents.
    • Divisive (top-down)
      • Start with all documents belonging to the same cluster. Eventually each node forms a cluster on its own.
      • Recursively apply a (flat) partitional clustering algorithm, e.g., kMeans with k=2, which gives Bisecting kMeans.
    • Agglomerative (bottom-up)
      • Start with each document being a single cluster. Eventually all documents belong to the same cluster.
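The divisive variant can be sketched as recursive Bisecting kMeans. This is a minimal Python sketch, not the lecture's own code, under stated assumptions: documents are plain numeric vectors compared with Euclidean distance, the 2-means seeding (the first document plus the document farthest from it) is an arbitrary deterministic choice, and the names `two_means` and `bisecting_tree` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple, Union

Vector = Sequence[float]
# A dendrogram node is either a document index (leaf) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def _mean(points: List[Vector]) -> Tuple[float, ...]:
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def two_means(docs: List[Vector], idx: List[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Split the documents listed in idx into two groups with kMeans, k=2."""
    # Assumption: Euclidean distance and deterministic seeding with the first
    # document and the document farthest from it.
    first = idx[0]
    far = max(idx, key=lambda i: math.dist(docs[i], docs[first]))
    centroids = [docs[first], docs[far]]
    groups: Tuple[List[int], List[int]] = ([], [])
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid.
        new_groups: Tuple[List[int], List[int]] = ([], [])
        for i in idx:
            d0 = math.dist(docs[i], centroids[0])
            d1 = math.dist(docs[i], centroids[1])
            new_groups[0 if d0 <= d1 else 1].append(i)
        converged = new_groups == groups
        groups = new_groups
        if converged or not groups[0] or not groups[1]:
            break
        # Update step: move each centroid to the mean of its group.
        centroids = [_mean([docs[i] for i in g]) for g in groups]
    if not groups[0] or not groups[1]:
        # Degenerate split (e.g., identical vectors): peel off one document.
        return [idx[0]], idx[1:]
    return groups


def bisecting_tree(docs: List[Vector], idx: Optional[List[int]] = None) -> Tree:
    """Divisive (top-down) clustering: split until every node is one document."""
    if idx is None:
        idx = list(range(len(docs)))
    if len(idx) == 1:
        return idx[0]
    left, right = two_means(docs, idx)
    return (bisecting_tree(docs, left), bisecting_tree(docs, right))
```

For example, `bisecting_tree([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)])` first separates the two well-separated pairs and then splits each pair, returning `((0, 1), (2, 3))`. Stopping the recursion at a target number of clusters instead of at singletons would give a flat partition.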

  4. Dendrogram Clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster. The number of clusters k is not required in advance.
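Cutting the dendrogram can be sketched with a union-find structure: keep only the merges made at or above the chosen similarity level, and each remaining connected component is a cluster. This is a hedged sketch that assumes a hypothetical merge-history format `(doc in cluster A, doc in cluster B, similarity at merge)` and a monotone history (merge similarities never increase going up the tree, as single-link HAC produces); `cut_dendrogram` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical merge-history format (an assumption): a document from each of
# the two merged clusters, plus the similarity at which they were merged.
Merge = Tuple[int, int, float]


def cut_dendrogram(n_docs: int, merges: List[Merge],
                   threshold: float) -> List[List[int]]:
    """Cut at a similarity level; each connected component is a cluster."""
    parent = list(range(n_docs))  # union-find forest over documents

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Keep only the merges made at or above the cut level.
    for a, b, sim in merges:
        if sim >= threshold:
            parent[find(a)] = find(b)

    # Group documents by the root of their component.
    clusters: Dict[int, List[int]] = {}
    for d in range(n_docs):
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())
```

With the illustrative history `merges = [(0, 1, 0.9), (3, 4, 0.8), (2, 3, 0.6), (0, 2, 0.3)]`, `cut_dendrogram(5, merges, 0.7)` returns `[[0, 1], [2], [3, 4]]` and `cut_dendrogram(5, merges, 0.5)` returns `[[0, 1], [2, 3, 4]]`: the number of clusters falls out of the chosen level rather than being fixed in advance.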

  5. Dendrogram – Example: Clusters of News Stories (Reuters RCV1)

  6. Dendrogram – Example: Clusters of Things that People Want (ZEBO)

  7. HAC
    • Hierarchical Agglomerative Clustering
    • Start with each doc in a separate cluster.
    • Repeat until there is only one cluster:
      • Among the current clusters, determine the pair of clusters, ci and cj, that are most similar (Single-Link, Complete-Link, etc.).
      • Then merge ci and cj into a single cluster.
    • The history of merging forms a binary tree or hierarchy.
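The HAC loop above can be sketched naively in Python, with the cluster-similarity rule passed in as a function so that Single-Link and Complete-Link differ only in that argument. This is a minimal sketch, assuming documents are dense vectors compared with cosine similarity; `hac`, `single_link`, and `complete_link` are hypothetical names, and the full rescan at every step is chosen for clarity, not speed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Linkage = Callable[[List[int], List[int], List[Vector]], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_link(ci: List[int], cj: List[int], docs: List[Vector]) -> float:
    # Similarity of the most similar pair of members.
    return max(cosine(docs[x], docs[y]) for x in ci for y in cj)


def complete_link(ci: List[int], cj: List[int], docs: List[Vector]) -> float:
    # Similarity of the least similar pair of members.
    return min(cosine(docs[x], docs[y]) for x in ci for y in cj)


def hac(docs: List[Vector],
        linkage: Linkage) -> List[Tuple[List[int], List[int], float]]:
    """Return the merge history (ci, cj, similarity), bottom-up."""
    # Assumption: documents are dense vectors compared by cosine similarity.
    clusters = [[i] for i in range(len(docs))]  # each doc in its own cluster
    history = []
    while len(clusters) > 1:
        # Scan all pairs of current clusters for the most similar one.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b], docs)
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        history.append((clusters[a], clusters[b], s))
        # Merge cj into ci; delete index b first (b > a, so a stays valid).
        merged = clusters[a] + clusters[b]
        del clusters[b]
        clusters[a] = merged
    return history
```

Each of the n-1 recorded merges joins two existing clusters, so the history is exactly the binary tree the slide describes. On small illustrative inputs such as `[(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.2, 1.0)]`, both linkages merge {0, 1}, then {2, 3}, then everything; they differ in the similarity recorded for the final merge.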

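  8. Single-Link
    • The similarity between a pair of clusters is defined by the single strongest link (i.e., the maximum cosine similarity) between their members:
      $\mathrm{sim}(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} \mathrm{sim}(x, y)$
    • After merging ci and cj, the similarity of the resulting cluster to another cluster, ck, is:
      $\mathrm{sim}(c_i \cup c_j, c_k) = \max\big(\mathrm{sim}(c_i, c_k),\, \mathrm{sim}(c_j, c_k)\big)$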
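The Single-Link update rule means a merged cluster's similarity to every other cluster can be computed from the two rows being merged, without revisiting individual documents. A minimal Python sketch of that idea, assuming dense document vectors and cosine similarity; `single_link_hac` is a hypothetical name, and the pair search is still a plain scan.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def single_link_hac(docs: List[Vector]) -> List[Tuple[List[int], List[int], float]]:
    """Single-Link HAC that updates cluster similarities with max()."""
    # Assumption: documents are dense vectors compared by cosine similarity.
    n = len(docs)
    # sim[i][k]: current similarity between active clusters i and k.
    sim = [[cosine(docs[i], docs[k]) for k in range(n)] for i in range(n)]
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}  # id -> docs
    history = []
    while len(members) > 1:
        # Find the most similar pair of active clusters.
        active = sorted(members)
        best = None
        for x, i in enumerate(active):
            for j in active[x + 1:]:
                if best is None or sim[i][j] > best[0]:
                    best = (sim[i][j], i, j)
        s, i, j = best
        history.append((members[i], members[j], s))
        # Merge cluster j into cluster i (cluster i keeps its id).
        members[i] = members[i] + members.pop(j)
        # Single-link update: sim(ci ∪ cj, ck) = max(sim(ci, ck), sim(cj, ck)).
        for k in members:
            if k != i:
                sim[i][k] = sim[k][i] = max(sim[i][k], sim[j][k])
    return history
```

Because taking the maximum over a union equals the maximum of the two per-cluster maxima, this produces the same merges as recomputing the strongest member-to-member link at every step.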

  9. HAC – Example

  10. HAC – Example

  11. HAC – Example
    • As clusters agglomerate, the documents are organized into a dendrogram.
    [Figure: dendrogram over d1 to d5 with merged nodes d1,d2; d4,d5; and d3,d4,d5.]

  12. HAC – Example: Single-Link

  13. Take Home Message
    • Single-Link HAC
    • Dendrogram