Enhancing Cluster Labeling Using Wikipedia

Presentation Transcript

  1. Enhancing Cluster Labeling Using Wikipedia • David Carmel, Haggai Roitman, Naama Zwerdling • IBM Research Lab • {carmel,haggai,naamaz}@il.ibm.com • Presented by Miguel Panuera, mpanuera@gmail.com • School of Computer Science, San Pablo Catholic University • AREQUIPA – PERU 2010

  2. CONTENT • Cluster Labeling • Why Wikipedia • Terms extracted: JSD vs Wikipedia • General Framework for Cluster Labeling • Experiments • Summary

  3. Cluster Labeling • Cluster labeling is the process of selecting descriptive labels for the clusters produced by a clustering algorithm

  4. Why Wikipedia • One of the major knowledge resources for many information retrieval tasks. • Text categorization and clustering. • Computing semantic relatedness between concepts. • Predicting document topics.

  5. Terms extracted: JSD vs Wikipedia • The list of important terms fairly represents the content of the categories, but these terms serve as appropriate labels for only a few categories. • Wikipedia labels, on the other hand, agree with human-annotated labels much more often.
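
For reference, and assuming that JSD on this slide denotes the Jensen-Shannon divergence between the term distribution of a cluster and that of the whole collection, the standard definition is given below. The symbols P (cluster distribution), Q (collection distribution), M (their average), and w (a term) are notation introduced here for illustration; they do not appear in the slides.

```latex
\[
\mathrm{JSD}(P \,\|\, Q)
  = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M)
  + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M),
\qquad
M = \tfrac{1}{2}\,(P + Q)
\]
\[
D_{\mathrm{KL}}(P \,\|\, M) = \sum_{w} P(w) \log \frac{P(w)}{M(w)}
\]
```

Each term w contributes one summand to each sum, so terms can be ranked by the size of their individual contribution to the total divergence.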

  6. GENERAL FRAMEWORK FOR CLUSTER LABELING

  7. GENERAL FRAMEWORK FOR CLUSTER LABELING Documents are first parsed and tokenized

  8. GENERAL FRAMEWORK FOR CLUSTER LABELING The clustering algorithm's goal is to create coherent clusters, in which the documents within a cluster share the same topics

  9. GENERAL FRAMEWORK FOR CLUSTER LABELING We now wish to find a list of terms ordered by their estimated importance, to represent the content of the cluster’s documents. Such terms consist of single keywords
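
A minimal sketch of this important-term step, assuming terms are ranked by their contribution to the Jensen-Shannon divergence between the cluster and the whole collection (the JSD named earlier in the deck). The function names and the toy documents are hypothetical and are not taken from the slides.

```python
import math
from collections import Counter
from typing import Dict, List


def term_distribution(docs: List[List[str]]) -> Dict[str, float]:
    """Maximum-likelihood term distribution over a list of tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def jsd_term_scores(cluster_docs: List[List[str]],
                    collection_docs: List[List[str]]) -> Dict[str, float]:
    """Score each cluster term by its contribution to JSD(cluster || collection)."""
    p = term_distribution(cluster_docs)      # cluster distribution
    q = term_distribution(collection_docs)   # collection distribution
    scores = {}
    for term, p_w in p.items():
        q_w = q.get(term, 0.0)
        m_w = 0.5 * (p_w + q_w)              # average distribution at this term
        score = 0.5 * p_w * math.log(p_w / m_w)
        if q_w > 0:                          # 0 * log(0) is treated as 0
            score += 0.5 * q_w * math.log(q_w / m_w)
        scores[term] = score
    return scores


def top_terms(cluster_docs: List[List[str]],
              collection_docs: List[List[str]],
              k: int = 5) -> List[str]:
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = jsd_term_scores(cluster_docs, collection_docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data (hypothetical): one cluster inside a four-document collection.
cluster = [["jaguar", "car", "engine"], ["car", "engine", "speed"]]
collection = cluster + [["jaguar", "cat", "jungle"], ["cat", "jungle", "speed"]]
important = top_terms(cluster, collection, k=2)
```

Terms that are equally frequent in the cluster and in the collection score zero, so the ranking favors terms whose frequency in the cluster departs from the background.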

  10. GENERAL FRAMEWORK FOR CLUSTER LABELING We now wish to extract candidate labels for cluster C

  11. GENERAL FRAMEWORK FOR CLUSTER LABELING Candidate labels are evaluated by several judges. Each judge evaluates the candidates according to its own evaluation policy.
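
The judging step can be sketched as follows. The slides do not say how the judges' scores are combined, so the weighted sum used here is an assumption, and both toy judges (a term-overlap judge and a lookup-table judge) are hypothetical stand-ins for real evaluation policies.

```python
from typing import Callable, Dict, List

Judge = Callable[[str], float]  # a judge maps a candidate label to a score


def make_overlap_judge(important_terms: List[str]) -> Judge:
    """Hypothetical judge: fraction of a label's words that are important terms."""
    terms = {t.lower() for t in important_terms}

    def judge(label: str) -> float:
        words = label.lower().split()
        if not words:
            return 0.0
        return sum(word in terms for word in words) / len(words)

    return judge


def make_lookup_judge(scores: Dict[str, float]) -> Judge:
    """Hypothetical judge: reads a precomputed score, defaulting to 0.0."""
    return lambda label: scores.get(label, 0.0)


def rank_candidates(candidates: List[str],
                    judges: List[Judge],
                    weights: List[float]) -> List[str]:
    """Rank candidates by a weighted sum of judge scores (assumed aggregation)."""
    if len(judges) != len(weights):
        raise ValueError("one weight is required per judge")
    combined = {
        label: sum(w * judge(label) for judge, w in zip(judges, weights))
        for label in candidates
    }
    # Highest combined score first; ties are broken alphabetically.
    return sorted(candidates, key=lambda label: (-combined[label], label))


candidates = ["Sports car", "Big cat", "Engine"]
judges = [
    make_overlap_judge(["car", "engine", "speed"]),
    make_lookup_judge({"Sports car": 0.9, "Big cat": 0.2, "Engine": 0.3}),
]
ranking = rank_candidates(candidates, judges, weights=[0.5, 0.5])
```

Keeping each judge as an independent function means a new evaluation policy can be added without changing the aggregation code.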

  12. Experiments • K: the number of required cluster labels • Match@K: the fraction of clusters for which at least one of the top-K labels is correct.
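
Match@K as defined on this slide can be computed as in the sketch below. Treating a label as correct when it equals a reference label, ignoring case, is an assumption made for illustration; the function name and the toy data are hypothetical.

```python
from typing import List, Set


def match_at_k(ranked_labels: List[List[str]],
               correct_labels: List[Set[str]],
               k: int) -> float:
    """Fraction of clusters with at least one correct label among the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked_labels) != len(correct_labels):
        raise ValueError("one set of correct labels is required per cluster")
    if not ranked_labels:
        return 0.0
    hits = 0
    for ranked, correct in zip(ranked_labels, correct_labels):
        reference = {label.lower() for label in correct}
        # Assumption: correctness is a case-insensitive exact match.
        if any(label.lower() in reference for label in ranked[:k]):
            hits += 1
    return hits / len(ranked_labels)


# Toy data (hypothetical): three clusters with their ranked candidate labels.
ranked = [["Sports car", "Engine"], ["Feline", "Big cat"], ["Music"]]
truth = [{"engine"}, {"big cat"}, {"painting"}]
score_at_1 = match_at_k(ranked, truth, k=1)
score_at_2 = match_at_k(ranked, truth, k=2)
```

Because a cluster that matches within the top K also matches within any larger cutoff, Match@K never decreases as K grows.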

  13. Summary • We described a general framework for cluster labeling that extracts candidate labels from the text and from Wikipedia • In our experiments, labels taken from Wikipedia agreed with human-annotated labels much more often than important terms extracted from the cluster text alone.

  14. THANKS
