
Enhancing Cluster Labeling Using Wikipedia

Enhancing Cluster Labeling Using Wikipedia. David Carmel, Haggai Roitman, Naama Zwerdling, IBM Research Lab, {carmel,haggai,naamaz}@il.ibm.com. Presented by Miguel Panuera, mpanuera@gmail.com. School of Computer Science, San Pablo Catholic University, AREQUIPA – PERU 2010.


Presentation Transcript


  1. Enhancing Cluster Labeling Using Wikipedia • David Carmel, Haggai Roitman, Naama Zwerdling, IBM Research Lab, {carmel,haggai,naamaz}@il.ibm.com • Presented by Miguel Panuera, mpanuera@gmail.com • School of Computer Science, San Pablo Catholic University, AREQUIPA – PERU 2010

  2. CONTENT • Cluster Labeling • Why Wikipedia • Terms extracted: JSD vs Wikipedia • General Framework for Cluster Labeling • Experiments • Summary

  3. Cluster Labeling • The process of selecting descriptive labels for the clusters.

  4. Why Wikipedia • One of the major knowledge resources for many information retrieval tasks: • Text categorization and clustering. • Computing semantic relatedness between concepts. • Predicting document topics.

  5. Terms extracted: JSD vs Wikipedia • While the list of important terms (selected by Jensen-Shannon divergence, JSD) fairly represents the content of the categories, these terms serve as appropriate labels for only a few categories. Wikipedia labels, on the other hand, agree much more closely with human-annotated labels.

  6. GENERAL FRAMEWORK FOR CLUSTER LABELING

  7. GENERAL FRAMEWORK FOR CLUSTER LABELING Documents are first parsed and tokenized

  8. GENERAL FRAMEWORK FOR CLUSTER LABELING The clustering algorithm's goal is to create coherent clusters, in which the documents within a cluster share the same topics.

  9. GENERAL FRAMEWORK FOR CLUSTER LABELING We now wish to find a list of terms ordered by their estimated importance, to represent the content of the cluster’s documents. Such terms consist of single keywords
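The term-ranking step can be illustrated with a minimal sketch. It assumes, as the "JSD" slide suggests, that terms are scored by their contribution to the Jensen-Shannon divergence between the cluster's term distribution and the whole collection's. The function names, the unigram-only tokens, and the rule of scoring only terms that are over-represented in the cluster are assumptions of this sketch, not details taken from the slides.

```python
import math
from collections import Counter


def term_distribution(docs):
    """Maximum-likelihood unigram distribution over a list of tokenized docs."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def jsd_term_scores(cluster_docs, collection_docs):
    """Per-term contribution to JSD(cluster || collection).

    Assumption of this sketch: only terms that are more frequent in the
    cluster than in the collection are scored, so that terms typical of
    the rest of the collection are not proposed as cluster terms.
    """
    p = term_distribution(cluster_docs)
    q = term_distribution(collection_docs)
    scores = {}
    for term, p_t in p.items():
        q_t = q.get(term, 0.0)
        if p_t <= q_t:
            continue
        m_t = 0.5 * (p_t + q_t)  # mixture distribution at this term
        score = 0.5 * p_t * math.log2(p_t / m_t)
        if q_t > 0:
            score += 0.5 * q_t * math.log2(q_t / m_t)
        scores[term] = score
    return scores


def top_terms(cluster_docs, collection_docs, k=5):
    """Terms ordered by decreasing score; ties broken alphabetically."""
    scores = jsd_term_scores(cluster_docs, collection_docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Here `collection_docs` is expected to include the cluster's own documents, so every cluster term also has a non-zero collection probability.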

  10. GENERAL FRAMEWORK FOR CLUSTER LABELING We now wish to extract candidate labels for cluster C.

  11. GENERAL FRAMEWORK FOR CLUSTER LABELING Candidate labels are evaluated by several judges. Each judge evaluates the candidates according to its own evaluation policy.
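The slides do not say which judges are used or how their scores are combined. The sketch below is therefore purely illustrative: it assumes each judge is a function that scores a candidate label against the cluster's ranked important terms, and that scores are combined by a weighted sum. Both judges (`overlap_judge`, `rank_judge`) and the weights are hypothetical.

```python
def overlap_judge(label, important_terms):
    """Hypothetical judge: fraction of the label's words that are important terms."""
    words = label.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in important_terms)
    return hits / len(words)


def rank_judge(label, important_terms):
    """Hypothetical judge: reciprocal rank of the best-ranked term in the label."""
    words = set(label.lower().split())
    for rank, term in enumerate(important_terms, start=1):
        if term in words:
            return 1.0 / rank
    return 0.0


def rank_labels(candidates, important_terms, judges):
    """Order candidates by the weighted sum of judge scores.

    `judges` is a list of (judge_function, weight) pairs; the weighted sum
    is an assumption of this sketch, not the method stated on the slide.
    """
    scored = []
    for label in candidates:
        total = sum(weight * judge(label, important_terms)
                    for judge, weight in judges)
        scored.append((total, label))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored]
```

For example, with important terms `["jaguar", "cat", "wild"]` and equal weights, the candidate "Jaguar" scores 1.0, "Big cat" scores 0.5, and "Car engine" scores 0.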

  12. Experiments • K: the number of required cluster labels. • Match@K: the relative number of clusters for which at least one of the top-K labels is correct.
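Match@K as defined on this slide can be computed as in the following sketch. It assumes that a label counts as correct when it exactly matches one of the cluster's ground-truth labels after lowercasing; the slide does not state the matching rule, so that part is an assumption.

```python
def match_at_k(ranked_labels_per_cluster, correct_labels_per_cluster, k):
    """Fraction of clusters with at least one correct label among the top k.

    ranked_labels_per_cluster: one ranked list of suggested labels per cluster.
    correct_labels_per_cluster: one set of ground-truth labels per cluster.
    Assumption: correctness is exact string match after lowercasing.
    """
    if not ranked_labels_per_cluster:
        return 0.0
    hits = 0
    for ranked, correct in zip(ranked_labels_per_cluster,
                               correct_labels_per_cluster):
        correct_lower = {label.lower() for label in correct}
        if any(label.lower() in correct_lower for label in ranked[:k]):
            hits += 1
    return hits / len(ranked_labels_per_cluster)
```

By construction the measure cannot decrease as K grows, since a larger K only adds candidates to the top-K list of each cluster.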

  13. Summary • We described a general framework for cluster labeling that extracts candidate labels from the text and from Wikipedia. • Cluster labeling with Wikipedia is extremely successful, as shown by our results.

  14. THANKS
