Clustering tagged documents with labeled and unlabeled documents

Presentation Transcript

  1. Clustering tagged documents with labeled and unlabeled documents • Presenter: Jian-Ren Chen • Authors: Chien-Liang Liu*, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen • 2013, IPM

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Tags can provide semantic information about resources, and they can help machines perform classification or clustering tasks accurately. • Probabilistic latent semantic analysis (PLSA) - aspect model - statistical clustering model

  4. Objectives • This study employs Constrained-PLSA to cluster tagged documents with a small number of seeds. • Constrained-PLSA is based on the statistical clustering model rather than the aspect model.

  5. Methodology - PLSA • E-step • M-step • Terms (keywords) of the document collection • Documents
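For reference, the E-step and M-step named on this slide can be sketched for standard aspect-model PLSA as follows. This is a minimal illustration in plain Python, not the presenters' code: the function name `plsa`, its parameters, and the dense term-count input `counts[d][w]` are assumptions made for the example.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit aspect-model PLSA with EM (illustrative sketch).

    counts[d][w] is the frequency of term w in document d.
    Returns (p_z_d, p_w_z): P(z|d) per document and P(w|z) per topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # empty document or unused topic: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random, strictly positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts n(d,w) * P(z|d,w).
                for z in range(n_topics):
                    acc_z_d[d][z] += n_dw * post[z]
                    acc_w_z[z][w] += n_dw * post[z]
        # M-step: renormalize the expected counts.
        p_z_d = [normalize(row) for row in acc_z_d]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


if __name__ == "__main__":
    toy = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
    doc_topics, topic_words = plsa(toy, n_topics=2)
    for row in doc_topics:
        print([round(p, 3) for p in row])
```

Each row of the returned `p_z_d` is a soft topic assignment for one document; taking the most probable topic per row is one simple way to turn it into a clustering.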

  6. Methodology - Constrained-PLSA • E-step • M-step
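The transcript does not preserve the Constrained-PLSA update equations, so the following is only a hedged sketch of the general idea of clustering with a few seeds. It swaps in a well-known stand-in technique, semi-supervised EM for a mixture of multinomials in which each document belongs to one cluster and the seed documents' assignments are clamped to their labels. Whether this matches the authors' actual updates is an assumption; the function name `seeded_mixture_em`, the `seeds` dictionary format, and the smoothing constant `alpha` are all hypothetical.

```python
import math


def seeded_mixture_em(counts, n_clusters, seeds, n_iter=30, alpha=0.01):
    """Semi-supervised EM for a multinomial mixture (illustrative stand-in).

    counts[d][w]: frequency of term (or tag) w in document d.
    seeds: {doc_index: cluster_index} for the labeled documents (assumed format).
    Returns resp, where resp[d][k] is the posterior of cluster k for document d.
    """
    n_docs, n_words = len(counts), len(counts[0])

    # Seeds start (and stay) one-hot; unlabeled documents start uniform.
    resp = []
    for d in range(n_docs):
        if d in seeds:
            resp.append([1.0 if k == seeds[d] else 0.0
                         for k in range(n_clusters)])
        else:
            resp.append([1.0 / n_clusters] * n_clusters)

    for _ in range(n_iter):
        # M-step: cluster priors and smoothed per-cluster word distributions.
        prior = [sum(resp[d][k] for d in range(n_docs)) / n_docs
                 for k in range(n_clusters)]
        p_w = []
        for k in range(n_clusters):
            row = [alpha + sum(resp[d][k] * counts[d][w] for d in range(n_docs))
                   for w in range(n_words)]
            total = sum(row)
            p_w.append([x / total for x in row])

        # E-step: update posteriors of unlabeled documents only.
        for d in range(n_docs):
            if d in seeds:
                continue  # constraint: labeled documents keep their cluster
            log_post = [
                math.log(prior[k] + 1e-12)
                + sum(counts[d][w] * math.log(p_w[k][w]) for w in range(n_words))
                for k in range(n_clusters)
            ]
            top = max(log_post)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp[d] = [u / total for u in unnorm]
    return resp


if __name__ == "__main__":
    toy = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    result = seeded_mixture_em(toy, n_clusters=2, seeds={0: 0, 2: 1})
    for row in result:
        print([round(p, 3) for p in row])
```

In this toy run, one seed per cluster is enough to break the symmetry between clusters, which is the role the slides attribute to the small amount of labeled data.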

  7. Experiments - Data set A (CiteULike)

  8. Experiments (Data set A)

  9. Experiments - Data set B (CiteULike)

  10. Experiments (Data set B)

  11. Conclusions • The performance of the “tags as words” representation scheme is more stable than that of the “words + tags” representation scheme. • Unsupervised learning methods fail to function properly on data sets with noisy information, but Constrained-PLSA functions properly and stably even when only a small amount of labeled data is available.

  12. Comments • Advantages - Constrained-PLSA outperforms the other methods • Disadvantage - too much artificial processing in the experiments • Applications - text mining, tagged document clustering