Mining Positive and Negative Patterns for Relevance Feature Discovery


Presentation Transcript


  1. Mining Positive and Negative Patterns for Relevance Feature Discovery Presenter: Cheng-Hui Chen Author: Yuefeng Li, Abdulmohsen Algarni, Ning Zhong KDD 2010

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. Many text mining methods consider only the distribution of terms.

  4. Objectives The technique presented in this paper makes a breakthrough on this difficulty: it considers both the distribution of terms and their specificity when using them for text mining and classification.

  5. Methodology • Frequency weight • New weight • Specificity weight

  6. Definitions • Frequent pattern • coverset(X) = {d ∈ D+ | X ⊆ d} • Absolute support: supa(X) = |coverset(X)| • Relative support: supr(X) = |coverset(X)| / |D+| • A termset X is called frequent if supa(X) (or supr(X)) ≥ min_sup • Closed pattern • Cls(X) = termset(coverset(X)) • A termset X is called closed if and only if X = Cls(X) • Equivalently, supa(X1) < supa(X) for all patterns X1 ⊃ X • Closed sequential pattern: a sequential pattern that is also closed in the above sense
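To make the support and closure definitions concrete, here is a minimal Python sketch; the function names (coverset, cls) and the modeling of documents as term sets are illustrative assumptions, not code from the paper.

```python
# Minimal sketch of the support/closure definitions above.
# Documents are modeled as sets of terms; all names are illustrative.

def coverset(X, D_pos):
    """Positive documents that contain every term of termset X."""
    return [d for d in D_pos if X <= d]

def sup_a(X, D_pos):
    """Absolute support: number of covering documents."""
    return len(coverset(X, D_pos))

def sup_r(X, D_pos):
    """Relative support: fraction of positive documents covered."""
    return sup_a(X, D_pos) / len(D_pos)

def cls(X, D_pos):
    """Closure of X: terms shared by all documents in coverset(X)."""
    covered = coverset(X, D_pos)
    return set.intersection(*covered) if covered else set(X)

def is_closed(X, D_pos):
    """X is a closed pattern iff X equals its own closure."""
    return set(X) == cls(set(X), D_pos)

D_pos = [{"mine", "pattern", "text"}, {"mine", "pattern"}, {"text", "term"}]
X = {"mine", "pattern"}
print(sup_a(X, D_pos), sup_r(X, D_pos), is_closed(X, D_pos))  # 2 0.666... True
```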

  7. The deploying method • To improve the efficiency of pattern taxonomy mining (PTM), an algorithm, SPMining(D+, min_sup), was proposed to discover closed sequential patterns in the positive documents D+. • For a given term t, its support (also called its weight) in the discovered patterns is obtained by deploying the supports of the patterns that contain t onto t. • The following rank is assigned to every incoming document d to decide its relevance: the sum of the weights of the terms appearing in d (see the sketch below).
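The weighting and ranking formulas on this slide were images in the original. Below is a hedged sketch of one common deploying scheme, in which each pattern's relative support is split evenly among its terms and an incoming document is ranked by summing the weights of the terms it contains; the paper's exact weighting may differ.

```python
# Hedged sketch of the deploying step: pattern supports are pushed down
# to individual terms, then used to rank incoming documents.
# The even split sup_r(p)/|p| is an assumption about the weighting scheme.

from collections import defaultdict

def deploy(patterns):
    """patterns: list of (termset, relative_support) pairs from SPMining."""
    weight = defaultdict(float)
    for terms, sup_r in patterns:
        for t in terms:
            weight[t] += sup_r / len(terms)  # share support among the terms
    return weight

def rank(doc, weight):
    """Relevance score of incoming document d: sum of its terms' weights."""
    return sum(w for t, w in weight.items() if t in doc)

patterns = [({"mine", "pattern"}, 0.8), ({"pattern", "text"}, 0.5)]
weight = deploy(patterns)
print(rank({"pattern", "text", "noise"}, weight))  # 0.65 + 0.25 = 0.9
```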

  8. Mining Algorithms

  9. Specificity of low-level features • We define the specificity of a given term t in the training set D = D+ ∪ D− as follows:
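The formula itself appeared only as an image on the slide. A plausible reconstruction, consistent with the paper's later classification of terms into positive specific, general, and negative specific, is the coverage-difference form below; treat the exact normalization as an assumption.

```latex
% Hedged reconstruction of the specificity function; the original slide
% showed it only as an image. Coverset notation follows slide 6.
\[
  \mathrm{spe}(t) \;=\;
  \frac{\bigl|\{\, d \in D^{+} : t \in d \,\}\bigr|
        \;-\;
        \bigl|\{\, d \in D^{-} : t \in d \,\}\bigr|}
       {\,|D^{+}|\,}
\]
% Large positive spe(t): positive specific term; near zero: general term;
% negative spe(t): negative specific term.
```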

  10. Revision of discovered features

  11. Revision Algorithms

  12. Experiments • Data • This research uses Reuters Corpus Volume 1 (RCV1) and the 50 assessor topics to evaluate the proposed model. • Compared methods • Up-to-date pattern-mining methods • Well-known term-based methods

  13. Experiments • The well-known term-based methods • The Rocchio model • BM25 • SVM
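For reference, a standard Okapi BM25 formulation is sketched below; the experiments in the paper may use a relevance-feedback variant with different parameter settings, so this is illustrative rather than the exact baseline implementation.

```python
# Standard Okapi BM25 scoring, sketched for reference; parameter values
# k1 = 1.2 and b = 0.75 are common defaults, not the paper's settings.

import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one document against a set of query terms.

    query_terms: iterable of terms; doc: list of terms;
    corpus: list of documents (each a list of terms).
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        n_t = sum(1 for d in corpus if t in d)            # document frequency
        idf = math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)
        f = doc.count(t)                                  # term frequency in doc
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["mine", "pattern", "text"], ["text", "term"], ["pattern", "mine"]]
print(bm25_score({"pattern", "mine"}, corpus[0], corpus))
```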

  14.–18. Experiments (results figures)

  19. Conclusions Compared with state-of-the-art models, the experiments on RCV1 and the TREC topics demonstrate that the proposed approach significantly improves the effectiveness of relevance feature discovery. The paper recommends classifying low-level terms into three categories, which substantially improves the performance of the revision step.

  20. Comments • Advantages • The effectiveness of relevance feature discovery can be significantly improved by the proposed approach. • Drawback • … • Applications • Text mining • Classification
