
Meta-learning for automatic selection of algorithms for text classification

Karol Furdík, Ján Paralič, Gabriel Tutoky. Karol.Furdik@intersoft.sk, {Jan.Paralic, Gabriel.Tutoky}@tuke.sk. Technical University of Košice, Slovakia. Meta-learning for automatic selection of algorithms for text classification. September 24-26, 2008, University of Zagreb, Varaždin, Croatia.

Presentation Transcript


  1. Karol Furdík, Ján Paralič, Gabriel Tutoky. Karol.Furdik@intersoft.sk, {Jan.Paralic, Gabriel.Tutoky}@tuke.sk. Technical University of Košice, Slovakia. Meta-learning for automatic selection of algorithms for text classification. September 24-26, 2008, University of Zagreb, Varaždin, Croatia

  2. Introduction
     • Text classification
       • A method for extracting knowledge from textual documents
       • Originally, classification was designed as a semi-automatic procedure in which users were responsible for selecting the proper classification settings
       • Most applications (e.g. the KP-Lab project, http://www.kp-lab.org) require fully automated text classification
     • Meta-learning
       • Allows the text classification process to be automated by selecting the proper algorithms automatically
     K. Furdik, J. Paralič, G. Tutoky: Meta-learning for automatic selection of algorithms for text classification. CECIIS 2008, University of Zagreb, Varaždin, Croatia, September 24-26, 2008

  3. Theoretical analyses

  4. Text classification – two-step process
     • Creation of the classifier
       • Training set of documents
       • Preprocessing of documents
       • Learning of classifier
       • Classifier
     • Usage of the classifier
       • Document of unknown category
       • Preprocessing of current document
       • Classifier application
       • Categorized document
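The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the classifier used by the authors: the term-overlap learner, the function names `preprocess`, `learn_classifier` and `classify`, and the four training documents are all invented for the example.

```python
import re
from collections import Counter


def preprocess(text):
    """Lower-case a document and turn it into a bag of words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def learn_classifier(training_set):
    """Step 1: build one term-count profile per category from labelled documents."""
    profiles = {}
    for text, category in training_set:
        profiles.setdefault(category, Counter()).update(preprocess(text))
    return profiles


def classify(profiles, text):
    """Step 2: preprocess an unseen document and pick the best-matching category."""
    bag = preprocess(text)

    def overlap(category):
        profile = profiles[category]
        total = sum(profile.values())
        # Counter returns 0 for terms the profile has never seen.
        return sum(count * profile[term] for term, count in bag.items()) / total

    return max(profiles, key=overlap)


# Hypothetical toy training set (assumption, not from the slides).
training_set = [
    ("wheat and corn prices rise", "grain"),
    ("corn harvest exceeds forecast", "grain"),
    ("bank raises interest rates", "finance"),
    ("interest rates and bank loans", "finance"),
]

profiles = learn_classifier(training_set)      # creation of the classifier
predicted = classify(profiles, "corn prices fall")  # usage of the classifier
```

The same `preprocess` function is applied in both steps, which mirrors the slide: the unknown document has to pass through the same preprocessing as the training documents before the classifier is applied.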

  5. Meta-learning, MUDOF algorithm
     • MUDOF – Meta-learning Using Document Feature Characteristics
       • Introduced in 2002 by Wai Lam and Kwok-Yin Lai
     • Meta-learning targets:
       • Selection of algorithms for classifiers
       • Algorithms are selected at the category level (a different algorithm can be selected for each category)
       • Automate and optimize the classifier creation process

  6. Meta-learning – scheme (1/4)
     • Construction of the meta-model
       • Training set for creation of the meta-model (TM)
       • Values of effectiveness
       • Testing set of documents (TE)
       • Meta-model
     • Usage of the meta-model
       • Classifier

  7. Values of effectiveness
     • The algorithms A1, ..., An are applied one by one to the categories C1, ..., Cm of the training set
     • n × m binary classifiers are created
     • The binary classifiers are evaluated on the testing data collection
     • The effectiveness of each algorithm on each category is obtained
     • This is the most computationally demanding step of meta-learning
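The procedure on this slide amounts to filling an n × m table with one score per (algorithm, category) pair. The sketch below assumes F1 as the effectiveness measure (the measure named on the experiment slides) and uses two deliberately trivial stand-in algorithms, `train_always_positive` and `train_keyword`; these, the helper names and the toy documents are all hypothetical.

```python
def f1_score(true_labels, predicted_labels):
    """F1 of the positive class for one binary classifier."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_always_positive(texts, labels):
    """Stand-in algorithm: assigns every document to the category."""
    return lambda text: True


def train_keyword(texts, labels):
    """Stand-in algorithm: fires on words seen only in positive training documents."""
    positive, negative = set(), set()
    for text, label in zip(texts, labels):
        (positive if label else negative).update(text.split())
    keywords = positive - negative
    return lambda text: any(word in keywords for word in text.split())


def effectiveness_table(algorithms, categories, train_docs, test_docs):
    """Train n x m binary classifiers and score each one on the testing set."""
    table = {}
    train_texts = [text for text, _ in train_docs]
    for category in categories:
        train_labels = [category in cats for _, cats in train_docs]
        test_labels = [category in cats for _, cats in test_docs]
        for name, train in algorithms.items():
            predict = train(train_texts, train_labels)
            predictions = [predict(text) for text, _ in test_docs]
            table[(name, category)] = f1_score(test_labels, predictions)
    return table


# Hypothetical toy data: (text, set of categories).
train_docs = [
    ("wheat corn", {"grain"}),
    ("bank rates", {"finance"}),
    ("corn export", {"grain", "trade"}),
    ("trade deficit", {"trade"}),
]
test_docs = [
    ("corn prices", {"grain"}),
    ("bank loans", {"finance"}),
    ("export trade", {"trade"}),
]
algorithms = {"always_positive": train_always_positive, "keyword": train_keyword}
categories = ["grain", "finance", "trade"]

table = effectiveness_table(algorithms, categories, train_docs, test_docs)
```

The nested loop makes the cost visible: every algorithm is trained and evaluated once per category, which is why this step dominates the running time.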

  8. Meta-learning – scheme (2/4)
     • Construction of the meta-model
       • Training set for creation of the meta-model (TM)
       • Values of effectiveness
       • Testing set of documents (TE)
       • Feature characteristics of particular categories
     • Usage of the meta-model

  9. Feature characteristics
     • Each category is characterized by statistical features
     • Examples of characteristics:
       • PosTr – ratio of positive to negative instances
       • AvgDocLen – average document length
       • AvgTermVal – average term weight
       • AvgTopInfoGain – average info gain of the best m terms
       • NumInfoGainThres – number of terms whose info gain exceeds a threshold value
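A rough sketch of how some of these characteristics could be computed for one category follows. It assumes documents are already tokenized, that info gain is computed on binary term presence, and that document length is counted in tokens; the slides do not fix any of these details. AvgTermVal is left out because it depends on a term weighting scheme the slides do not specify. The function names and the toy data are hypothetical.

```python
import math


def entropy(positives, total):
    """Binary entropy (in bits) of a set with `positives` positive members."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(docs, labels, term):
    """Reduction in category entropy obtained by knowing whether `term` occurs."""
    n = len(docs)
    with_term = [label for doc, label in zip(docs, labels) if term in doc]
    without_term = [label for doc, label in zip(docs, labels) if term not in doc]
    conditional = sum(
        len(part) / n * entropy(sum(part), len(part))
        for part in (with_term, without_term)
    )
    return entropy(sum(labels), n) - conditional


def category_features(docs, labels, top_m, threshold):
    """Feature characteristics of one category (assumes at least one negative document)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    vocabulary = sorted({term for doc in docs for term in doc})
    gains = sorted((info_gain(docs, labels, t) for t in vocabulary), reverse=True)
    return {
        "PosTr": positives / negatives,
        "AvgDocLen": sum(len(doc) for doc in docs) / len(docs),
        "AvgTopInfoGain": sum(gains[:top_m]) / min(top_m, len(gains)),
        "NumInfoGainThres": sum(1 for g in gains if g > threshold),
    }


# Hypothetical toy category: True marks documents that belong to it.
docs = [
    ["wheat", "corn"],
    ["corn", "export", "prices"],
    ["bank", "rates"],
    ["bank", "loans", "rates"],
]
labels = [True, True, False, False]

features = category_features(docs, labels, top_m=3, threshold=0.5)
```

Each category yields one such feature vector, and these vectors are what the meta-model later relates to the effectiveness of the algorithms.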

  10. Meta-learning – scheme (3/4)
      • Construction of the meta-model
        • Training set for creation of the meta-model (TM)
        • Values of effectiveness
        • Testing set of documents (TE)
        • Feature characteristics of particular categories
        • Meta-model
      • Usage of the meta-model

  11. Meta-model
      • Models the relations between the feature characteristics of categories and the effectiveness of algorithms
      • The meta-model can be:
        • Prediction (MUDOF_R) – linear regression
        • Classification (MUDOF_K) – k-NN
      • Meta-model advantages:
        • Serves as an "engine" for selecting the proper algorithms
        • Can be used for more than one collection of documents
        • In the ideal case, it is sufficient to learn a meta-model only once; it can then be reused for selecting algorithms
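The regression variant can be illustrated with a deliberately reduced sketch: one least-squares line per algorithm, mapping a single feature characteristic to the observed effectiveness, followed by picking the algorithm with the highest predicted value. Using a single predictor, predicting effectiveness directly, and the numbers below are simplifying assumptions for illustration; they are not taken from the slides, and `fit_line`, `fit_meta_model` and `select_algorithm` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_meta_model(feature_values, effectiveness):
    """One regression line per algorithm: feature value -> observed effectiveness."""
    return {
        name: fit_line(feature_values, scores)
        for name, scores in effectiveness.items()
    }


def select_algorithm(meta_model, feature_value):
    """Pick the algorithm whose predicted effectiveness is highest."""
    predicted = {
        name: intercept + slope * feature_value
        for name, (intercept, slope) in meta_model.items()
    }
    return max(predicted, key=predicted.get)


# Made-up meta-training data: one feature value per category, and the
# effectiveness each algorithm reached on that category.
feature_values = [0.1, 0.2, 0.4, 0.5]
effectiveness = {
    "algorithm_A": [0.9, 0.8, 0.6, 0.5],
    "algorithm_B": [0.5, 0.6, 0.8, 0.9],
}

meta_model = fit_meta_model(feature_values, effectiveness)
choice_low = select_algorithm(meta_model, 0.15)
choice_high = select_algorithm(meta_model, 0.45)
```

Once fitted, the meta-model needs only the feature characteristics of a new category to make a choice, which is what makes it reusable across collections.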

  12. Meta-learning – scheme (4/4)
      • Construction of the meta-model
        • Training set for creation of the meta-model (TM)
        • Values of effectiveness
        • Testing set of documents (TE)
        • Feature characteristics of particular categories
        • Meta-model
      • Usage of the meta-model
        • Training set for creation of the classifier (TC)
        • Feature characteristics of particular categories
        • Meta-model
        • Selection of algorithms for particular categories
        • Learning of classifiers
        • Classifier
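The usage half of the scheme can be sketched with a k-NN meta-model: each category of TC is described by its feature characteristics, the meta-model votes for an algorithm, and that algorithm is then used to learn the category's classifier. Euclidean distance on unscaled features, majority voting, k = 3, the placeholder trainers and all numbers are assumptions made for this illustration.

```python
import math
from collections import Counter


def nearest_algorithm(meta_examples, features, k=3):
    """Majority vote among the k meta-examples closest to `features`."""
    by_distance = sorted(
        meta_examples, key=lambda example: math.dist(example[0], features)
    )
    votes = Counter(name for _, name in by_distance[:k])
    return votes.most_common(1)[0][0]


def build_classifiers(meta_examples, category_features, trainers, k=3):
    """Select an algorithm per category, then learn that category's classifier."""
    selected, classifiers = {}, {}
    for category, features in category_features.items():
        name = nearest_algorithm(meta_examples, features, k)
        selected[category] = name
        classifiers[category] = trainers[name](category)
    return selected, classifiers


# Made-up meta-examples from the construction phase:
# (feature characteristics of a TM category, best algorithm on that category).
meta_examples = [
    ((0.10, 50.0), "algorithm_A"),
    ((0.20, 60.0), "algorithm_A"),
    ((0.15, 55.0), "algorithm_A"),
    ((0.80, 200.0), "algorithm_B"),
    ((0.90, 220.0), "algorithm_B"),
    ((0.70, 210.0), "algorithm_B"),
]

# Feature characteristics computed on TC for two hypothetical categories.
category_features = {"grain": (0.12, 52.0), "trade": (0.85, 205.0)}

# Placeholder trainers: a real one would learn a binary classifier from TC.
trainers = {
    "algorithm_A": lambda category: ("algorithm_A", category),
    "algorithm_B": lambda category: ("algorithm_B", category),
}

selected, classifiers = build_classifiers(meta_examples, category_features, trainers)
```

With unscaled features the second component dominates the distance here; a practical version would probably normalize the feature characteristics before comparing categories.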

  13. Experiments

  14. Data description
      • Reuters-21578
        • 10 788 documents; 90 categories
        • TM (3815); TC (3961); TE (3019)
        • Unbalanced data
      • 20 Newsgroups
        • 19 997 documents; 20 categories
        • TC (10 025); TE (9972)
        • Well-balanced data

  15. Experiment 1 (1/3)
      • Tests the meta-learning approach on a single data set (the Reuters text collection)
      • Assumption: the training set is divided into:
        • Training set for creation of the meta-model (TM)
        • Training set for creation of the classifier (TC)
      • Target:
        • Increase the effectiveness of the final classifier in comparison with the base classifiers

  16. Experiment 1 (2/3) • Classifier effectiveness – with F1 optimized measure

  17. Experiment 1 (3/3) • Selection of algorithms – over AVERAGE

  18. Experiment 2 (1/3)
      • Tests the usability of the meta-learning approach on two different sets of documents (Reuters and 20 Newsgroups)
      • Assumptions:
        • The training set of one data collection is used for creation of the meta-model
        • The training set of the other data collection is used for creation of the classifier
      • Targets:
        • Fully automatic selection of algorithms without re-learning the meta-model (a meta-model learned on the other data collection is used)
        • Better effectiveness of the classifier

  19. Experiment 2 (2/3) • Classifier effectiveness – with F1 optimized measure

  20. Experiment 2 (3/3) • Selection of algorithms – over AVERAGE

  21. Conclusion
      • Advantages of meta-learning
        • Fully automated text categorization: the selection of algorithms is automatic
        • Increased effectiveness of the final classifier (on one data collection)
        • One meta-model can be used for various data collections
      • Disadvantages of meta-learning
        • Requires large computing and time capacity

  22. Thank you for your attention
