Integrated Instance- and Class-based Generative Modeling for Text Classification

Presentation Transcript

  1. Integrated Instance- and Class-based Generative Modeling for Text Classification. Antti Puurula (University of Waikato), Sung-Hyon Myaeng (KAIST). Australasian Document Computing Symposium, 5/12/2013

  2. Instance vs. Class-based Text Classification
  • Class-based learning
    • Multinomial Naive Bayes, Logistic Regression, Support Vector Machines, …
    • Pros: compact models, efficient inference, accurate with text data
    • Cons: document-level information discarded
  • Instance-based learning
    • K-Nearest Neighbors, Kernel Density Classifiers, …
    • Pros: document-level information preserved, efficient learning
    • Cons: data sparsity reduces accuracy
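As a concrete illustration of the two families (not part of the original slides), the sketch below trains one class-based and one instance-based classifier on the same bag-of-words features using scikit-learn; the dataset and parameter choices are placeholders.

```python
# Minimal sketch (assumes scikit-learn is available) contrasting the two
# families of text classifiers discussed on this slide.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB          # class-based
from sklearn.neighbors import KNeighborsClassifier     # instance-based

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Class-based: one multinomial per class; individual documents are discarded
# after parameter estimation.
mnb = MultinomialNB(alpha=1.0).fit(X_train, train.target)

# Instance-based: every training document is kept and consulted at test time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, train.target)

print("MNB accuracy:", mnb.score(X_test, test.target))
print("KNN accuracy:", knn.score(X_test, test.target))
```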

  3. Instance vs. Class-based Text Classification 2
  • Proposal: Tied Document Mixture
    • integrated instance- and class-based model
    • retains benefits from both types of modeling
    • exact linear time algorithms for estimation and inference
  • Main ideas:
    • replace the Multinomial class-conditional in MNB with a mixture over documents
    • smooth document models hierarchically with class and background models

  4. Multinomial Naive Bayes
  • Standard generative model for text classification
  • Result of simple generative assumptions:
    • Bayes rule
    • Naive (conditional independence) assumption
    • Multinomial word distribution

  5. Multinomial Naive Bayes 2
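The MNB equations on slides 4-5 were not preserved in this transcript. For reference, a textbook formulation consistent with the assumptions listed above (standard notation, not necessarily the slides' own):

```latex
% Multinomial Naive Bayes for a document with word counts w = (w_1, ..., w_V)
% Bayes rule:
P(c \mid \mathbf{w}) \propto P(c)\, P(\mathbf{w} \mid c)
% Naive + Multinomial assumptions: words drawn independently from a
% class-conditional multinomial
P(\mathbf{w} \mid c) \propto \prod_{n=1}^{V} P(n \mid c)^{w_n}
% Decision rule:
\hat{c} = \arg\max_{c} \Big[ \log P(c) + \sum_{n=1}^{V} w_n \log P(n \mid c) \Big]
```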

  6. Tied Document Mixture
  • Replace the Multinomial in MNB by a mixture over all documents,
    • where document models are smoothed hierarchically,
    • and where class models are estimated by averaging the documents
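The defining equations on this slide were likewise lost. A plausible reconstruction from the bullet descriptions follows; the restriction of the mixture to the documents D_c of class c, the uniform mixture weights, and the interpolation weights \alpha_d, \alpha_c are assumptions made here for concreteness:

```latex
% Class-conditional likelihood as a mixture over the documents of class c
% (assumption: uniform mixture weights over the class's training documents D_c)
P(\mathbf{w} \mid c) = \frac{1}{|D_c|} \sum_{d \in D_c} \prod_{n} P(n \mid d)^{w_n}
% Hierarchical (document -> class -> background) smoothing of document models
P(n \mid d) = (1-\alpha_d)\, \hat{P}(n \mid d)
            + \alpha_d \big[ (1-\alpha_c)\, \hat{P}(n \mid c) + \alpha_c\, \hat{P}(n) \big]
% Class models estimated by averaging the (unsmoothed) documents of the class
\hat{P}(n \mid c) = \frac{1}{|D_c|} \sum_{d \in D_c} \hat{P}(n \mid d)
```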

  7. Tied Document Mixture 2

  8. Tied Document Mixture 3
  • Can be described as constraints on a two-level mixture
  • Document level mixture:
    • Number of components = number of training documents
    • Components assigned to instances
    • Component weights = uniform over the documents
  • Word level mixture:
    • Number of components = hierarchy depth
    • Components assigned to hierarchy levels (document, class, background)
    • Component weights = the document-, class- and background-level smoothing weights

  11. Tied Document Mixture 4
  • Can be described as a class-smoothed Kernel Density Classifier
    • Document mixture equivalent to a Multinomial kernel density
    • Hierarchical smoothing corresponds to mean shift or data sharpening with class centroids

  12. Hierarchical Sparse Inference
  • Reduces inference complexity from dense evaluation over all documents and words to sparse evaluation over only the nonzero entries stored in inverted indices
  • Same complexity as K-Nearest Neighbors based on inverted indices (Yang, 1994)

  13. Hierarchical Sparse Inference 2
  • Precompile values so that each smoothed log-probability decomposes into a shared background term plus sparse class- and document-specific correction terms
  • Store the class- and document-specific correction terms in inverted indices
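The precompiled quantities themselves did not survive. The sketch below shows one way the decomposition can work under the smoothing form assumed in the earlier reconstruction; the exact quantities used in the paper may differ:

```latex
% For a test document with counts w, the smoothed log-likelihood of a training
% document d (of class c) splits into a part identical for all d (background)
% and corrections that are nonzero only where c or d actually contains the word:
\log P(\mathbf{w} \mid d)
  = \underbrace{\sum_{n} w_n \log\big(\alpha_d \alpha_c \hat{P}(n)\big)}_{\text{shared background score}}
  + \underbrace{\sum_{n:\, \hat{P}(n \mid c) > 0} w_n \Big[\log\big(\alpha_d[(1-\alpha_c)\hat{P}(n \mid c)+\alpha_c \hat{P}(n)]\big) - \log\big(\alpha_d \alpha_c \hat{P}(n)\big)\Big]}_{\text{class-level correction}}
  + \underbrace{\sum_{n:\, \hat{P}(n \mid d) > 0} w_n \Big[\log P(n \mid d) - \log\big(\alpha_d[(1-\alpha_c)\hat{P}(n \mid c)+\alpha_c \hat{P}(n)]\big)\Big]}_{\text{document-level correction}}
```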

  14. Hierarchical Sparse Inference 3
  • Compute the shared background score first
  • Update it with the class-level corrections retrieved from the class inverted index
  • Update with the document-level corrections from the document inverted index to get the document scores
  • Compute the class likelihoods by summing over the document mixture
  • Apply Bayes rule to obtain the class posteriors
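A minimal code sketch of this scoring sequence, assuming the decomposition above. The data structures (bg_logprob, class_index, doc_index, class_docs) and the uniform mixture weights are illustrative assumptions, not the toolkit's actual implementation.

```python
import math
from collections import defaultdict

def tdm_posteriors(query_counts, bg_logprob, class_index, doc_index,
                   class_docs, class_priors):
    """Sketch of sparse TDM scoring.

    query_counts: {word: count} for the test document
    bg_logprob:   {word: log background-smoothed probability}
    class_index:  {word: {class: class-level correction}}   (inverted index)
    doc_index:    {word: {doc: document-level correction}}  (inverted index)
    class_docs:   {class: [doc ids]},  class_priors: {class: P(class)}
    """
    # 1. Shared background score, identical for every training document
    #    (a small floor covers out-of-vocabulary words).
    background = sum(cnt * bg_logprob.get(w, math.log(1e-9))
                     for w, cnt in query_counts.items())

    # 2. Class-level corrections: only classes posted in the inverted index
    #    for the query words are touched.
    class_corr = defaultdict(float)
    for w, cnt in query_counts.items():
        for c, corr in class_index.get(w, {}).items():
            class_corr[c] += cnt * corr

    # 3. Document-level corrections, again only for posted documents.
    doc_corr = defaultdict(float)
    for w, cnt in query_counts.items():
        for d, corr in doc_index.get(w, {}).items():
            doc_corr[d] += cnt * corr

    # 4. Class likelihoods: sum the document mixture (uniform weights assumed),
    #    using log-sum-exp for numerical stability.
    posteriors = {}
    for c, docs in class_docs.items():
        scores = [background + class_corr[c] + doc_corr.get(d, 0.0) for d in docs]
        m = max(scores)
        log_lik = m + math.log(sum(math.exp(s - m) for s in scores)) - math.log(len(docs))
        # 5. Bayes rule (unnormalized log posterior).
        posteriors[c] = math.log(class_priors[c]) + log_lik
    return posteriors
```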

  21. Experimental Setup
  • 14 classification datasets used:
    • 3 spam classification
    • 3 sentiment analysis
    • 5 multi-class classification
    • 3 multi-label classification
  • Scripts and datasets in LIBSVM format:
    • http://sourceforge.net/projects/sgmweka/
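As a usage note (not from the slides), LIBSVM-format files like the released datasets can be read with standard tooling, for example scikit-learn; the file name below is a placeholder.

```python
# Minimal sketch: reading a LIBSVM/SVMlight-format dataset (file name is a placeholder).
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("spam_train.libsvm")  # sparse term-count matrix and labels
print(X.shape, y[:10])
```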

  22. Experimental Setup 2
  • Classifiers compared:
    • Multinomial Naive Bayes (MNB)
    • Tied Document Mixture (TDM)
    • K-Nearest Neighbors (KNN) (Multinomial distance, distance-weighted vote)
    • Kernel Density Classifier (KDC) (smoothed Multinomial kernel)
    • Logistic Regression (LR, LR+) (L2-regularized)
    • Support Vector Machine (SVM, SVM+) (L2-regularized L2-loss)
  • LR+ and SVM+ weighted feature vectors by TF-IDF
  • Smoothing parameters optimized for micro-averaged F-score on held-out development sets using Gaussian random searches
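To illustrate the tuning setup (this is not the authors' Gaussian random search), a generic random search over a smoothing parameter scored by micro-F1 on a development split might look like the following; the data variables are placeholders.

```python
# Generic sketch of tuning MNB's smoothing parameter for micro-F1 on a held-out
# development set. Illustrative only; the paper used Gaussian random searches.
import random
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

def tune_smoothing(X_train, y_train, X_dev, y_dev, n_trials=50, seed=0):
    rng = random.Random(seed)
    best_alpha, best_f1 = None, -1.0
    for _ in range(n_trials):
        alpha = 10 ** rng.uniform(-3, 1)        # sample smoothing on a log scale
        model = MultinomialNB(alpha=alpha).fit(X_train, y_train)
        f1 = f1_score(y_dev, model.predict(X_dev), average="micro")
        if f1 > best_f1:
            best_alpha, best_f1 = alpha, f1
    return best_alpha, best_f1
```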

  23. Results
  • Training times for MNB, TDM, KNN and KDC are linear
    • At most 70 s for MNB on OHSU-TREC, 170 s for the others
  • SVM and LR require iterative algorithms
    • At most 936 s, for LR on Amazon12
    • Did not scale to multi-label datasets in practical times
  • Classification times for instance-based classifiers are higher
    • At most a mean of 226 ms for TDM on OHSU-TREC, compared to 70 ms for MNB
    • (with 290k terms, 196k labels, 197k documents)

  24. Results 2
  • TDM significantly improves on MNB, KNN and KDC
  • Across comparable datasets, TDM is on par with SVM+
    • SVM+ is significantly better on multi-class datasets
    • TDM is significantly better on spam classification

  26. Results 3
  • TDM reduces classification errors compared to MNB by:
    • >65% in spam classification
    • >26% in sentiment analysis
  • Some correlation between error reduction and the number of instances per class
  • Task types form clearly separate clusters

  27. Conclusion
  • Tied Document Mixture
    • Integrated instance- and class-based model for text classification
    • Exact linear time algorithms, with the same complexities as KNN and KDC
    • Accuracy substantially improved over MNB, KNN and KDC
    • Competitive with optimized SVM, depending on task type
    • Many improvements to the basic model possible
    • Sparse inference scales to hierarchical mixtures of >340k components
  • Toolkit, datasets and scripts available:
    • http://sourceforge.net/projects/sgmweka/

  28. Sparse Inference
  • Sparse Inference (Puurula, 2012)
  • Use inverted indices to reduce the complexity of computing the joint likelihoods for a given test document
  • Instead of computing the scores as dense dot products, compute the shared background score once and update it with the corrections retrieved from the inverted index for each word in the test document
  • Reduces joint inference time complexity from dense (every class scored against every feature) to sparse (only nonzero postings are visited)

  29. Sparse Inference 2
  • Dense representation: a parameter stored for every (class, feature) pair
  • Time complexity: proportional to the number of classes times the number of features

  30. Sparse Inference 3
  • Sparse representation: only nonzero class-conditional parameters stored, in inverted indices
  • Time complexity: proportional to the number of nonzero postings matched by the test document's words
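The complexity expressions on slides 29-30 were not preserved. A sketch of the intended comparison, with symbols introduced here for illustration:

```latex
% Dense scoring: every class is scored against every feature
T_{\text{dense}} = O(M \cdot N), \quad M = \text{number of classes},\; N = \text{number of features}
% Sparse scoring with inverted indices: the shared background score costs one pass
% over the test document's words, and only nonzero postings are updated
T_{\text{sparse}} = O\Big(|\mathbf{w}| + \sum_{n:\, w_n > 0} |\text{postings}(n)|\Big)
```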