Improving Text Classification by Shrinkage in a Hierarchy of Classes

Roni Rosenfeld (CMU), Andrew Y. Ng (MIT AI Lab), Andrew McCallum (Just Research & CMU), Tom Mitchell (CMU).



Presentation Transcript

  1. Improving Text Classification by Shrinkage in a Hierarchy of Classes. Roni Rosenfeld (CMU), Andrew Y. Ng (MIT AI Lab), Andrew McCallum (Just Research & CMU), Tom Mitchell (CMU).

  2. The Task: Document Classification (also “Document Categorization”, “Routing” or “Tagging”). Automatically placing documents in their correct categories. Testing data: “grow corn tractor…” (category: Crops). Categories: Crops, Botany, Evolution, Magnetism, Relativity, Irrigation. Training data (word samples): “water grating ditch farm tractor...”; “corn wheat silo farm grow...”; “corn tulips splicing grow...”; “selection mutation Darwin Galapagos DNA...”; ...

  3. The Idea: “Shrinkage” / “Deleted Interpolation”. We can improve the parameter estimates in a leaf by averaging them with the estimates in its ancestors. Testing data: “corn grow tractor…” (category: Crops). Hierarchy: Science at the root; Agriculture, Biology and Physics below it; leaf categories Crops, Botany, Evolution, Magnetism, Relativity, Irrigation. Training data (word samples): “water grating ditch farm tractor...”; “corn wheat silo farm grow...”; “corn tulips splicing grow...”; “selection mutation Darwin Galapagos DNA...”; ...

  4. A Probabilistic Approach to Document Classification. Naïve Bayes, where c_j is a class, d is a document, and w_{d_i} is the i-th word of document d. Maximum a posteriori estimate of Pr(w|c), with a Dirichlet prior, α = 1 (also known as Laplace smoothing), where N(w,d) is the number of times word w occurs in document d.
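
The two quantities defined on this slide can be written out in the standard multinomial naïve Bayes form, which is consistent with the definitions above. The vocabulary symbol $V$ and the notation $d \in c_j$ (training documents labeled with class $c_j$) are assumptions introduced here for illustration.

```latex
% Naive Bayes posterior for class c_j given document d
\Pr(c_j \mid d) \;\propto\; \Pr(c_j) \prod_{i=1}^{|d|} \Pr(w_{d_i} \mid c_j)

% MAP estimate of the word probability with a Dirichlet prior, alpha = 1 (Laplace smoothing)
\Pr(w \mid c_j) \;=\;
  \frac{1 + \sum_{d \in c_j} N(w, d)}
       {|V| + \sum_{w' \in V} \sum_{d \in c_j} N(w', d)}
```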

  5. “Shrinkage” / “Deleted Interpolation” [James and Stein, 1961] / [Jelinek and Mercer, 1980]. Hierarchy levels, top to bottom: (Uniform); Science; Agriculture, Biology, Physics; Crops, Botany, Evolution, Magnetism, Relativity, Irrigation.
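
A minimal Python sketch of the averaging idea: the leaf's word distribution is replaced by a weighted mix of the estimates along its path up to the uniform distribution. The word counts, the four-word vocabulary, and the mixture weights below are invented for illustration, and the helper names (`ml_estimate`, `shrinkage_estimate`) are hypothetical rather than taken from the presentation.

```python
from collections import Counter


def ml_estimate(word_counts, vocab):
    """Maximum-likelihood word distribution for one node of the hierarchy."""
    total = sum(word_counts[w] for w in vocab)
    return {w: (word_counts[w] / total if total else 0.0) for w in vocab}


def shrinkage_estimate(path_estimates, weights):
    """Weighted average of the estimates on a leaf-to-root path.

    path_estimates: word distributions ordered leaf, ancestors..., uniform.
    weights: one mixture weight per node on the path; they must sum to 1.
    """
    assert abs(sum(weights) - 1.0) < 1e-9
    vocab = path_estimates[0].keys()
    return {
        w: sum(lam * est[w] for lam, est in zip(weights, path_estimates))
        for w in vocab
    }


# Toy data (assumed): each ancestor pools the counts of its descendants.
vocab = ["corn", "tractor", "water", "dna"]
crops = Counter({"corn": 3, "tractor": 1})
agriculture = Counter({"corn": 3, "tractor": 2, "water": 3})
science = Counter({"corn": 3, "tractor": 2, "water": 3, "dna": 8})
uniform = {w: 1.0 / len(vocab) for w in vocab}

path = [
    ml_estimate(crops, vocab),
    ml_estimate(agriculture, vocab),
    ml_estimate(science, vocab),
    uniform,
]
lambdas = [0.4, 0.3, 0.2, 0.1]  # hypothetical weights, not learned here
shrunk = shrinkage_estimate(path, lambdas)
```

In this toy example the leaf never saw the word "water", so its own estimate is zero, while the shrunk estimate borrows probability mass from the ancestors and the uniform distribution.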

  6. Learning Mixture Weights. Learn the λ’s via EM, performing the E-step with leave-one-out cross-validation. E-step: use the current λ’s to estimate the degree to which each node was likely to have generated the words in held-out documents. M-step: use the estimates to recalculate new values for the λ’s. Example path: Uniform, Science, Agriculture, Crops (training words: “corn wheat silo farm grow...”).
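
The EM loop described on this slide can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses a fixed list of held-out words instead of the leave-one-out procedure from the slide, it handles a single leaf class, and the function name `learn_mixture_weights` and the toy distributions are hypothetical.

```python
def learn_mixture_weights(path_estimates, held_out_words, iterations=50):
    """Estimate one mixture weight per node on a leaf-to-root path with EM.

    path_estimates: word distributions ordered leaf, ancestors..., uniform.
    held_out_words: words from documents not used to build the estimates
                    (a simplification of leave-one-out cross-validation).
    """
    k = len(path_estimates)
    lambdas = [1.0 / k] * k  # start from equal weights
    for _ in range(iterations):
        # E-step: how much is each node responsible for each held-out word?
        beta = [0.0] * k
        for w in held_out_words:
            contrib = [lam * est.get(w, 0.0)
                       for lam, est in zip(lambdas, path_estimates)]
            total = sum(contrib)
            if total == 0.0:
                continue  # word has zero probability under every node
            for i in range(k):
                beta[i] += contrib[i] / total
        # M-step: renormalize the responsibilities into new weights.
        norm = sum(beta)
        if norm == 0.0:
            break
        lambdas = [b / norm for b in beta]
    return lambdas


# Toy path (assumed values): leaf, parent, uniform over a 4-word vocabulary.
path = [
    {"corn": 0.75, "tractor": 0.25, "water": 0.0, "dna": 0.0},
    {"corn": 0.375, "tractor": 0.25, "water": 0.375, "dna": 0.0},
    {"corn": 0.25, "tractor": 0.25, "water": 0.25, "dna": 0.25},
]
weights = learn_mixture_weights(path, ["corn", "corn", "water", "tractor"])
```

Because the leaf assigns zero probability to "water", a held-out set made only of that word drives the leaf's weight to zero, which is the behavior that lets sparse leaves lean on their ancestors.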

  7. Learning Mixture Weights E-step M-step
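
One common way to write the two steps for a single leaf class is shown below. The notation is assumed here for illustration: $\hat{\Pr}_i(w)$ is the word estimate at the $i$-th node on the path from the leaf ($i = 1$) to the uniform distribution ($i = k$), $\lambda_i$ is that node's mixture weight, $H$ is the collection of held-out word occurrences, and $\beta_i$ is the expected number of held-out words attributed to node $i$.

```latex
% E-step: expected number of held-out words generated by node i
\beta_i \;=\; \sum_{w \in H}
  \frac{\lambda_i \, \hat{\Pr}_i(w)}{\sum_{m=1}^{k} \lambda_m \, \hat{\Pr}_m(w)}

% M-step: renormalize the expectations into new mixture weights
\lambda_i \;=\; \frac{\beta_i}{\sum_{m=1}^{k} \beta_m}
```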

  8. Newsgroups Data Set (subset of Ken Lang’s 20 Newsgroups set). Top-level classes and their leaves: computers (mac, ibm, X, graphics, windows); religion (atheism, misc, christian); sport (baseball, hockey); politics (guns, misc, mideast); motor (auto, motorcycle). • 15 classes, 15k documents, 1.7 million words, 52k vocabulary

  9. Newsgroups Hierarchy Mixture Weights

  10. Newsgroups Hierarchy Mixture Weights • 235 training documents (15/class) • 7497 training documents (~500/class)

  11. Industry Sector Data Set (www.marketguide.com). Top-level classes (11), including transportation, utilities, consumer, energy, services, ... Leaf classes include water, electric, gas, coal, integrated, air, misc, appliance, film, furniture, communication, railroad, water, trucking, oil&gas. • 71 classes, 6.5k documents, 1.2 million words, 30k vocabulary

  12. Industry Sector Classification Accuracy

  13. Newsgroups Classification Accuracy

  14. Yahoo Science Data Set (www.yahoo.com/Science). Top-level classes (30), including agriculture, biology, physics, CS, space, ... Lower-level classes include dairy, botany, cell, AI, courses, crops, craft, magnetism, HCI, missions, agronomy, evolution, forestry, relativity. • 264 classes, 14k documents, 3 million words, 76k vocabulary

  15. Yahoo Science Classification Accuracy

  16. Related Work • Shrinkage in statistics: [Stein 1955], [James & Stein 1961] • Deleted interpolation in language modeling: [Jelinek & Mercer 1980], [Seymore & Rosenfeld 1997] • Bayesian hierarchical modeling for n-grams: [MacKay & Peto 1994] • Class hierarchies for text classification: [Koller & Sahami 1997] • Using EM to set mixture weights in a hierarchical clustering model for unsupervised learning: [Hofmann & Puzicha 1998]

  17. Conclusions • Shrinkage in a hierarchy of classes can dramatically improve classification accuracy (up to a 29% reduction in error). • Shrinkage helps especially when training data is sparse. In models more complex than naïve Bayes, it should be even more helpful. • [The hierarchy can be pruned for an exponential reduction in the computation necessary for classification, with only minimal loss of accuracy.]

  18. Future Work • Learning hierarchies that aid classification. • Using more complex generative models. • Capturing word dependencies • Clustering words in each ancestor
