
Linear Models (I)


Presentation Transcript


  1. Linear Models (I) Rong Jin

  2. Review of Information Theory • What is information? • What is entropy? Entropy is the average information content of a symbol, and the minimum expected coding length • An important inequality relates the distribution that generates symbols to the distribution used to code them (figure: distribution for generating symbols vs. distribution for coding symbols)
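The standard definitions behind these bullets, reconstructed here since the slide's own formulas did not survive the transcript:

```latex
I(x)    = -\log_2 p(x)                  % information content of symbol x
H(p)    = -\sum_x p(x) \log_2 p(x)      % entropy: average information = minimum coding length
H(p, q) = -\sum_x p(x) \log_2 q(x)      % expected code length when p-generated symbols are coded with q
H(p, q) \ge H(p)                        % the inequality: a mismatched coding distribution never shortens codes
```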

  3. Review of Information Theory (cont’d) • Mutual information: measures the statistical dependence between two random variables; it is symmetric • Kullback-Leibler distance: measures the difference between two distributions
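In standard notation (again reconstructed, as the slide's formulas are missing from the transcript):

```latex
I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} = I(Y;X)   % mutual information, symmetric
KL(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}                  % Kullback-Leibler distance
% Note: I(X;Y) = KL( p(x,y) \| p(x)p(y) )
```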

  4. Outline • Classification problems • Information theory for text classification • Gaussian generative models • Naïve Bayes • Logistic regression

  5. Classification Problems (figure: input X → unknown function → output Y) • Given input X = {x1, x2, …, xm} • Predict the class label y • y ∈ {-1, +1}: binary classification problems • y ∈ {1, 2, 3, …, c}: multi-class classification problems • Goal: learn a function f: X → Y

  6. Examples of Classification Problems • Text categorization: input features are words (‘campaigning’, ‘efforts’, ‘Iowa’, ‘Democrats’, …); class labels are ‘politics’ and ‘non-politics’. Example doc: “Months of campaigning and weeks of round-the-clock efforts in Iowa all came down to a final push Sunday, …” → topic: politics • Image classification: input features are color histogram, texture distribution, edge distribution, …; class labels are ‘bird image’ and ‘non-bird image’ (figure: “Which is a bird image?”)

  7. Learning Setup for Classification Problems • Training examples: (x1, y1), (x2, y2), …, (xn, yn) • Independent and Identically Distributed (i.i.d.): training examples are drawn from the same distribution as testing examples • Goal: find a model or a function that is consistent with the training data

  8. Information Theory for Text Classification • If the coding distribution is similar to the generating distribution → short coding length → good compression rate (figure: distribution for generating symbols vs. distribution for coding symbols)

  9. Compression Algorithm for Text Classification (figure: a new document whose true topic is ‘Sports’ is compressed to 16K bits by compression model M1, trained on ‘Politics’, and to 10K bits by compression model M2, trained on ‘Sports’; the shorter code length identifies the topic)
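A minimal sketch of this idea in Python, assuming simple per-class unigram models with add-one smoothing (the model names, training snippets, and smoothing choice are illustrative, not from the slides). The class whose model compresses the document best, i.e. assigns it the shortest code length, wins:

```python
import math
from collections import Counter

def train_unigram(docs):
    """Fit a unigram 'compression model': returns word -> probability, add-one smoothed."""
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def code_length_bits(doc, model):
    """Code length of doc under model: -sum log2 p(w). Shorter = better fit."""
    return -sum(math.log2(model(w)) for w in doc.split())

# Hypothetical training data, one compression model per topic
models = {
    "politics": train_unigram(["campaign vote senate", "election campaign iowa"]),
    "sports":   train_unigram(["game score team win", "season team playoff"]),
}

doc = "team win game"
# Classify by minimum code length
topic = min(models, key=lambda t: code_length_bits(doc, models[t]))
print(topic)  # -> 'sports'
```

Since the code length is -log2 of the likelihood, picking the minimum code length is exactly maximum-likelihood classification.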

  10. Probabilistic Models for Classification Problems (figure: training examples → learning a statistical model θ → prediction p(y|x; θ)) • Apply statistical inference methods • Key: finding the best parameters θ • Maximum likelihood estimation (MLE) approach: write down the log-likelihood of the data, then find the parameters θ that maximize it
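The MLE objective in standard notation (reconstructed; the slide's formulas are not in the transcript):

```latex
\ell(\theta) = \sum_{i=1}^{n} \log p(y_i \mid x_i; \theta)   % log-likelihood of the training data
\theta^{*}   = \arg\max_{\theta} \; \ell(\theta)             % MLE: the parameters maximizing it
```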

  11. Generative Models • Do not estimate p(y|x; θ) directly; use Bayes’ rule instead • Estimate p(x|y; θ) instead of p(y|x; θ) • Why p(x|y; θ)? Most well-known distributions have the form p(x|θ) • Allocate a separate set of parameters to each class: θ = {θ1, θ2, …, θc}, so that p(x|y; θ) = p(x|θy) • p(x|θy) describes the characteristic input patterns of each class y
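Bayes' rule, as the slide invokes it (standard form):

```latex
p(y \mid x; \theta) = \frac{p(x \mid y; \theta)\, p(y)}{\sum_{y'} p(x \mid y'; \theta)\, p(y')}
```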

  12. Gaussian Generative Model (I) • Assume a Gaussian model for each class • One-dimensional case • MLE results (see below)
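The class-conditional density and its MLE solution in the one-dimensional case (standard results, reconstructed since the slide's formulas are missing):

```latex
p(x \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} \exp\!\left( -\frac{(x - \mu_y)^2}{2\sigma_y^2} \right)
\hat{\mu}_y      = \frac{1}{n_y} \sum_{i:\, y_i = y} x_i
\hat{\sigma}_y^2 = \frac{1}{n_y} \sum_{i:\, y_i = y} (x_i - \hat{\mu}_y)^2
% n_y = number of training examples in class y
```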

  13. Example • Height histograms for males and females • Using the Gaussian generative model: P(male | 1.8) = ?, P(female | 1.4) = ?
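A hedged worked version of this example in Python. The class means, standard deviations, and uniform priors below are illustrative assumptions, not values read off the slide's histogram:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Illustrative class-conditional models for height in meters (assumed, not from the slide)
params = {"male": (1.75, 0.07), "female": (1.62, 0.06)}
prior  = {"male": 0.5, "female": 0.5}

def posterior(label, x):
    """P(label | x) via Bayes' rule over the two classes."""
    joint = {y: prior[y] * gaussian_pdf(x, *params[y]) for y in params}
    return joint[label] / sum(joint.values())

print(f"P(male | 1.8) = {posterior('male', 1.8):.3f}")
print(f"P(female | 1.4) = {posterior('female', 1.4):.3f}")
```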

  14. Gaussian Generative Model (II) • Consider multiple input features: X = {x1, x2, …, xm} • Multivariate Gaussian distribution, where Σy is an m×m covariance matrix • MLE results (see below) • Problem: the estimated Σy is easily singular, because it has too many parameters for the available data
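The multivariate density and its MLE estimates (standard results, reconstructed):

```latex
p(x \mid y) = \frac{1}{(2\pi)^{m/2}\, |\Sigma_y|^{1/2}}
              \exp\!\left( -\tfrac{1}{2} (x - \mu_y)^\top \Sigma_y^{-1} (x - \mu_y) \right)
\hat{\mu}_y    = \frac{1}{n_y} \sum_{i:\, y_i = y} x_i
\hat{\Sigma}_y = \frac{1}{n_y} \sum_{i:\, y_i = y} (x_i - \hat{\mu}_y)(x_i - \hat{\mu}_y)^\top
```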

  15. Overfitting Issue • A complex model combined with insufficient training data • Consider a classification problem with multiple inputs: 100 input features, 5 classes, 1000 training examples • Total number of parameters for a full Gaussian model: 5 means of dimension 100 → 500 parameters; 5 covariance matrices of size 100×100 → 50,000 parameters • 50,500 parameters in total → far too many for 1000 training examples

  16.–19. Another Example of Overfitting (the same title spans four slides; the accompanying figure sequence is not included in the transcript)

  20. Naïve Bayes • Simplify the model: restrict the covariance matrix Σy to be diagonal • The simplified Gaussian distribution then factorizes over the features • This feature-independence assumption is the Naïve Bayes assumption
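With a diagonal Σy the class-conditional density factorizes into a product over features, which is the Naïve Bayes assumption in formula form (reconstructed):

```latex
p(x \mid y) = \prod_{j=1}^{m} p(x_j \mid y)
            = \prod_{j=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma_{y,j}}
              \exp\!\left( -\frac{(x_j - \mu_{y,j})^2}{2\sigma_{y,j}^2} \right)
```

This cuts the covariance parameters per class from the full m×m matrix to m variances: 100 instead of 10,000 per class in the earlier example.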

  21. Naïve Bayes • A terrible estimator for the class-conditional density p(x|y) • But a very reasonable estimator for the posterior p(y|x). Why? • The ratio of the class-conditional likelihoods is what matters for classification, and Naïve Bayes does a reasonable job of estimating that ratio

  22. The Ratio of Likelihood • Binary classification • Assume both classes share the same variance • The log-likelihood ratio is then a linear model in x (see the derivation below)!
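Writing out the log-odds under the shared-variance Gaussian Naïve Bayes model (a reconstruction of the slide's derivation in standard notation):

```latex
\log \frac{p(y = +1 \mid x)}{p(y = -1 \mid x)}
  = \log \frac{p(x \mid +1)\, p(+1)}{p(x \mid -1)\, p(-1)}
  = \sum_{j=1}^{m} \frac{\mu_{+1,j} - \mu_{-1,j}}{\sigma_j^2}\, x_j + b
  = w^\top x + b
```

The quadratic terms in each x_j cancel because the two classes share the same variance, and b collects all the x-independent terms. The classifier therefore thresholds a linear function of x.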

  23. Decision Boundary • Under these assumptions, a Gaussian generative model amounts to finding a linear decision boundary • Why not learn the linear boundary directly? (This motivates logistic regression, next in the outline.)
