Logistic Regression


Presentation Transcript


  1. Logistic Regression

  2. Linear regression • Function f : X → Y is a linear combination of the input components
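  The formula itself did not survive the transcript; a standard reconstruction of the linear model the slide describes (notation assumed) is

  f(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{d} w_j x_j = w_0 + \mathbf{w}^T \mathbf{x}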

  3. Binary classification • Two classes • Our goal is to learn to classify correctly two types of examples • Class 0 – labeled as 0 • Class 1 – labeled as 1 • We would like to learn f : X → {0, 1} • Zero-one error (loss) function • Error we would like to minimize: the expected zero-one loss, shown below • First step: we need to devise a model of the function
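  The loss formulas are missing from the transcript; the standard zero-one loss and expected error they refer to are

  \ell(f(\mathbf{x}), y) = \begin{cases} 1 & \text{if } f(\mathbf{x}) \ne y \\ 0 & \text{if } f(\mathbf{x}) = y \end{cases}
  \qquad
  \mathrm{Error}(f) = E_{(\mathbf{x}, y)}\big[\ell(f(\mathbf{x}), y)\big]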

  4. Discriminant functions • One convenient way to represent classifiers is through • Discriminant functions • Works for binary and multi-way classification • Idea: • For every class i = 0, 1, …, k define a function gi(x) mapping X → R • When the decision on input x should be made, choose the class with the highest value of gi(x) • So what happens with the input space? Assume a binary case.
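  In symbols, the decision rule the slide describes is

  \hat{y} = \arg\max_{i \in \{0, 1, \dots, k\}} g_i(\mathbf{x})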

  5. Discriminant functions

  6. Discriminant functions • The discriminant functions define the decision boundary: in the binary case, the set of inputs x where g_0(x) = g_1(x).

  7. Quadratic decision boundary

  8. Logistic regression model • Defines a linear decision boundary • Discriminant functions:
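  The discriminant functions were shown as an image; the standard logistic regression form (consistent with the GNB derivation later in the deck) is

  g_1(\mathbf{x}) = P(y = 1 \mid \mathbf{x}, \mathbf{w}) = \sigma(w_0 + \mathbf{w}^T \mathbf{x}), \qquad g_0(\mathbf{x}) = 1 - g_1(\mathbf{x})

  where \sigma is the logistic function of the next slide.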

  9. Logistic function
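  The slide's figure is not in the transcript; the logistic (sigmoid) function it plots is

  \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma(z) \in (0, 1), \quad \sigma(0) = \tfrac{1}{2}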

  10. Linear decision boundary • Logistic regression model defines a linear decision boundary • Why? • Answer: Compare two discriminant functions.
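  Comparing them (reconstructed, using the discriminant functions above):

  g_1(\mathbf{x}) \ge g_0(\mathbf{x}) \;\Longleftrightarrow\; \sigma(w_0 + \mathbf{w}^T \mathbf{x}) \ge \tfrac{1}{2} \;\Longleftrightarrow\; w_0 + \mathbf{w}^T \mathbf{x} \ge 0

  so the boundary w_0 + \mathbf{w}^T \mathbf{x} = 0 is a hyperplane, i.e. linear in \mathbf{x}.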

  11. Logistic regression model. Decision boundary

  12. Form of P(Y|X) for a Gaussian Naive Bayes Classifier • Consider a GNB based on the following modeling assumptions: • Y is boolean, governed by a Bernoulli distribution with parameter π = P(Y = 1) • X = <X1, …, Xn>, where each Xi is a continuous random variable • For each Xi, P(Xi|Y = yk) is a Gaussian distribution N(μik, σi) • For all i and j ≠ i, Xi and Xj are conditionally independent given Y

  13. In general, Bayes rule allows us to write • Dividing both the numerator and denominator by the numerator yields:
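  The equations did not survive the transcript; the standard derivation the slide walks through is

  P(Y=1 \mid X) = \frac{P(Y=1)\, P(X \mid Y=1)}{P(Y=1)\, P(X \mid Y=1) + P(Y=0)\, P(X \mid Y=0)}

  and dividing numerator and denominator by the numerator gives

  P(Y=1 \mid X) = \frac{1}{1 + \frac{P(Y=0)\, P(X \mid Y=0)}{P(Y=1)\, P(X \mid Y=1)}} = \frac{1}{1 + \exp\!\left( \ln \frac{P(Y=0)\, P(X \mid Y=0)}{P(Y=1)\, P(X \mid Y=1)} \right)}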

  14. Because of our conditional independence assumption we can write this
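  Reconstructing the missing equation: with the Xi conditionally independent given Y, the term inside the exponential factorizes over attributes, giving

  P(Y=1 \mid X) = \frac{1}{1 + \exp\!\left( \ln \frac{P(Y=0)}{P(Y=1)} + \sum_i \ln \frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)} \right)}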

  15. Given our assumption that P(Xi|Y = yk) is Gaussian, we can expand this term as follows:
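  The expansion shown on the slide (reconstructed; each Xi has class-conditional means μi0, μi1 and a variance σi² shared across classes) is

  \sum_i \ln \frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)} = \sum_i \left( \frac{\mu_{i0} - \mu_{i1}}{\sigma_i^2}\, X_i + \frac{\mu_{i1}^2 - \mu_{i0}^2}{2 \sigma_i^2} \right)

  which is linear in the Xi, so P(Y=1|X) collapses to the logistic form

  P(Y=1 \mid X) = \frac{1}{1 + \exp\!\left( w_0 + \sum_{i=1}^n w_i X_i \right)}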

  16. Estimating Parameters for Logistic Regression • One reasonable approach to training Logistic Regression is to choose parameter values that maximize the conditional data likelihood. • The conditional data likelihood is the probability of the observed Y values in the training data, conditioned on their corresponding X values. We choose parameters W that satisfy the maximization below. Equivalently, we can work with the log of the conditional likelihood:
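  The missing objective, in standard notation (l indexes the training examples):

  W \leftarrow \arg\max_W \prod_l P(Y^l \mid X^l, W) \qquad \text{equivalently} \qquad W \leftarrow \arg\max_W \sum_l \ln P(Y^l \mid X^l, W)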

  17. This conditional data log likelihood, which we will denote l(W), can be written as
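  Reconstructing the slide's equations: since Y^l is 0 or 1,

  l(W) = \sum_l Y^l \ln P(Y^l = 1 \mid X^l, W) + (1 - Y^l) \ln P(Y^l = 0 \mid X^l, W)

  and substituting the logistic form of P(Y|X,W) simplifies this to

  l(W) = \sum_l Y^l \left( w_0 + \sum_i w_i X_i^l \right) - \ln \left( 1 + \exp \left( w_0 + \sum_i w_i X_i^l \right) \right)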

  18. Using gradient ascent, the i-th component of the gradient vector has the form shown below, where P̂(Y^l | X^l; W) is the Logistic Regression prediction computed using the preceding equations
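  The gradient component, reconstructed in standard notation:

  \frac{\partial l(W)}{\partial w_i} = \sum_l X_i^l \left( Y^l - \hat{P}(Y^l = 1 \mid X^l, W) \right)

  A minimal Python sketch of this update, assuming a NumPy design matrix X of shape (m, n) and 0/1 labels y; the function name and the lr, n_steps, and lam parameters are illustrative, not from the slides (lam implements the L2 penalty of the next slide; set lam=0 for plain maximum likelihood):

    import numpy as np

    def sigmoid(z):
        # Logistic function: sigma(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic_regression(X, y, lr=0.1, n_steps=1000, lam=0.0):
        # Gradient ascent on the conditional log likelihood l(W).
        m, n = X.shape
        Xb = np.hstack([np.ones((m, 1)), X])  # prepend x_0 = 1 for the bias w_0
        w = np.zeros(n + 1)
        for _ in range(n_steps):
            p = sigmoid(Xb @ w)               # P-hat(Y=1 | X, W) for every example
            grad = Xb.T @ (y - p)             # sum_l X_i^l (Y^l - P-hat(Y^l=1 | X^l, W))
            w += lr * (grad - lam * w)        # ascent step; -lam*w is the penalty gradient
        return w

  For example, on the toy data X = np.array([[0.5], [1.5], [2.5], [3.5]]), y = np.array([0, 0, 1, 1]), the learned boundary sits near x = 2, as the symmetry of the data suggests.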

  19. Regularization in Logistic Regression • Overfitting the training data is a problem that can arise in Logistic Regression, especially when data is very high dimensional and training data is sparse. • One approach to reducing overfitting is regularization, in which we create a modified “penalized log likelihood function,” which penalizes large values of W.
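  A common penalized objective of this kind (reconstructed; the slide's exact constant may differ) is

  W \leftarrow \arg\max_W \sum_l \ln P(Y^l \mid X^l, W) - \frac{\lambda}{2} \|W\|^2

  whose gradient component becomes \sum_l X_i^l (Y^l - \hat{P}(Y^l = 1 \mid X^l, W)) - \lambda w_i, the update the lam parameter in the sketch above performs.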

  20. Logistic Regression for Functions with Many Discrete Values • More generally, if Y can take on any of the discrete values {y1, …, yK}, then the form of P(Y = yk | X) is:
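  The missing form, reconstructed in the standard K-class parameterization: for k < K,

  P(Y = y_k \mid X) = \frac{\exp\!\left( w_{k0} + \sum_{i=1}^n w_{ki} X_i \right)}{1 + \sum_{j=1}^{K-1} \exp\!\left( w_{j0} + \sum_{i=1}^n w_{ji} X_i \right)}

  and for the last class, which anchors the normalization,

  P(Y = y_K \mid X) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp\!\left( w_{j0} + \sum_{i=1}^n w_{ji} X_i \right)}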

  21. Relationship Between Naive Bayes Classifiers and Logistic Regression • Logistic Regression directly estimates the parameters of P(Y|X) • Naive Bayes directly estimates parameters for P(Y) and P(X|Y) • We often call the former a discriminative classifier, and the latter a generative classifier.

  22. Generative vs. Discriminative Classifiers • Wish to learn f: X → Y, or P(Y|X) • Generative classifiers (e.g., Naïve Bayes): • Assume some functional form for P(X|Y), P(Y) • This is the ‘generative’ model • Estimate parameters of P(X|Y), P(Y) directly from training data • Use Bayes rule to calculate P(Y|X = xi) • Discriminative classifiers: • Assume some functional form for P(Y|X) • This is the ‘discriminative’ model • Estimate parameters of P(Y|X) directly from training data

  23. Naïve Bayes vs. Logistic Regression • Consider Y boolean and Xi continuous, with X = <X1, …, Xn> • Number of parameters to estimate: • NB: 4n + 1 (a mean and a variance per attribute per class, plus the class prior) • LR: n + 1

  24. Gaussian Naïve Bayes vs. Logistic Regression • Generative and Discriminative classifiers • Asymptotic comparison (# training examples → infinity) • When model assumptions are correct: • GNB and LR produce identical classifiers • When model assumptions are incorrect: • LR is less biased – it does not assume conditional independence • LR is therefore expected to outperform GNB

  25. Naïve Bayes vs. Logistic Regression • Generative and Discriminative classifiers • Non-asymptotic analysis (see [Ng & Jordan, 2002]) • Convergence rate of parameter estimates – how many training examples are needed to assure good estimates? • GNB: order log n (where n = # of attributes in X) • LR: order n • GNB converges more quickly to its (perhaps less helpful) asymptotic estimates
