Logistic Regression


Presentation Transcript


  1. Logistic Regression

  2. Linear regression • Function f : X → Y is a linear combination of the input components
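  The formula itself did not survive the transcript; a standard reconstruction of the linear model the slide describes (notation assumed) is

  f(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{d} w_j x_j = w_0 + \mathbf{w}^T \mathbf{x}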

  3. Binary classification • Two classes • Our goal is to learn to classify correctly two types of examples • Class 0 – labeled as 0 • Class 1 – labeled as 1 • We would like to learn f : X → {0, 1} • Zero-one error (loss) function • Error we would like to minimize: the expected zero-one loss, shown below • First step: we need to devise a model of the function
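  The loss formulas are missing from the transcript; the standard zero-one loss and expected error they refer to are

  \ell(f(\mathbf{x}), y) = \begin{cases} 1 & \text{if } f(\mathbf{x}) \ne y \\ 0 & \text{if } f(\mathbf{x}) = y \end{cases}
  \qquad
  \mathrm{Error}(f) = E_{(\mathbf{x}, y)}\big[\ell(f(\mathbf{x}), y)\big]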

  4. Discriminant functions • One convenient way to represent classifiers is through • Discriminant functions • Works for binary and multi-way classification • Idea: • For every class i = 0, 1, …, k define a function gi(x) mapping X → R • When the decision on input x should be made, choose the class with the highest value of gi(x) • So what happens with the input space? Assume a binary case.
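  In symbols, the decision rule the slide describes is

  \hat{y} = \arg\max_{i \in \{0, 1, \dots, k\}} g_i(\mathbf{x})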

  5. Discriminant functions

  6. Discriminant functions • The discriminant functions define the decision boundary: in the binary case, the set of inputs x where g_0(x) = g_1(x).

  7. Quadratic decision boundary

  8. Logistic regression model • Defines a linear decision boundary • Discriminant functions:
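  The discriminant functions were shown as an image; the standard logistic regression form (consistent with the GNB derivation later in the deck) is

  g_1(\mathbf{x}) = P(y = 1 \mid \mathbf{x}, \mathbf{w}) = \sigma(w_0 + \mathbf{w}^T \mathbf{x}), \qquad g_0(\mathbf{x}) = 1 - g_1(\mathbf{x})

  where \sigma is the logistic function of the next slide.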

  9. Logistic function
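  The slide's figure is not in the transcript; the logistic (sigmoid) function it plots is

  \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma(z) \in (0, 1), \quad \sigma(0) = \tfrac{1}{2}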

  10. Linear decision boundary • Logistic regression model defines a linear decision boundary • Why? • Answer: Compare two discriminant functions.
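  Comparing them (reconstructed, using the discriminant functions above):

  g_1(\mathbf{x}) \ge g_0(\mathbf{x}) \;\Longleftrightarrow\; \sigma(w_0 + \mathbf{w}^T \mathbf{x}) \ge \tfrac{1}{2} \;\Longleftrightarrow\; w_0 + \mathbf{w}^T \mathbf{x} \ge 0

  so the boundary w_0 + \mathbf{w}^T \mathbf{x} = 0 is a hyperplane, i.e. linear in \mathbf{x}.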

  11. Logistic regression model. Decision boundary

  12. Form of P(Y|X) for a Gaussian Naive Bayes Classifier • Consider a GNB based on the following modeling assumptions: • Y is boolean, governed by a Bernoulli distribution with parameter π = P(Y = 1) • X = <X1, …, Xn>, where each Xi is a continuous random variable • For each Xi, P(Xi|Y = yk) is a Gaussian distribution N(μik, σi) • For all i and j ≠ i, Xi and Xj are conditionally independent given Y

  13. In general, Bayes rule allows us to write • Dividing both the numerator and denominator by the numerator yields:
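  The equations did not survive the transcript; the standard derivation the slide walks through is

  P(Y=1 \mid X) = \frac{P(Y=1)\, P(X \mid Y=1)}{P(Y=1)\, P(X \mid Y=1) + P(Y=0)\, P(X \mid Y=0)}

  and dividing numerator and denominator by the numerator gives

  P(Y=1 \mid X) = \frac{1}{1 + \frac{P(Y=0)\, P(X \mid Y=0)}{P(Y=1)\, P(X \mid Y=1)}} = \frac{1}{1 + \exp\!\left( \ln \frac{P(Y=0)\, P(X \mid Y=0)}{P(Y=1)\, P(X \mid Y=1)} \right)}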

  14. Because of our conditional independence assumption we can write this
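  Reconstructing the missing equation: with the Xi conditionally independent given Y, the term inside the exponential factorizes over attributes, giving

  P(Y=1 \mid X) = \frac{1}{1 + \exp\!\left( \ln \frac{P(Y=0)}{P(Y=1)} + \sum_i \ln \frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)} \right)}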

  15. Given our assumption that P(Xi|Y = yk) is Gaussian, we can expand this term as follows:
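  The expansion shown on the slide (reconstructed; each Xi has class-conditional means μi0, μi1 and a variance σi² shared across classes) is

  \sum_i \ln \frac{P(X_i \mid Y=0)}{P(X_i \mid Y=1)} = \sum_i \left( \frac{\mu_{i0} - \mu_{i1}}{\sigma_i^2}\, X_i + \frac{\mu_{i1}^2 - \mu_{i0}^2}{2 \sigma_i^2} \right)

  which is linear in the Xi, so P(Y=1|X) collapses to the logistic form

  P(Y=1 \mid X) = \frac{1}{1 + \exp\!\left( w_0 + \sum_{i=1}^n w_i X_i \right)}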

  16. Estimating Parameters for Logistic Regression • One reasonable approach to training Logistic Regression is to choose parameter values that maximize the conditional data likelihood. • The conditional data likelihood is the probability of the observed Y values in the training data, conditioned on their corresponding X values. We choose parameters W that satisfy the maximization below. Equivalently, we can work with the log of the conditional likelihood:
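  The missing objective, in standard notation (l indexes the training examples):

  W \leftarrow \arg\max_W \prod_l P(Y^l \mid X^l, W) \qquad \text{equivalently} \qquad W \leftarrow \arg\max_W \sum_l \ln P(Y^l \mid X^l, W)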

  17. This conditional data log likelihood, which we will denote l(W), can be written as
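  Reconstructing the slide's equations: since Y^l is 0 or 1,

  l(W) = \sum_l Y^l \ln P(Y^l = 1 \mid X^l, W) + (1 - Y^l) \ln P(Y^l = 0 \mid X^l, W)

  and substituting the logistic form of P(Y|X,W) simplifies this to

  l(W) = \sum_l Y^l \left( w_0 + \sum_i w_i X_i^l \right) - \ln \left( 1 + \exp \left( w_0 + \sum_i w_i X_i^l \right) \right)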

  18. Using gradient ascent, the i-th component of the gradient vector has the form shown below, where P̂(Y^l | X^l; W) is the Logistic Regression prediction computed using the preceding equations
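  The gradient component, reconstructed in standard notation:

  \frac{\partial l(W)}{\partial w_i} = \sum_l X_i^l \left( Y^l - \hat{P}(Y^l = 1 \mid X^l, W) \right)

  A minimal Python sketch of this update, assuming a NumPy design matrix X of shape (m, n) and 0/1 labels y; the function name and the lr, n_steps, and lam parameters are illustrative, not from the slides (lam implements the L2 penalty of the next slide; set lam=0 for plain maximum likelihood):

    import numpy as np

    def sigmoid(z):
        # Logistic function: sigma(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic_regression(X, y, lr=0.1, n_steps=1000, lam=0.0):
        # Gradient ascent on the conditional log likelihood l(W).
        m, n = X.shape
        Xb = np.hstack([np.ones((m, 1)), X])  # prepend x_0 = 1 for the bias w_0
        w = np.zeros(n + 1)
        for _ in range(n_steps):
            p = sigmoid(Xb @ w)               # P-hat(Y=1 | X, W) for every example
            grad = Xb.T @ (y - p)             # sum_l X_i^l (Y^l - P-hat(Y^l=1 | X^l, W))
            w += lr * (grad - lam * w)        # ascent step; -lam*w is the penalty gradient
        return w

  For example, on the toy data X = np.array([[0.5], [1.5], [2.5], [3.5]]), y = np.array([0, 0, 1, 1]), the learned boundary sits near x = 2, as the symmetry of the data suggests.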

  19. Regularization in Logistic Regression • Overfitting the training data is a problem that can arise in Logistic Regression, especially when data is very high dimensional and training data is sparse. • One approach to reducing overfitting is regularization, in which we create a modified “penalized log likelihood function,” which penalizes large values of W.
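  A common penalized objective of this kind (reconstructed; the slide's exact constant may differ) is

  W \leftarrow \arg\max_W \sum_l \ln P(Y^l \mid X^l, W) - \frac{\lambda}{2} \|W\|^2

  whose gradient component becomes \sum_l X_i^l (Y^l - \hat{P}(Y^l = 1 \mid X^l, W)) - \lambda w_i, the update the lam parameter in the sketch above performs.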

  20. Logistic Regression for Functions with Many Discrete Values • More generally, if Y can take on any of the discrete values {y1, …, yK}, then the form of P(Y = yk | X) is:
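  The missing form, reconstructed in the standard K-class parameterization: for k < K,

  P(Y = y_k \mid X) = \frac{\exp\!\left( w_{k0} + \sum_{i=1}^n w_{ki} X_i \right)}{1 + \sum_{j=1}^{K-1} \exp\!\left( w_{j0} + \sum_{i=1}^n w_{ji} X_i \right)}

  and for the last class, which anchors the normalization,

  P(Y = y_K \mid X) = \frac{1}{1 + \sum_{j=1}^{K-1} \exp\!\left( w_{j0} + \sum_{i=1}^n w_{ji} X_i \right)}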

  21. Relationship Between Naive Bayes Classifiers and Logistic Regression • Logistic Regression directly estimates the parameters of P(Y|X) • Naive Bayes directly estimates parameters for P(Y) and P(X|Y) • We often call the former a discriminative classifier, and the latter a generative classifier.

  22. Generative vs. Discriminative Classifiers • Wish to learn f: X → Y, or P(Y|X) • Generative classifiers (e.g., Naïve Bayes): • Assume some functional form for P(X|Y), P(Y) • This is the ‘generative’ model • Estimate parameters of P(X|Y), P(Y) directly from training data • Use Bayes rule to calculate P(Y|X = xi) • Discriminative classifiers: • Assume some functional form for P(Y|X) • This is the ‘discriminative’ model • Estimate parameters of P(Y|X) directly from training data

  23. Naïve Bayes vs. Logistic Regression • Consider Y boolean and Xi continuous, with X = <X1, …, Xn> • Number of parameters to estimate: • NB: 4n + 1 (a mean and a variance per attribute per class, plus the class prior) • LR: n + 1

  24. Gaussian Naïve Bayes vs. Logistic Regression • Generative and Discriminative classifiers • Asymptotic comparison (# training examples → infinity) • When model assumptions are correct: • GNB and LR produce identical classifiers • When model assumptions are incorrect: • LR is less biased – it does not assume conditional independence • LR is therefore expected to outperform GNB

  25. Naïve Bayes vs. Logistic Regression • Generative and Discriminative classifiers • Non-asymptotic analysis (see [Ng & Jordan, 2002]) • Convergence rate of parameter estimates – how many training examples are needed to assure good estimates? • GNB: order log n (where n = # of attributes in X) • LR: order n • GNB converges more quickly to its (perhaps less helpful) asymptotic estimates
