
Linear Models for Classification


Presentation Transcript


  1. Linear Models for Classification Berkay Topçu

  2. Linear Models for Classification • Goal: Take an input vector x and assign it to one of K discrete classes Ck, where k = 1, ..., K • Linear separation of classes

  3. Generalized Linear Models • We wish to predict discrete class labels, or more generally the class posterior probabilities, which lie in the range (0, 1) • Classification model as a linear function of the adjustable parameters • Classification can be carried out directly in the original input space x, or on a fixed nonlinear transformation of the input variables using a vector of basis functions φ(x)

  4. Discriminant Functions • Linear discriminants: y(x) = w^T x + w0 • If y(x) >= 0, assign x to class C1, and to class C2 otherwise • The decision boundary is given by y(x) = 0 • w determines the orientation of the decision surface and w0 determines its location • Compact notation: y(x) = w^T x with the bias absorbed by appending a fixed input x0 = 1
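
A minimal NumPy sketch of this decision rule; the weight vector w and bias w0 below are illustrative values, not parameters taken from the slides.

```python
import numpy as np

# Illustrative parameters (not from the slides): w sets the orientation of the
# decision surface, w0 sets its location.
w = np.array([1.0, -2.0])
w0 = 0.5

def classify(x):
    """Assign x to class C1 if y(x) >= 0, otherwise to class C2."""
    y = w @ x + w0          # y(x) = w^T x + w0
    return "C1" if y >= 0 else "C2"

print(classify(np.array([2.0, 0.0])))   # -> C1
print(classify(np.array([0.0, 2.0])))   # -> C2
```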

  5. Multiple Classes • K-class discriminant by combining a number of two-class discriminant functions (K > 2) • One-versus-the-rest: separating points in one particular class Ck from points not in that class • One-versus-one: K(K-1)/2 binary discriminant functions

  6. Multiple Classes • A single K-class discriminant comprising K linear functions: yk(x) = wk^T x + wk0 • Assign x to class Ck if yk(x) > yj(x) for all j ≠ k • How to learn the parameters of linear discriminant functions?
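
A small sketch of the single K-class discriminant: one linear function per class, with the class chosen by the largest score. The weights here are random placeholders, since the slides leave the learning question open for the following sections.

```python
import numpy as np

# Placeholder parameters: one weight vector w_k and bias w_k0 per class.
rng = np.random.default_rng(0)
K, D = 3, 2
W = rng.normal(size=(K, D))
w0 = rng.normal(size=K)

def predict(x):
    scores = W @ x + w0            # y_k(x) = w_k^T x + w_k0, k = 1..K
    return int(np.argmax(scores))  # assign to C_k with the largest y_k(x)

print(predict(np.array([0.5, -1.0])))
```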

  7. Least Squares for Classification • Each class Ck is described by its own linear model yk(x) = wk^T x + wk0 • Training data set {xn, tn} for n = 1, ..., N, where tn uses 1-of-K coding • Target matrix T whose nth row is the vector tn^T, and input matrix X whose nth row is xn^T (with a leading 1 absorbing the bias)

  8. Least Squares for Classification • Minimizing the sum-of-squares error function E(W) = (1/2) Tr{(XW - T)^T (XW - T)} • Solution: W = (X^T X)^{-1} X^T T, i.e. the pseudo-inverse of X times T • Discriminant function: y(x) = W^T x
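
A sketch of the closed-form least-squares solution under these definitions, on made-up Gaussian toy data (the data, class means and sizes are illustrative only).

```python
import numpy as np

# Toy data: three Gaussian blobs, 30 points each (illustrative only).
rng = np.random.default_rng(1)
K, D, n_per = 3, 2, 30
X = np.vstack([rng.normal(loc=3 * k, scale=1.0, size=(n_per, D)) for k in range(K)])
labels = np.repeat(np.arange(K), n_per)
N = K * n_per

T = np.eye(K)[labels]                      # 1-of-K (one-hot) target matrix
X_aug = np.hstack([np.ones((N, 1)), X])    # prepend a 1 to absorb the bias

W = np.linalg.pinv(X_aug) @ T              # W = (X^T X)^{-1} X^T T via the pseudo-inverse
pred = np.argmax(X_aug @ W, axis=1)        # discriminant: largest component of W^T x
print("training accuracy:", (pred == labels).mean())
```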

  9. Fisher’s Linear Discriminant • Dimensionality reduction: take the D-dimensional input vector x and project it down to one dimension using y = w^T x • Projection that maximizes class separation • Two-class problem: N1 points of C1 and N2 points of C2 • Fisher’s idea: • large separation between the projected class means • small variance within each class, minimizing class overlap

  10. Fisher’s Linear Discriminant • The Fisher criterion: J(w) = (m2 - m1)^2 / (s1^2 + s2^2) = (w^T SB w) / (w^T SW w), the ratio of the between-class scatter to the within-class scatter of the projected data • Maximizing J(w) gives w ∝ SW^{-1} (m2 - m1)

  11. Fisher’s Linear Discriminant • For the two-class problem, the Fisher criterion is a special case of least squares (reference: Penalized Discriminant Analysis – Hastie, Buja and Tibshirani) • For multiple classes: between-class and within-class scatter matrices SB and SW are formed from all K classes • The weight values are determined by the eigenvectors that correspond to the largest eigenvalues of SW^{-1} SB
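
A two-class Fisher discriminant sketch following w ∝ SW^{-1} (m2 - m1); the Gaussian toy data is made up for illustration.

```python
import numpy as np

# Toy two-class data, elongated so that the naive mean-difference direction
# is not the best projection (illustrative only).
rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], scale=[2.0, 0.5], size=(50, 2))   # class C1
X2 = rng.normal(loc=[3.0, 1.0], scale=[2.0, 0.5], size=(50, 2))   # class C2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter
w = np.linalg.solve(S_W, m2 - m1)                          # w ∝ S_W^{-1} (m2 - m1)
w /= np.linalg.norm(w)

# Project to one dimension and compare the projected class means.
print("projected means:", (X1 @ w).mean(), (X2 @ w).mean())
```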

  12. The Perceptron Algorithm • The input vector x is transformed using a fixed nonlinear transformation to give a feature vector φ(x), and y(x) = f(w^T φ(x)) with a step function f • Perceptron criterion: for all training samples we want w^T φ(xn) tn > 0, with targets tn in {-1, +1} • We need to minimize EP(w) = -Σ w^T φ(xn) tn, where the sum runs over the misclassified patterns

  13. The Perceptron Algorithm – Stochastic Gradient Descent • Cycle through the training patterns in turn • If the pattern is correctly classified the weight vector remains unchanged, else: w ← w + η φ(xn) tn
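
A sketch of this update rule on toy data; the learning rate, epoch count and data are illustrative, and the "feature transform" here is simply the addition of a bias feature.

```python
import numpy as np

# Toy linearly separable data with targets in {-1, +1} (illustrative only).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-2.0, 1.0, size=(30, 2)), rng.normal(2.0, 1.0, size=(30, 2))])
t = np.hstack([-np.ones(30), np.ones(30)])
Phi = np.hstack([np.ones((60, 1)), X])      # feature transform: here just a bias term

w = np.zeros(Phi.shape[1])
eta = 1.0                                   # learning rate (illustrative)
for epoch in range(20):
    for phi_n, t_n in zip(Phi, t):
        if (w @ phi_n) * t_n <= 0:          # misclassified pattern
            w += eta * phi_n * t_n          # w <- w + eta * phi(x_n) * t_n
print("misclassified after training:", int(((Phi @ w) * t <= 0).sum()))
```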

  14. Probabilistic Generative Models • Depend on simple assumptions about the distribution of the data • Logistic sigmoid function: σ(a) = 1 / (1 + exp(-a)) • Maps the whole real axis into the finite interval (0, 1)

  15. Continuous Inputs - Gaussian • Assuming the class-conditional densities are Gaussian with a shared covariance matrix • Case of two classes: the posterior p(C1|x) becomes a logistic sigmoid acting on a linear function of x, p(C1|x) = σ(w^T x + w0)

  16. Maximum Likelihood Solution • Likelihood function: the product over the data points of the class priors times the Gaussian class-conditional densities • Maximizing the log-likelihood gives the prior π = N1/N, the class means μ1 and μ2 as sample means, and the shared covariance Σ as the weighted average of the two class covariance matrices
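
A sketch of these maximum-likelihood estimates for the two-class, shared-covariance Gaussian model, with the posterior then evaluated through the logistic sigmoid; the toy samples are made up.

```python
import numpy as np

# Toy samples from the two classes (illustrative only).
rng = np.random.default_rng(4)
X1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=60)   # class C1
X2 = rng.multivariate_normal([2.0, 2.0], np.eye(2), size=40)   # class C2

N1, N2 = len(X1), len(X2)
pi = N1 / (N1 + N2)                                  # ML prior p(C1) = N1 / N
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)          # ML class means
S = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (N1 + N2)  # shared covariance

Sinv = np.linalg.inv(S)
w = Sinv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(pi / (1 - pi))

def posterior_c1(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))       # p(C1|x) = sigmoid(w^T x + w0)

print(posterior_c1(np.array([0.0, 0.0])), posterior_c1(np.array([2.0, 2.0])))
```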

  17. Probabilistic Discriminative Models • In the probabilistic generative model, the number of parameters grows quadratically with M (the number of dimensions) • However, logistic regression p(C1|φ) = σ(w^T φ) has only M adjustable parameters • Maximum likelihood solution for logistic regression • Energy function: the negative log likelihood, E(w) = -Σ { tn ln yn + (1 - tn) ln(1 - yn) }

  18. Iterative Reweighted Least Squares • Applying Newton-Raphson iterative optimization to the linear regression model (sum-of-squares error) reaches the exact solution in one step • It coincides with the standard least-squares solution

  19. Iterative Reweighted Least Squares • Newton-Raphson update for the negative log likelihood: w_new = w - (Φ^T R Φ)^{-1} Φ^T (y - t), with R = diag(yn(1 - yn)) • This takes the form of a weighted least-squares problem; since R depends on w, the update is applied iteratively
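
A sketch of the IRLS update for two-class logistic regression; the toy data and the fixed number of Newton steps are illustrative.

```python
import numpy as np

# Toy data with targets in {0, 1} (illustrative only).
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1.0, 1.0, size=(40, 2)), rng.normal(1.0, 1.0, size=(40, 2))])
t = np.hstack([np.zeros(40), np.ones(40)])
Phi = np.hstack([np.ones((80, 1)), X])        # design matrix with a bias feature

w = np.zeros(Phi.shape[1])
for _ in range(10):                           # a few Newton steps are usually enough
    y = 1.0 / (1.0 + np.exp(-Phi @ w))        # current predictions y_n
    R = y * (1.0 - y)                         # diagonal of the weighting matrix R
    H = Phi.T @ (R[:, None] * Phi)            # Hessian  Phi^T R Phi
    grad = Phi.T @ (y - t)                    # gradient Phi^T (y - t)
    w -= np.linalg.solve(H, grad)             # Newton-Raphson / IRLS update
print("w:", w)
```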

  20. Maximum Margin Classifiers • Support Vector Machines for the two-class problem, y(x) = w^T φ(x) + b • Assuming a linearly separable data set • There exists at least one choice of w and b that satisfies tn y(xn) > 0 for all training points • We seek the one that gives the smallest generalization error • Margin: the smallest distance between the decision boundary and any of the samples

  21. Support Vector Machines • Optimization of the parameters, maximizing the margin • Maximizing the margin is equivalent to minimizing (1/2)||w||^2 • subject to the constraints: tn (w^T φ(xn) + b) ≥ 1 for all n • Introduction of Lagrange multipliers an ≥ 0

  22. Support Vector Machines - Lagrange Multipliers • Minimize the Lagrangian with respect to w and b and maximize it with respect to the multipliers a • The dual form: maximize Σ an - (1/2) Σ Σ an am tn tm k(xn, xm) subject to an ≥ 0 and Σ an tn = 0 • This is a quadratic programming problem

  23. Support Vector Machines • Overlapping class distributions (data that are not linearly separable) • Slack variables ξn ≥ 0: the distance of a point from the margin boundary when it lies on the wrong side of it • To maximize the margin while penalizing points that lie on the wrong side of the margin boundary: minimize C Σ ξn + (1/2)||w||^2
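
A sketch using scikit-learn's SVC (an external library, not part of the slides) to show the role of the parameter C, which trades margin width against the slack penalty; the overlapping toy data is made up.

```python
import numpy as np
from sklearn.svm import SVC   # assumes scikit-learn is available; not part of the slides

# Overlapping toy classes (illustrative only).
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-1.0, 1.2, size=(50, 2)), rng.normal(1.0, 1.2, size=(50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

clf = SVC(kernel="linear", C=1.0)   # larger C -> heavier penalty on slack variables
clf.fit(X, y)
print("number of support vectors:", len(clf.support_vectors_))
print("w:", clf.coef_[0], "b:", clf.intercept_[0])
```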

  24. SVM - Overlapping Class Distributions • The dual Lagrangian is identical to the separable case, except that the multipliers are now box-constrained, 0 ≤ an ≤ C • Again this represents a quadratic programming problem

  25. Support Vector Machines • Relation to logistic regression • The hinge loss used in the SVM and the error function of logistic regression both approximate the ideal misclassification error (MCE) • Plot legend: black = MCE, blue = hinge loss, red = logistic regression error, green = squared error
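
A small sketch evaluating the losses from the plot as functions of z = t · y(x); the rescaling of the logistic loss by 1/ln 2 follows the usual convention so that it passes through (0, 1), and the grid of z values is arbitrary.

```python
import numpy as np

# Losses as functions of z = t * y(x), evaluated on an arbitrary grid.
z = np.linspace(-2.0, 2.0, 9)
mce = (z <= 0).astype(float)                    # ideal misclassification error
hinge = np.maximum(0.0, 1.0 - z)                # SVM hinge loss
logistic = np.log1p(np.exp(-z)) / np.log(2.0)   # logistic loss, rescaled to pass through (0, 1)
squared = (1.0 - z) ** 2                        # squared error

for name, loss in [("MCE", mce), ("hinge", hinge), ("logistic", logistic), ("squared", squared)]:
    print(f"{name:9s}", np.round(loss, 2))
```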
