

Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson jhasegaw@uiuc.edu University of Illinois at Urbana-Champaign, USA. Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers.


Presentation Transcript


  1. Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, jhasegaw@uiuc.edu, University of Illinois at Urbana-Champaign, USA

  2. Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers
  • Definition: Hyperplane Classifier
  • Minimum Classification Error Training Methods
    • Empirical risk
    • Differentiable estimates of the 0-1 loss function
    • Error backpropagation
  • Kernel Methods
    • Nonparametric expression of a hyperplane
    • Mathematical properties of a dot product
    • Kernel-based classifier
    • The implied high-dimensional space
    • Error backpropagation for a kernel-based classifier
  • Useful Kernels
    • Polynomial kernel
    • RBF kernel

  3. Classifier Terminology

  4. Hyperplane Classifier. [Figure: training points on either side of the class boundary ("separatrix"), which is the plane wᵀx = b; the normal vector w points across the boundary, and the plane lies at distance b from the origin (x = 0).]
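
A minimal sketch of the decision rule on this slide, assuming the convention that the classifier returns the sign of wᵀx − b (variable names are illustrative):

```python
import numpy as np

def hyperplane_classify(x, w, b):
    """Hyperplane classifier: the label is the sign of w'x - b.
    Points with w'x > b fall on one side of the separatrix, w'x < b on the other."""
    return np.sign(np.dot(w, x) - b)

# Toy example: the boundary is the line x1 + x2 = 1 in two dimensions.
w = np.array([1.0, 1.0])
b = 1.0
print(hyperplane_classify(np.array([2.0, 2.0]), w, b))  # +1.0 (above the plane)
print(hyperplane_classify(np.array([0.0, 0.0]), w, b))  # -1.0 (below the plane)
```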

  5. Loss, Risk, and Empirical Risk

  6. Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
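
Written out, the definitions behind slides 5 and 6 (notation mine; M denotes the number of training tokens). With the 0-1 loss, the empirical risk is exactly the error rate on the training data:

```latex
\begin{align*}
  \text{Risk:} \quad
    & R(h) = \mathrm{E}_{x,y}\big[\,\ell(h(x),y)\,\big] \\
  \text{Empirical risk:} \quad
    & R_{\mathrm{emp}}(h) = \frac{1}{M}\sum_{i=1}^{M} \ell\big(h(x_i),y_i\big) \\
  \text{0-1 loss:} \quad
    & \ell_{0\text{-}1}\big(h(x),y\big) =
      \begin{cases} 0 & h(x)=y \\ 1 & h(x)\neq y \end{cases}
\end{align*}
```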

  7. Differentiable Approximations of the 0-1 Loss Function: Hinge Loss

  8. Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
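
A minimal sketch of the hinge loss as a differentiable surrogate for the 0-1 loss, assuming labels y ∈ {−1, +1} and a classifier output g(x) = wᵀx − b:

```python
import numpy as np

def zero_one_loss(y, g):
    """0-1 loss: 1 if the sign of the classifier output disagrees with the label."""
    return float(y * g <= 0)

def hinge_loss(y, g):
    """Hinge loss: zero only when the point is on the correct side with margin >= 1;
    grows linearly with the violation, so it has a usable (sub)gradient almost everywhere."""
    return max(0.0, 1.0 - y * g)

for g in (-2.0, -0.5, 0.5, 2.0):
    print(g, zero_one_loss(+1, g), hinge_loss(+1, g))
```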

  9. Differentiable Empirical Risks

  10. Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss
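
A sketch of one gradient-descent (error backpropagation) step for a hyperplane classifier trained with a sigmoidal approximation of the 0-1 loss; the specific surrogate ℓ = σ(−y(wᵀx − b)) is my assumption, chosen to match the slide title:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoidal_loss_and_grads(x, y, w, b):
    """Smooth surrogate of the 0-1 loss: loss = sigmoid(-y * (w'x - b)).
    Returns the loss and its gradients with respect to w and b (chain rule)."""
    g = np.dot(w, x) - b
    s = sigmoid(-y * g)            # the loss itself
    ds_dg = s * (1.0 - s) * (-y)   # derivative of the sigmoid times the inner derivative
    return s, ds_dg * x, ds_dg * (-1.0)

# One training step on a single token (learning rate eta is arbitrary here).
x, y = np.array([1.0, 2.0]), +1
w, b, eta = np.zeros(2), 0.0, 0.5
loss, dw, db = sigmoidal_loss_and_grads(x, y, w, b)
w, b = w - eta * dw, b - eta * db
print(loss, w, b)
```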

  11. Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries. [Figure: scatter of points around the boundary, shaded "more red" / "less red" on one side and "less blue" / "more blue" on the other, so class membership fades gradually rather than switching abruptly at the plane.]

  12. Error Backpropagation: Sigmoidal Classifier with Absolute Loss

  13. Sigmoidal Classifier: Signal Flow Diagram. [Figure: inputs x1, x2, x3 are multiplied by connection weights w1, w2, w3 and summed (+) to form the sigmoid input g(x); the sigmoid's output is the hypothesis h(x).]

  14. Multilayer Perceptron. [Figure: input h0(x) ≡ x = (x1, x2, x3) feeds through first-layer connection weights and biases b11, b12, b13 into the sigmoid inputs g1(x) and sigmoid outputs h1(x); these feed through second-layer connection weights and bias b21 into the sigmoid input g2(x), whose output is the hypothesis h2(x).]

  15. Multilayer Perceptron: Classification Equations
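
A sketch of the classification (forward) pass for the two-layer perceptron drawn on slide 14; the layer sizes and random weights here are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer perceptron:
      g1 = W1 x + b1,  h1 = sigmoid(g1)    (hidden layer)
      g2 = W2 h1 + b2, h2 = sigmoid(g2)    (output hypothesis)"""
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    return h1, h2

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])                  # three inputs, as in the diagram
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # four hidden units (arbitrary choice)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # one output hypothesis
h1, h2 = mlp_forward(x, W1, b1, W2, b2)
print(h2)   # value in (0, 1), interpretable as an estimate of p(y=1 | x)
```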

  16. Error Backpropagation for a Multilayer Perceptron
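
A minimal numpy sketch of error backpropagation for the same two-layer network; the squared-error loss and the variable names are assumptions, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, eta=0.1):
    """One backpropagation step for a two-layer sigmoidal perceptron
    with squared-error loss L = 0.5 * (h2 - y)^2."""
    # Forward pass
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    # Backward pass: propagate dL/dg from the output layer toward the input
    delta2 = (h2 - y) * h2 * (1.0 - h2)           # dL/dg2
    delta1 = (W2.T @ delta2) * h1 * (1.0 - h1)    # dL/dg1
    # Gradient descent on every weight and bias (updates happen in place)
    W2 -= eta * np.outer(delta2, h1)
    b2 -= eta * delta2
    W1 -= eta * np.outer(delta1, x)
    b1 -= eta * delta1
    return 0.5 * ((h2 - y) ** 2).item()

rng = np.random.default_rng(0)
x, y = np.array([0.5, -1.0, 2.0]), np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
for step in range(5):
    print(backprop_step(x, y, W1, b1, W2, b2))    # the loss typically shrinks
```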

  17. Classification Power of a One-Layer Perceptron

  18. Classification Power of a Two-Layer Perceptron

  19. Classification Power of a Three-Layer Perceptron

  20. Output of Multilayer Perceptron is an Approximation of Posterior Probability

  21. Kernel-Based Classifiers

  22. Representation of Hyperplane in terms of Arbitrary Vectors

  23. Kernel-based Classifier
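
A sketch of the kernel-based decision rule implied by these slides: the hyperplane's normal vector is expressed as a weighted sum of stored vectors, so the classifier only ever needs dot products, which are replaced by a kernel function K. The helper names are hypothetical:

```python
import numpy as np

def kernel_classify(x, centers, alphas, b, kernel):
    """Kernel-based classifier:
       h(x) = sign( sum_i alpha_i * K(x_i, x) - b ),
    where the x_i are N stored vectors (e.g. training tokens) and the
    alpha_i are their weights in the expansion of the normal vector."""
    scores = np.array([kernel(xi, x) for xi in centers])
    return np.sign(np.dot(alphas, scores) - b)

linear_kernel = lambda u, v: float(np.dot(u, v))

centers = np.array([[1.0, 0.0], [0.0, 1.0]])
alphas = np.array([1.0, -1.0])      # in the linear case this means w = x_1 - x_2
print(kernel_classify(np.array([2.0, 0.5]), centers, alphas, 0.0, linear_kernel))  # +1.0
```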

  24. Error Backpropagation for a Kernel-Based Classifier
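
A sketch of the corresponding gradient step for the kernel classifier's weights: since g(x) = Σᵢ αᵢ K(xᵢ, x) − b is linear in each αᵢ, the chain rule gives ∂g/∂αᵢ = K(xᵢ, x). The sigmoidal surrogate loss mirrors the earlier hyperplane example and is an assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kernel_backprop_step(x, y, centers, alphas, b, gamma=1.0, eta=0.5):
    """One gradient step on the sigmoidal loss sigmoid(-y * g(x)) for a
    kernel classifier g(x) = sum_i alpha_i K(x_i, x) - b with an RBF kernel."""
    k = np.exp(-gamma * np.sum((centers - x) ** 2, axis=1))  # K(x_i, x) for all i
    g = np.dot(alphas, k) - b
    s = sigmoid(-y * g)                  # the loss
    ds_dg = s * (1.0 - s) * (-y)         # chain rule through the sigmoid
    alphas -= eta * ds_dg * k            # d(loss)/d(alpha_i) = ds_dg * K(x_i, x)
    b -= eta * ds_dg * (-1.0)            # d(loss)/db = -ds_dg
    return s, alphas, b

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
alphas, b = np.zeros(2), 0.0
loss, alphas, b = kernel_backprop_step(np.array([1.0, 0.0]), +1, centers, alphas, b)
print(loss, alphas, b)
```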

  25. The Implied High-Dimensional Space

  26. Some Useful Kernels

  27. Polynomial Kernel
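
A sketch of the (inhomogeneous) polynomial kernel; whether the lecture includes the +1 offset is an assumption on my part:

```python
import numpy as np

def polynomial_kernel(u, v, d=2):
    """K(u, v) = (1 + u'v)^d: equivalent to a dot product in the space of all
    monomials of the input coordinates up to order d, without computing them."""
    return (1.0 + np.dot(u, v)) ** d

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(polynomial_kernel(u, v, d=2))   # (1 + 1*3 + 2*(-1))^2 = 4.0
```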

  28. Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface

  29. Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

  30. Implied Higher-Dimensional Space has a Dimension of K^d

  31. The Radial Basis Function (RBF) Kernel
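
A sketch of the radial basis function kernel; the parameterization in terms of γ (gamma) follows slide 34:

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2): close to 1 when u and v are near
    each other, decaying toward 0 as they move apart."""
    diff = u - v
    return np.exp(-gamma * np.dot(diff, diff))

u, v = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(rbf_kernel(u, v, gamma=0.5))   # exp(-0.5 * 2) = exp(-1) ≈ 0.368
```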

  32. RBF Classifier Can Represent Any Classifier Boundary

  33. RBF Classifier Can Represent Any Classifier Boundary (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
  • More training corpus errors, smoother boundary
  • Fewer training corpus errors, wigglier boundary
  In these figures, C was adjusted, not γ (gamma), but a similar effect can be achieved by setting N<<M and adjusting γ.

  34. If N<M, Gamma can Adjust Boundary Smoothness
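
A small sketch of the effect named on this slide: with the stored centers held fixed (N < M), a small γ makes each center's influence broad, favoring a smooth separatrix, while a large γ makes it local, allowing a wiggly one. The numeric values are arbitrary illustrations:

```python
import numpy as np

def rbf_kernel(u, v, gamma):
    diff = u - v
    return np.exp(-gamma * np.dot(diff, diff))

center = np.array([0.0])
# Kernel value at increasing distances from a single center, for two choices of gamma.
for gamma in (0.1, 10.0):
    values = [rbf_kernel(center, np.array([r]), gamma) for r in (0.5, 1.0, 2.0)]
    print(gamma, np.round(values, 3))
# gamma = 0.1 : influence decays slowly  -> smoother boundary
# gamma = 10  : influence decays quickly -> wigglier boundary
```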

  35. Summary
  • Classifier definitions
    • Classifier = a function from x into y
    • Loss = the cost of a mistake
    • Risk = the expected loss
    • Empirical risk = the average loss on training data
  • Multilayer Perceptrons
    • A sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
    • Train using error backpropagation
    • With two hidden layers, can model any boundary (the MLP is a “universal approximator”)
    • MLP output is an estimate of p(y|x)
  • Kernel Classifiers
    • Equivalent to: (1) project into f(x), (2) apply a hyperplane classifier
    • Polynomial kernel: separatrix is a polynomial surface of order d
    • RBF kernel: separatrix can be any surface (the RBF is also a “universal approximator”)
    • RBF kernel: if N<M, γ (gamma) can adjust the “wiggliness” of the separatrix
