
This week: overview on pattern recognition (related to machine learning)


Presentation Transcript


  1. This week: overview on pattern recognition (related to machine learning)

  2. Non-review of chapters 6/7 • Z-transforms • Convolution • Sampling/aliasing • Linear difference equations • Resonances • FIR/IIR filtering • DFT/FFT

  3. Speech Pattern Recognition • Soft pattern classification plus temporal sequence integration • Supervised pattern classification: class labels used in training • Unsupervised pattern classification: labels not available (or at least not used)

  4. Training and testing • Training: learning parameters of classifier • Testing: classify independent test set, compare with labels, and score

  5. F1 and F2 for various vowels

  6. Swedish basketball players vs. speech recognition researchers

  7. Feature extraction criteria • Class discrimination • Generalization • Parsimony (efficiency)

  8. Feature vector size • Representations that best discriminate on the training set tend to be high-dimensional • Representations that generalize best to the test set tend to be succinct

  9. Dimensionality reduction • Principal components (i.e., SVD, KL transform, eigenanalysis ...) • Linear Discriminant Analysis (LDA) • Application-specific knowledge • Feature Selection via PR Evaluation
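The principal-components option above can be sketched via the SVD, as the slide suggests. Illustrative NumPy code with made-up random data; the variable names are not from the course.

```python
import numpy as np

# PCA via SVD: project centered data onto its top-k principal axes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 feature vectors, 10 dims

Xc = X - X.mean(axis=0)                 # center each dimension
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 3
X_reduced = Xc @ Vt[:k].T               # keep the 3 highest-variance axes

print(X_reduced.shape)                  # (100, 3)
```

As slide 14 cautions, keeping the highest-variance directions optimizes reconstruction, not class discrimination, which is why LDA or task-specific selection may be preferable for supervised problems.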

  10. PR Methods • Minimum distance • Discriminant Functions • Linear • Nonlinear (e.g., quadratic, neural networks) • Some aspects of each - SVMs • Statistical Discriminant Functions

  11. Minimum Distance • Vector or matrix representing element • Define distance function • Collect examples for each class • In testing, choose the class of the closest example • Choice of distance equivalent to an implicit statistical assumption • Signals (e.g. speech) add temporal variability

  12. Limitations • Variable scale of dimensions • Variable importance of dimensions • For high dimensions, sparsely sampled space • For tough problems, resource limitations (storage, computation, memory access)

  13. Decision Rule for Min Distance • Nearest Neighbor (NN) - in the limit of infinite samples, at most twice the error of the optimum classifier • k-Nearest Neighbor (kNN) • Lots of storage for large problems; potentially large searches
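The NN/kNN rule above can be sketched in a few lines. This is an illustrative NumPy implementation, not code from the course; the function name and toy data are made up for the example.

```python
import numpy as np

def knn_classify(x, examples, labels, k=3):
    """Label x by majority vote among its k nearest stored examples,
    using Euclidean distance as in the minimum-distance setup."""
    d = np.linalg.norm(examples - x, axis=1)   # distance to every example
    nearest = labels[np.argsort(d)[:k]]        # labels of the k closest
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]             # majority vote

# Two toy classes in 2-D
train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])

print(knn_classify(np.array([0.2, 0.5]), train, y, k=3))  # → 0
```

Note how the storage and search costs the slide warns about appear directly: every query computes a distance to every stored example.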

  14. Some Opinions • Better to throw away bad data than to reduce its weight • Dimensionality reduction based on variance is often a bad choice for supervised pattern recognition • Both of these are only true sometimes

  15. Discriminant Analysis • Discriminant functions: maximum for the correct class, minimum for the others • Decision surface between classes • A linear decision surface in 2 dimensions is a line, in 3 a plane; in general, a hyperplane • For 2 classes, surface at wᵀx + w₀ = 0 • 2-class quadratic case, surface at xᵀWx + wᵀx + w₀ = 0

  16. Two-prototype example • Dᵢ² = (x − zᵢ)ᵀ(x − zᵢ) = xᵀx + zᵢᵀzᵢ − 2xᵀzᵢ • D₁² − D₂² = 2xᵀz₂ − 2xᵀz₁ + z₁ᵀz₁ − z₂ᵀz₂ • At the decision surface the distances are equal, so 2xᵀz₂ − 2xᵀz₁ = z₂ᵀz₂ − z₁ᵀz₁, or xᵀ(z₂ − z₁) = ½(z₂ᵀz₂ − z₁ᵀz₁) • If the prototypes are all normalized to 1, the decision surface is xᵀ(z₂ − z₁) = 0 • Each discriminant function is then xᵀzᵢ
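The derivation on this slide is easy to check numerically. The sketch below (illustrative values, not from the deck) verifies that a point equidistant from the two prototypes satisfies the derived surface equation.

```python
import numpy as np

# Numeric check of the two-prototype derivation: x lies on the decision
# surface exactly when D1^2 == D2^2, i.e. x^T(z2 - z1) = 0.5*(z2.z2 - z1.z1).
z1 = np.array([1.0, 2.0])
z2 = np.array([3.0, 0.0])

x = 0.5 * (z1 + z2)              # the midpoint is equidistant by symmetry
D1sq = (x - z1) @ (x - z1)       # squared distance to prototype 1
D2sq = (x - z2) @ (x - z2)       # squared distance to prototype 2

assert np.isclose(D1sq, D2sq)
assert np.isclose(x @ (z2 - z1), 0.5 * (z2 @ z2 - z1 @ z1))
```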

  17. System to discriminate between two classes

  18. Training Discriminant Functions • Minimum distance • Fisher linear discriminant • Gradient learning

  19. Generalized Discriminators - ANNs • McCulloch-Pitts neural model • Rosenblatt Perceptron • Multilayer Systems

  20. Typical unit for MLP (figure shows the summing unit Σ)
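The unit in the figure computes a weighted sum (the Σ) passed through a nonlinearity. A minimal sketch, assuming a sigmoid nonlinearity (a common choice, not specified on the slide); the weights and input are made-up values.

```python
import numpy as np

def mlp_unit(x, w, w0):
    """One MLP unit: weighted sum w.x + w0 (the Σ in the figure),
    then a sigmoid squashing nonlinearity."""
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

x = np.array([0.5, -1.0, 2.0])     # inputs from the previous layer
w = np.array([1.0, 0.5, -0.25])    # learned weights
print(mlp_unit(x, w, w0=0.1))      # ≈ 0.401
```

Stacking layers of such units gives the MLP of the next slide; with a nonlinear squashing function the network can realize nonlinear decision surfaces.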

  21. Typical MLP

  22. Support Vector Machines (SVMs) • High dimensional feature vectors • Transformed from simple features (e.g., polynomial) • Can potentially classify training set arbitrarily well • Improve generalization by maximizing margin • Via “Kernel trick”, don’t need explicit high dim • Inner product function between 2 points in space • Slack variables allow for imperfect classification
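The "kernel trick" bullet can be made concrete with the quadratic kernel. Illustrative sketch (values and feature map chosen for the example, not from the slides): for k(x, y) = (xᵀy)² in 2-D, the explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²), so the kernel equals the inner product in the higher-dimensional space without ever forming φ.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the 2-D quadratic kernel (x.y)^2
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_direct = (x @ y) ** 2          # kernel evaluated in the original space
k_explicit = phi(x) @ phi(y)     # inner product in the transformed space

assert np.isclose(k_direct, k_explicit)   # identical, as the trick promises
```

For higher polynomial degrees the explicit map grows combinatorially while the kernel evaluation stays a single inner product plus a power, which is why SVMs can work in very high-dimensional feature spaces cheaply.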

  23. Maximum margin decision boundary, SVM

  24. Unsupervised clustering • Large and diverse literature • Many methods • Next time: one method explained in the context of a larger, statistical system

  25. Some PR/machine learning Issues • Testing on the training set • Training on the test set • Number of parameters vs. number of training examples: overfitting and overtraining • For much more on machine learning, see CS 281A/B
