
When Signal Processing Meets Machine Learning


Presentation Transcript


  1. When Signal Processing Meets Machine Learning Yu-Chiang Wang 王鈺強, PhD Candidate Electrical & Computer Engineering Carnegie Mellon University March 2009

  2. Pittsburgh, USA

  3. Home of the Pittsburgh Pirates

  4. And the Pittsburgh Steelers!

  5. Who Are They? • Signal Processing “Signal processing is the analysis, interpretation, and manipulation of signals…” - Wikipedia • Machine Learning / Pattern Recognition The computer automatically improves “TPE”… – a task T – according to a performance metric P – through experience E - Prof. Tom Mitchell @ CMU

  6. Where They Meet? • IEEE Signal Processing Society: ICASSP, ICIP, ICME, Trans. IP/SP, etc. Computer Society: CVPR, ICCV, ICDM, Trans. PAMI, etc. Computational Intelligence Society: IJCNN, Trans. NN, etc. • IAPR ICPR, ICIAP, ICDAR, ICB, MCS, etc. • ACM SIGIR, SIGGRAPH, SIGMM, etc. • Int’l NN, ML, etc. societies IJCNN, ICML, etc. • Journals on PR, PR Letters, etc.

  7. What Do You See?

  8. Detection: Are There Faces?

  9. Detection: Are There Animals?

  10. Verification: Is This Mr. Einstein? No, He’s Not. Actually, my advisor looks more like Einstein…

  11. Object Categorization Animals Arch Faces Ground etc.

  12. Scene & Context Categorization Outdoor, Night, etc.

  13. It’s not that complicated…is it?? • Down-sampled & grayscale 22 x 28 pixel image • It’s Bayesian in Machine Learning… Likelihood: P(image | face) or Posterior: P(face | image) • Without prior knowledge… Number of all possible 22 x 28 pixel 8-bit images = 2^(22 x 28 x 8) • Do not try this at home… 2^(22 x 28 x 8) ≈ 10^1483 >>>> world population 6.6 x 10^9 Inspired by Prof. Tsuhan Chen @ Cornell
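
As a rough sanity check on this counting argument (my own sketch, not from the original slides), Python's arbitrary-precision integers make the comparison explicit:

```python
# Back-of-the-envelope check of the slide's counting argument (illustration only).
n_images = 2 ** (22 * 28 * 8)            # all possible 8-bit 22 x 28 images
world_population = 6.6e9                 # figure quoted on the slide (2009)
print(f"number of possible images ~ 10^{len(str(n_images)) - 1}")
print(f"world population          ~ {world_population:.1e}")
```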

  14. The Chemistry between Signal Processing & Machine Learning • Signal Processing Representation & Transforms Coding Transmission Compression Reconstruction • Machine Learning Feature extraction/selection Information retrieval (data mining) Sup. or unsup. learning/clustering Detection/estimation/identification Classification Bioinformatics Counter-Terrorism Language Processing Computer Vision Biometrics Product Inspection Marketing Analysis Internet Search Network Security

  15. Machine Learning / Pattern Recognition Applications

  16. Pattern Recognition Applications Biometrics

  17. Pattern Recognition Applications Automated Target Recognition

  18. Pattern Recognition Applications Cancer Diagnosis

  19. Pattern Recognition Applications Analysis of 3D protein structure

  20. Pattern Recognition Applications Clustering / analysis of microarray data

  21. Framework of Pattern Recognition Systems

  22. Three Main Issues by Prof. Fei-Fei Li @ Princeton • Representation - how to represent a pattern class or a dataset - feature extraction & selection • Learning - how to form a classification system (given training data) - classifier and its parameter selection • Decision - how to classify the given test data - verification/identification, no decision, rejection, etc. • I focus on the latter two…

  23. It’s Much More Complicated than DEAL OR NO DEAL… Multi-class classification & Rejection

  24. Multi-Class Classification • Need good accuracy + efficiency

  25. Rejection • What to reject? - any unseen false classes - very difficult…why?

  26. Standard Methods to Address Multi-class Classification Problems • One-vs-All Binary Classifiers - C binary classifiers (e.g. 1 vs. 2 & 3, 2 vs. 1 & 3, 3 vs. 1 & 2 for Classes 1–3 in the figure) - decided by winner-take-all • Any problems? - possible ambiguous results - similarity not used - unbalanced data learning (“One Million Dollar Question” by Prof. Thomas Huang @ UIUC)
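
A minimal sketch of the one-vs-all scheme just described, using scikit-learn's LinearSVC as the C binary classifiers (the iris data and the classifier choice are illustrative assumptions, not from the slides):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)                 # 3-class toy problem
classes = np.unique(y)

# One binary classifier per class: class c vs. all remaining classes.
clfs = [LinearSVC().fit(X, (y == c).astype(int)) for c in classes]

# Winner-take-all: pick the class whose classifier is most confident.
scores = np.column_stack([clf.decision_function(X) for clf in clfs])
pred = classes[np.argmax(scores, axis=1)]
print("training accuracy:", (pred == y).mean())
```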

  27. Standard Methods to Address Multi-class Classification Problems (cont’d) • One-vs-One Binary Classifiers (e.g. 1 vs. 2, 2 vs. 3, 1 vs. 3 for Classes 1–3 in the figure) - C(C−1)/2 binary classifiers - decided by majority vote • Any problems? - possible ambiguous results - need lots of classifiers - cannot do rejection
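
And the corresponding one-vs-one sketch: train a binary SVM for every class pair and decide by majority vote (again an illustration under the same assumptions, not the slides' own code):

```python
from itertools import combinations
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# C(C-1)/2 pairwise classifiers, each trained only on its two classes.
pair_clfs = {(a, b): SVC(kernel="linear").fit(X[np.isin(y, [a, b])],
                                              y[np.isin(y, [a, b])])
             for a, b in combinations(classes, 2)}

# Majority vote over all pairwise decisions.
votes = np.zeros((len(X), len(classes)), dtype=int)
for clf in pair_clfs.values():
    votes[np.arange(len(X)), clf.predict(X)] += 1
pred = classes[np.argmax(votes, axis=1)]
print("training accuracy:", (pred == y).mean())
```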

  28. Binary Hierarchical Classifier • Remarks - Divide-and-conquer strategy C-class problem → C−1 binary sub-problems Only ~log2(C) classifiers required in testing - We use SVM-type classifiers at each node - How to design a hierarchy? any problems??
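
A minimal sketch of the divide-and-conquer idea (my own illustration; the hierarchy below is hand-picked, not produced by the design method discussed later in this talk):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

class Node:
    """One internal node: an SVM separating a 'left' macro-class from a 'right' one."""
    def __init__(self, left_classes, right_classes):
        self.left, self.right = set(left_classes), set(right_classes)
        mask = np.isin(y, list(self.left | self.right))
        target = np.isin(y[mask], list(self.left)).astype(int)
        self.clf = SVC(kernel="rbf").fit(X[mask], target)
        self.children = {}

# Hand-built hierarchy for the 3-class problem: {0} vs {1, 2}, then {1} vs {2}.
root = Node([0], [1, 2])
root.children["right"] = Node([1], [2])

def classify(x):
    node = root
    while True:                                   # only ~log2(C) nodes are visited
        goes_left = node.clf.predict(x.reshape(1, -1))[0] == 1
        group = node.left if goes_left else node.right
        if len(group) == 1:                       # reached a single class: done
            return next(iter(group))
        node = node.children["left" if goes_left else "right"]

pred = np.array([classify(x) for x in X])
print("training accuracy:", (pred == y).mean())
```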

  29. Outline • Introduction • Methods to Address Multi-class Problems • SVM-type Classifiers • Our Design Method for Hierarchical Classifiers • Our Soft-Decision Hierarchical SVRDM Classifier • Experimental Results • Conclusions & Future Directions

  30. Support Vector Machine • Binary classification problem

  31. Support Vector Machine (cont’d) • Binary classification problem: find the separating hyperplane with max. margin

  32. Support Vector Machine (cont’d) • How to find this optimal hyperplane h? Margin = 2/||h|| min. ½||h||² s.t. yi(hᵀxi + b) ≥ 1 for all training samples xi; the xi with nonzero αi are the support vectors
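
A hedged illustration of this formulation (the synthetic 2-D data is purely for illustration): fit a linear SVM and read off the margin 2/||h|| and the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 0.5, (20, 2)),      # separable 2-D classes
               rng.normal([+2, +2], 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)              # very large C ~ hard margin
h = clf.coef_[0]                                         # the solution vector h
print("margin 2/||h|| =", 2 / np.linalg.norm(h))
print("support vectors (the xi with nonzero alpha_i):")
print(clf.support_vectors_)
```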

  33. Support Vector Machine (cont’d) • If not separable, we can either - introduce slack variables ξi and a penalty term C min. ½||h||² + C Σi ξi s.t. yi(hᵀxi + b) ≥ 1 − ξi, ξi ≥ 0

  34. Support Vector Machine (cont’d) • If not separable, we can also - find a nonlinear solution - technically, it’s a linear solution in a higher-order space

  35. Support Vector Machine (cont’d) • What happens now? min. ½||h||² + C Σi ξi s.t. yi(hᵀΦ(xi) + b) ≥ 1 − ξi, ξi ≥ 0 • Kernel trick, the secret behind the scene: K(xi, xj) = Φ(xi)ᵀΦ(xj), explicit form of Φ(x) not needed e.g. Gaussian: exp(−||xi − xj||²/2σ²), polynomial: (xiᵀxj + 1)^d, etc.
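
A small sketch of the kernel trick in code (the toy data and σ are arbitrary choices): the Gaussian kernel supplies the inner products in the transformed space, so Φ(x) is never formed explicitly.

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_kernel(A, B, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)        # not linearly separable

# scikit-learn accepts a precomputed Gram matrix; the classifier is linear in
# the higher-order space even though the boundary is nonlinear in the input space.
clf = SVC(kernel="precomputed").fit(gaussian_kernel(X, X), y)
print("training accuracy:", clf.score(gaussian_kernel(X, X), y))
```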

  36. Extensions of SVMs • One-class SVM - aka SVDD (support vector domain description) - find the optimal hyperplane to include data from one class & thus reject any other classes (i.e. false classes) - modified optimization problem min. ½||h||² s.t. hᵀΦ(xi) ≥ 1 for all xi from the true class - can add/apply ξ, C, kernel functions - solution vector h (nonlinear solution)
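
A hedged sketch of the rejection behaviour using scikit-learn's OneClassSVM (the data, ν and γ are illustrative assumptions): fit on one class only, then reject samples from an unseen false class.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
true_class = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
false_class = rng.normal(loc=5.0, scale=1.0, size=(50, 2))   # never seen in training

oc = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5).fit(true_class)
print("accepted from true class :", (oc.predict(true_class) == 1).mean())
print("rejected from false class:", (oc.predict(false_class) == -1).mean())
```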

  37. An Example of One-class SVM w/ Gaussian Kernel - projected data lie on a unit sphere in the transformed space, since K(x, x) = exp(0) = 1, i.e. ||Φ(x)|| = 1 - h in one-class SVM can be considered as the best single representative of the class of interest (class 1 in fig) - we use this h later in our new hierarchical design method (fig: one-class solution vector h and two-class SVM solution hSVM, for data from class 1 and class 2)

  38. Extensions of SVMs (cont’d) • SVRDM (Representation & Discrimination) - classification (SVM) + rejection (one-class SVM) & Gaussian kernel - need 2 solution vectors h1 & h2 - we use SVRDMs at each node in the hierarchy - rejection parameter p with −1 < p < 1 (fig: p = −1 gives the SVM; p = 0.6 gives a solution ≠ SVM)

  39. Outline • Introduction • Methods to Address Multi-class Problems • SVM-type Classifiers • Our Design Method for Hierarchical Classifiers • Our Soft-Decision Hierarchical SVRDM Classifier • Experimental Results • Conclusions & Future Directions

  40. Design Methods for Binary Hierarchical Classifiers • Binary Hierarchical Classifier Design (fig: how should Classes 1–5 be grouped into a hierarchy?)

  41. Prior Binary Hierarchical Design Methods - 1 • Exhaustive search? - Need to search all possible macro-class pairs - You cannot beat this method, BUT… 2^(N−1) possible pairs at each node! 2^(N−1) possible choices!! (fig: N classes split into macro-class A and macro-class B of ~N/2 classes each)
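
A quick check of this combinatorial explosion (my own sketch; the exact count of non-empty, unordered splits is 2^(N−1) − 1):

```python
from itertools import combinations

def macro_class_splits(classes):
    """Yield each unordered split of `classes` into two non-empty macro-classes once."""
    classes = list(classes)
    anchor, rest = classes[0], classes[1:]        # fix one class to avoid duplicates
    for r in range(len(rest) + 1):
        for subset in combinations(rest, r):
            a = {anchor, *subset}
            b = set(classes) - a
            if b:                                 # skip the empty macro-class
                yield a, b

for n in (3, 5, 10, 15):
    count = sum(1 for _ in macro_class_splits(range(n)))
    print(f"N = {n:2d}: {count} possible macro-class pairs (= 2^(N-1) - 1)")
```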

  42. Prior Binary Hierarchical Design Methods - 2 • K-means clustering on class means (review) iteratively min. Σk Σ(μi ∈ cluster k) ||μi − mk||² ; μ: class mean, m: cluster mean - determines 2 macro-classes (k = 2) in the original data space (fig: class means μ1–μ5 grouped into cluster 1 and cluster 2 by k-means with k = 2)
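
A minimal sketch of this prior method (the digits dataset and raw-pixel features are illustrative assumptions): compute one mean vector per class, then cluster the means themselves with k = 2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
classes = np.unique(y)

# One mean vector per class, then k-means (k = 2) on those means.
class_means = np.vstack([X[y == c].mean(axis=0) for c in classes])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(class_means)

print("macro-class A:", classes[labels == 0])
print("macro-class B:", classes[labels == 1])
```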

  43. Weighted Support Vector K-means Clustering - 1 • k-means clustering on solution vectors h from one-class SVMs - recall that h is the best representative of each class iteratively min. Σk Σ(hi ∈ cluster k) ||hi − Φ(mk)||² ; h: solution vector in the higher-order space, Φ(m): cluster mean in that same space - determines 2 macro-classes (k = 2) in the higher-order space (fig: solution vectors h1–h5 grouped into cluster 1 and cluster 2 by k-means with k = 2)

  44. Weighted Support Vector K-means Clustering - 2 • Remarks - solution vector h is a weighted sum of support vectors of each class - distances can be easily calculated by the kernel trick, as in SVMs e.g. hiᵀhj = Σa Σb αa αb K(xa, xb) - select macro-classes in higher-order space, as SVMs do - automated design (not an exhaustive search & no validation set needed) - visualize distances in higher-order space
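
A sketch of that kernel-trick identity in code: it only illustrates how hᵢᵀhⱼ, and hence ||hᵢ − hⱼ||², can be computed from support vectors and kernel values; it is not the authors' weighted k-means design method itself, and the dataset and γ are assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

X, y = load_digits(return_X_y=True)
gamma = 1e-3

# One one-class SVM (and hence one solution vector h) per class.
models = {c: OneClassSVM(kernel="rbf", gamma=gamma).fit(X[y == c])
          for c in np.unique(y)}

def h_dot(mi, mj):
    """h_i^T h_j via the kernel trick, using each model's support vectors and alphas."""
    K = rbf_kernel(mi.support_vectors_, mj.support_vectors_, gamma=gamma)
    return (mi.dual_coef_ @ K @ mj.dual_coef_.T)[0, 0]

def h_dist2(ci, cj):
    """Squared distance ||h_i - h_j||^2 = h_i.h_i - 2 h_i.h_j + h_j.h_j."""
    mi, mj = models[ci], models[cj]
    return h_dot(mi, mi) - 2 * h_dot(mi, mj) + h_dot(mj, mj)

print("||h_0 - h_1||^2 =", h_dist2(0, 1))
print("||h_0 - h_8||^2 =", h_dist2(0, 8))
```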

  45. Outline • Introduction • Methods to Address Multi-class Problems • SVM-type Classifiers • Our Design Method for Hierarchical Classifiers • Our Soft-Decision Hierarchical SVRDM Classifier • Experimental Results • Conclusions & Future Directions

  46. Problems in Binary Hierarchical Classifiers • Major Concerns - if misclassifications or misses occur at some internal nodes, we cannot recover from them if hard decisions are used - a soft-decision hierarchical classifier is needed! (fig: a misclassification or miss at an internal macro-class node propagates down the tree to class ω)

  47. Soft-Decision Hierarchical Classifier - 1 • Idea: use of probabilities - each two-class classifier in the tree outputs probabilities (P1A/P1B at the root, P2A/P2B and P3A/P3B at the macro-class nodes) - multiply the probabilities along the path to a class, e.g. P(ω|x) = P1B x P2A x P3B
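
A tiny numeric sketch of this soft-decision rule (the tree shape and the probabilities are made up for illustration): multiply the per-node probabilities along each root-to-leaf path and pick the largest product.

```python
# Per-node branch probabilities of a small hierarchy (illustrative values only).
node_probs = {
    "root": {"A": 0.3, "B": 0.7},      # P1A, P1B
    "B":    {"A": 0.8, "B": 0.2},      # P2A, P2B under macro-class B
    "B/A":  {"A": 0.1, "B": 0.9},      # P3A, P3B; leaves are single classes
}

# Root-to-leaf paths; the last one corresponds to P(w|x) = P1B * P2A * P3B.
paths = {
    "class 1": [("root", "A")],
    "class 2": [("root", "B"), ("B", "A"), ("B/A", "A")],
    "class w": [("root", "B"), ("B", "A"), ("B/A", "B")],
}

for cls, path in paths.items():
    p = 1.0
    for node, branch in path:
        p *= node_probs[node][branch]
    print(f"P({cls} | x) = {p:.3f}")
```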

  48. Soft-Decision Hierarchical Classifier - 2 • How to convert SVM classifier outputs to probabilities? - Use the sigmoid mapping function* P(y = 1|t) = 1 / (1 + exp(a·t + b)) to map the SVM output t = hᵀΦ(x) + b for input x to a probability, with P(y = −1|t) = 1 − P(y = 1|t) - Estimate parameters a & b by ML estimation *J. C. Platt, “Probabilities for SVMs,” in Adv. in Large Margin Classifiers, MIT Press, 1999
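
A hedged sketch of fitting that sigmoid by maximum likelihood (toy data; the optimizer choice is mine, and this simplified version omits the regularized targets Platt recommends):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
y = np.array([-1] * 100 + [+1] * 100)

clf = SVC(kernel="rbf").fit(X, y)
t = clf.decision_function(X)                     # SVM output t for each sample

def neg_log_likelihood(params):
    """Negative log-likelihood of P(y=1|t) = 1 / (1 + exp(a*t + b))."""
    a, b = params
    p = 1.0 / (1.0 + np.exp(a * t + b))
    eps = 1e-12                                  # guard against log(0)
    return -np.sum(np.where(y == 1, np.log(p + eps), np.log(1 - p + eps)))

a, b = minimize(neg_log_likelihood, x0=[-1.0, 0.0]).x
print(f"fitted sigmoid: P(y=1|t) = 1 / (1 + exp({a:.2f} * t + {b:.2f}))")
```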
