Object Orie’d Data Analysis, Last Time - PowerPoint PPT Presentation
Presentation Transcript

  1. Object Orie’d Data Analysis, Last Time • Classification / Discrimination • Try to Separate Classes +1 & -1 • Statistics & EECS viewpoints • Introduced Simple Methods • Mean Difference • Naïve Bayes • Fisher Linear Discrimination (nonparametric view) • Gaussian Likelihood ratio • Started Comparing
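
To make the recap concrete, here is a minimal NumPy sketch (mine, not from the original slides) contrasting the Mean Difference direction with the FLD direction for a two-class problem; the tilted-point-cloud toy data and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class toy data (assumed, not from the slides):
# tilted point clouds in 2-d, classes labeled +1 and -1.
X_pos = rng.multivariate_normal([2, 2], [[3, 2], [2, 3]], size=50)
X_neg = rng.multivariate_normal([-2, -2], [[3, 2], [2, 3]], size=50)

# Mean Difference direction: just the vector between the class means.
w_md = X_pos.mean(axis=0) - X_neg.mean(axis=0)

# FLD direction: within-class covariance inverse applied to the mean difference.
Sigma_w = 0.5 * (np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False))
w_fld = np.linalg.solve(Sigma_w, w_md)

# Classify a new point by the sign of its projection onto the direction,
# centered at the midpoint of the two class means.
midpoint = 0.5 * (X_pos.mean(axis=0) + X_neg.mean(axis=0))
x_new = np.array([1.0, 0.5])
print(np.sign((x_new - midpoint) @ w_fld))
```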

  2. Classification - Discrimination Important Distinction: Classification vs. Clustering. Useful terminology: Classification: supervised learning; Clustering: unsupervised learning.

  3. Fisher Linear Discrimination Graphical Introduction (non-Gaussian):

  4. Classical Discrimination FLD for Tilted Point Clouds – Works well

  5. Classical Discrimination GLR for Tilted Point Clouds – Works well

  6. Classical Discrimination FLD for Donut – Poor, no plane can work

  7. Classical Discrimination GLR for Donut – Works well (good quadratic)

  8. Classical Discrimination FLD for X – Poor, no plane can work

  9. Classical Discrimination GLR for X – Better, but not great

  10. Classical Discrimination Summary of FLD vs. GLR: • Tilted Point Clouds Data • FLD good • GLR good • Donut Data • FLD bad • GLR good • X Data • FLD bad • GLR OK, not great Classical Conclusion: GLR generally better (will see a different answer for HDLSS data)
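
As a concrete illustration of the GLR rule favored in the summary above, here is a minimal sketch (my own code, not the author's) of a Gaussian likelihood ratio classifier with separate class covariances, which is what gives the quadratic boundary that handles the donut data; the donut-style toy data and all names are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def glr_classify(x, X_pos, X_neg):
    """Gaussian Likelihood Ratio: fit a Gaussian to each class and assign x
    to the class with the larger log-likelihood. Allowing separate class
    covariances makes the decision boundary quadratic rather than linear."""
    ll_pos = multivariate_normal.logpdf(
        x, mean=X_pos.mean(axis=0), cov=np.cov(X_pos, rowvar=False))
    ll_neg = multivariate_normal.logpdf(
        x, mean=X_neg.mean(axis=0), cov=np.cov(X_neg, rowvar=False))
    return 1 if ll_pos > ll_neg else -1

# Donut-style toy data (assumed): class +1 is an inner blob, class -1 is a
# surrounding ring, so no single separating plane can work.
rng = np.random.default_rng(1)
X_pos = rng.normal(scale=0.5, size=(100, 2))
angles = rng.uniform(0, 2 * np.pi, 100)
X_neg = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])
X_neg += rng.normal(scale=0.3, size=X_neg.shape)

print(glr_classify(np.array([0.0, 0.0]), X_pos, X_neg))  # inner point -> +1
print(glr_classify(np.array([3.0, 0.0]), X_pos, X_neg))  # ring point  -> -1
```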

  11. Classical Discrimination FLD Generalization II (Gen. I was GLR) Different prior probabilities Main idea: Give different weights to 2 classes • I.e. assume not a priori equally likely • Development is “straightforward” • Modified likelihood • Change intercept in FLD • Won’t explore further here

  12. Classical Discrimination FLD Generalization III Principal Discriminant Analysis • Idea: FLD-like approach to > two classes • Assumption: Class covariance matrices are the same (similar) (but not Gaussian, same situation as for FLD) • Main idea: Quantify “location of classes” by their means

  13. Classical Discrimination Principal Discriminant Analysis (cont.) Simple way to find “interesting directions” among the means: PCA on the set of class means, i.e. eigen-analysis of the “between class covariance matrix” $\hat{\Sigma}_B = \frac{1}{k}\sum_{j=1}^{k}(\bar{X}_j - \bar{X})(\bar{X}_j - \bar{X})^t$, where $\bar{X}_j$ is the mean of class $j$ and $\bar{X}$ is the overall mean. Aside: can show the overall covariance decomposes as $\hat{\Sigma} = \hat{\Sigma}_B + \hat{\Sigma}_W$ (between plus within class).

  14. Classical Discrimination Principal Discriminant Analysis (cont.) But PCA on the means only works like Mean Difference; expect to improve by taking the within-class covariance into account. Blind application of the above ideas suggests eigen-analysis of $\hat{\Sigma}_W^{-1}\hat{\Sigma}_B$.

  15. Classical Discrimination Principal Discriminant Analysis (cont.) There are: • smarter ways to compute (“generalized eigenvalue”) • other representations (this solves optimization prob’s) Special case: 2 classes, reduces to standard FLD Good reference for more: Section 3.8 of: Duda, Hart & Stork (2001)
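
A minimal sketch of the eigen-analysis just described, using SciPy's generalized symmetric eigensolver as one of the "smarter ways to compute"; the function name and the list-of-arrays interface are my own assumptions, not the slides' notation.

```python
import numpy as np
from scipy.linalg import eigh

def principal_discriminant_directions(classes):
    """classes: list of (n_j, d) data arrays, one per class.
    Solves the generalized eigenproblem Sigma_B v = lambda Sigma_W v,
    i.e. the eigen-analysis of Sigma_W^{-1} Sigma_B described above.
    Assumes Sigma_W is nonsingular (classical, non-HDLSS setting)."""
    overall_mean = np.vstack(classes).mean(axis=0)
    d = overall_mean.size

    # Between-class covariance: spread of the class means about the overall mean.
    Sigma_B = np.zeros((d, d))
    for X in classes:
        diff = X.mean(axis=0) - overall_mean
        Sigma_B += np.outer(diff, diff)
    Sigma_B /= len(classes)

    # Within-class covariance: average of the per-class covariances.
    Sigma_W = sum(np.cov(X, rowvar=False) for X in classes) / len(classes)

    # Generalized symmetric eigensolver; eigenvalues come back ascending,
    # so reverse to list the most discriminating directions first.
    eigvals, eigvecs = eigh(Sigma_B, Sigma_W)
    return eigvals[::-1], eigvecs[:, ::-1]
```

With two classes the leading direction reduces to the standard FLD direction, as noted on the slide.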

  16. Classical Discrimination Summary of Classical Ideas: • Among “Simple Methods” • MD and FLD sometimes similar • Sometimes FLD better • So FLD is preferred • Among Complicated Methods • GLR is best • So always use that • Caution: • Story changes for HDLSS settings

  17. HDLSS Discrimination Recall main HDLSS issues: • Sample Size, n < Dimension, d • Singular covariance matrix • So can’t use matrix inverse • I.e. can’t standardize (sphere) the data (requires root inverse covariance) • Can’t do classical multivariate analysis
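
A tiny NumPy sketch (my own) of the rank problem: with n < d the sample covariance matrix has rank at most n - 1, so it cannot be inverted or used to sphere the data; the dimensions below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 20, 50                            # HDLSS: sample size below dimension
X = rng.normal(size=(n, d))

Sigma_hat = np.cov(X, rowvar=False)      # d x d sample covariance
print(np.linalg.matrix_rank(Sigma_hat))  # at most n - 1 = 19, far below d = 50

# Ordinary inversion (and hence sphering the data) is not possible, so a
# generalized inverse is used instead (see the Moore-Penrose slides below).
Sigma_pinv = np.linalg.pinv(Sigma_hat)
```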

  18. HDLSS Discrimination An approach to non-invertible covariances: • Replace by generalized inverses • Sometimes called pseudo inverses • Note: there are several • Here use Moore Penrose inverse • As used by Matlab (pinv.m) • Often provides useful results (but not always) Recall Linear Algebra Review…

  19. Recall Linear Algebra Eigenvalue Decomposition: For a (symmetric) square matrix $X_{d \times d}$, find a diagonal matrix $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ and an orthonormal matrix $B$ (i.e. $B B^t = B^t B = I$), so that $X = B D B^t$, i.e. $X = \sum_{j=1}^{d} \lambda_j b_j b_j^t$.

  20. Recall Linear Algebra (Cont.) • Eigenvalue Decomp. solves matrix problems: • Inversion: $X^{-1} = B D^{-1} B^t = B\,\mathrm{diag}(\lambda_j^{-1})\,B^t$ • Square Root: $X^{1/2} = B D^{1/2} B^t = B\,\mathrm{diag}(\lambda_j^{1/2})\,B^t$ • $X$ is positive (nonnegative, i.e. semi-) definite $\Longleftrightarrow$ all $\lambda_j > 0$ ($\geq 0$)
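
A quick NumPy check (my own illustration) of these facts: build a symmetric positive definite matrix, eigendecompose it, and recover the inverse and the square root from the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
X = A @ A.T + np.eye(4)            # symmetric positive definite matrix

lam, B = np.linalg.eigh(X)         # X = B diag(lam) B^T, columns of B orthonormal

X_inv = B @ np.diag(1.0 / lam) @ B.T       # inversion via the eigendecomposition
X_root = B @ np.diag(np.sqrt(lam)) @ B.T   # square root via the eigendecomposition

print(np.allclose(X_inv, np.linalg.inv(X)))   # True
print(np.allclose(X_root @ X_root, X))        # True
print(np.all(lam > 0))                        # positive definite: all eigenvalues > 0
```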

  21. Recall Linear Algebra (Cont.) Moore-Penrose Generalized Inverse: For $X = B\,\mathrm{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0)\,B^t$ of rank $r$, define $X^{-} = B\,\mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_r^{-1}, 0, \ldots, 0)\,B^t$.

  22. Recall Linear Algebra (Cont.) • Easy to see this satisfies the definition of Generalized (Pseudo) Inverse: • $X X^{-} X = X$ • $X^{-} X X^{-} = X^{-}$ • $X X^{-}$ symmetric • $X^{-} X$ symmetric

  23. Recall Linear Algebra (Cont.) Moore-Penrose Generalized Inverse: Idea: matrix inverse on non-null space of linear transformation Reduces to ordinary inverse, in full rank case, i.e. for r = d, so could just always use this Tricky aspect: “>0 vs. = 0” & floating point arithmetic
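
Here is a sketch (mine) of the Moore-Penrose inverse computed straight from the eigendecomposition, with an explicit tolerance standing in for the "> 0 vs. = 0" decision; NumPy's np.linalg.pinv, the analogue of Matlab's pinv.m mentioned earlier, does essentially the same thing via the SVD with its own tolerance choice.

```python
import numpy as np

def moore_penrose_symmetric(X, tol=1e-10):
    """Generalized inverse of a symmetric nonnegative definite matrix:
    invert the eigenvalues on the non-null space, zero out the rest.
    The tolerance decides which eigenvalues count as 'zero' in floating point."""
    lam, B = np.linalg.eigh(X)
    keep = lam > tol * lam.max()          # '> 0 vs. = 0' decided by a tolerance
    lam_inv = np.zeros_like(lam)
    lam_inv[keep] = 1.0 / lam[keep]
    return B @ np.diag(lam_inv) @ B.T

# Rank-deficient example: agrees with NumPy's built-in pseudoinverse.
X = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
print(np.allclose(moore_penrose_symmetric(X), np.linalg.pinv(X)))   # True
```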

  24. HDLSS Discrimination Application of Generalized Inverse to FLD: Direction (Normal) Vector: $n_{FLD} = \hat{\Sigma}_W^{-}(\bar{X}_{+1} - \bar{X}_{-1})$ Intercept: cut at the midpoint of the class means, $\frac{1}{2}(\bar{X}_{+1} + \bar{X}_{-1})$ Have replaced $\hat{\Sigma}_W^{-1}$ by the generalized inverse $\hat{\Sigma}_W^{-}$
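
A minimal sketch (my own, following the recipe above) of FLD with the generalized inverse substituted for the ordinary inverse; pooling the two class covariances with equal weights and cutting at the midpoint of the means are assumptions about details the transcript does not show.

```python
import numpy as np

def fld_pseudoinverse(X_pos, X_neg):
    """FLD direction and classifier for HDLSS data: the within-class
    covariance is singular when n < d, so its ordinary inverse is
    replaced by the Moore-Penrose generalized inverse."""
    mean_pos, mean_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    Sigma_w = 0.5 * (np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False))

    w = np.linalg.pinv(Sigma_w) @ (mean_pos - mean_neg)   # direction (normal) vector
    midpoint = 0.5 * (mean_pos + mean_neg)                # intercept: cut at the midpoint

    def classify(x):
        return np.sign((x - midpoint) @ w)                # returns +1 or -1
    return w, classify
```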

  25. HDLSS Discrimination Toy Example: Increasing Dimension. Data vectors: • Entry 1: different means for Class +1 and Class –1 (the only entry carrying class information) • Other Entries: pure noise • All Entries Independent. Look through increasing dimensions d.
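
The class-mean values and noise distribution shown on the original slide did not survive the transcript, so the sketch below (mine) uses stand-in values: standard Gaussian noise in every entry, a shift of ±2 in entry 1, and 20 observations per class (chosen to be consistent with the d = 38 degrees-of-freedom boundary on a later slide, assuming balanced classes). It tracks the angle between the FLD direction, computed with the pseudoinverse, and the optimal direction as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(4)
n_per_class = 20                   # assumed sample size per class
shift = 2.0                        # stand-in for the class-mean shift on the slide

def angle_to_optimal(d):
    """Simulate the toy example in dimension d and return the angle (degrees)
    between the FLD direction (with pseudoinverse) and the optimal direction e_1."""
    X_pos = rng.normal(size=(n_per_class, d))
    X_neg = rng.normal(size=(n_per_class, d))
    X_pos[:, 0] += shift           # entry 1 separates the classes
    X_neg[:, 0] -= shift           # all other entries are pure noise

    Sigma_w = 0.5 * (np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False))
    w = np.linalg.pinv(Sigma_w) @ (X_pos.mean(axis=0) - X_neg.mean(axis=0))

    cos = abs(w[0]) / np.linalg.norm(w)        # optimal direction is e_1
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

for d in [2, 10, 38, 100, 1000]:
    print(d, round(angle_to_optimal(d), 1))
```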

  26. HDLSS Discrimination Increasing Dimension Example [Figure: projections on the optimal direction, on the FLD direction, and on both directions]

  27. HDLSS Discrimination Add a 2nd Dimension (noise) Same Proj. on Opt’l Dir’n Axes same as dir’ns Now See 2 Dim’ns

  28. HDLSS Discrimination Add a 3rd Dimension (noise) Project on 2-d subspace generated by optimal dir’n & by FLD dir’n

  29. HDLSS Discrimination Movie Through Increasing Dimensions

  30. HDLSS Discrimination FLD in Increasing Dimensions: • Low dimensions (d = 2-9): • Visually good separation • Small angle between FLD and Optimal • Good generalizability • Medium Dimensions (d = 10-26): • Visual separation too good?!? • Larger angle between FLD and Optimal • Worse generalizability • Feel effect of sampling noise

  31. HDLSS Discrimination FLD in Increasing Dimensions: • High Dimensions (d=27-37): • Much worse angle • Very poor generalizability • But very small within class variation • Poor separation between classes • Large separation / variation ratio

  32. HDLSS Discrimination FLD in Increasing Dimensions: • At HDLSS Boundary (d=38): • 38 = degrees of freedom (need to estimate 2 class means) • Within class variation = 0 ?!? • Data pile up, on just two points • Perfect separation / variation ratio? • But only feels microscopic noise aspects So likely not generalizable • Angle to optimal very large

  33. HDLSS Discrimination FLD in Increasing Dimensions: • Just beyond HDLSS boundary (d=39-70): • Improves with higher dimension?!? • Angle gets better • Improving generalizability? • More noise helps classification?!?

  34. HDLSS Discrimination FLD in Increasing Dimensions: • Far beyond HDLSS boun’ry (d=70-1000): • Quality degrades • Projections look terrible (populations overlap) • And Generalizability falls apart, as well • Math’s worked out by Bickel & Levina (2004) • Problem is estimation of d x d covariance matrix