
Object Orie’d Data Analysis, Last Time



Presentation Transcript


  1. Object Orie’d Data Analysis, Last Time • Kernel Embedding • Embed data in higher dimensional manifold • Gives greater flexibility to linear methods • Support Vector Machines • Aimed at very non-Gaussian Data • E.g. from Kernel Embedding • Distance Weighted Discrimination • HDLSS Improvement of SVM

  2. Support Vector Machines Graphical View, using Toy Example: • Find separating plane • To maximize distances from data to plane • In particular smallest distance • Data points closest are called support vectors • Gap between is called margin
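
For reference, a standard (hard-margin) statement of the optimization problem this slide describes; the formula below is supplied for readability and is not taken from the original slides:

$$ \min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, n, $$

so the margin is $2/\|w\|$, and the support vectors are exactly the points that attain equality in the constraints.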

  3. Support Vector Machines Graphical View, using Toy Example:

  4. Support Vector Machines Forgotten last time, Important Extension: Multi-Class SVMs Hsu & Lin (2002) Lee, Lin, & Wahba (2002) • Defined for “implicit” version • “Direction Based” variation???

  5. Support Vector Machines Also forgotten last time, Toy examples illustrating Explicit vs. Implicit Kernel Embedding As well as effect of window width, σ on Gaussian kernel embedding

  6. SVMs, Comput’n & Embedding For an “Embedding Map”, e.g. Explicit Embedding: Maximize: Get classification function: • Straightforward application of embedding • But loses inner product advantage
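
The formulas on this slide did not survive extraction. As a sketch (standard textbook notation, not the notation of the original slide), the explicit version applies the usual soft-margin SVM dual to the embedded data $\Phi(x_i)$:

$$ \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, \Phi(x_i)^\top \Phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0, $$

with classification function $f(x) = \operatorname{sign}\bigl( \sum_i \alpha_i y_i\, \Phi(x_i)^\top \Phi(x) + b \bigr)$. The embedded vectors $\Phi(x_i)$ must be formed and stored explicitly, which is what loses the inner product advantage.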

  7. SVMs, Comput’n & Embedding Implicit Embedding: Maximize: Get classification function: • Still defined only via inner products • Retains optimization advantage • Thus used very commonly • Comparison to explicit embedding? • Which is “better”???
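
Again the formulas are missing from the transcript; the implicit version is the same dual with every inner product replaced by a kernel evaluation (the kernel trick). A sketch, written for the Gaussian kernel with window width $\sigma$:

$$ \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, k(x_i, x_j), \qquad f(x) = \operatorname{sign}\Bigl( \sum_i \alpha_i y_i\, k(x_i, x) + b \Bigr), $$

subject to the same constraints as before, where $k(x, x') = \exp\bigl( -\|x - x'\|^2 / (2\sigma^2) \bigr)$. Only kernel evaluations are needed, never the embedded vectors themselves, which is the optimization advantage noted above.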

  8. Support Vector Machines Target Toy Data set:

  9. Support Vector Machines Explicit Embedding, window σ = 0.1:

  10. Support Vector Machines Explicit Embedding, window σ = 1:

  11. Support Vector Machines Explicit Embedding, window σ = 10:

  12. Support Vector Machines Explicit Embedding, window σ = 100:

  13. Support Vector Machines Notes on Explicit Embedding: • Too small → poor generalizability • Too big → miss important regions • Classical lessons from kernel smoothing • Surprisingly large “reasonable region” • I.e. parameter less critical (sometimes?) Also explore projections (in kernel space)
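
A minimal sketch of this window-width lesson (not the code used for the original slides; it assumes scikit-learn and a ring-shaped stand-in for the toy target data, with the explicit embedding formed as Gaussian kernel evaluations at a set of centers):

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import LinearSVC

    # Toy two-class data, a stand-in for the "target" example on the earlier slide
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    def gaussian_embed(X, centers, sigma):
        # Explicit Gaussian embedding: each point is mapped to its vector of
        # kernel evaluations at a fixed set of centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    centers = X[::10]  # a coarse set of centers taken from the data
    for sigma in [0.1, 1.0, 10.0, 100.0]:
        Z = gaussian_embed(X, centers, sigma)
        clf = LinearSVC(max_iter=10000).fit(Z, y)  # linear SVM in the embedded space
        print(f"sigma = {sigma:6.1f}   training accuracy = {clf.score(Z, y):.3f}")

Very small sigma tends to overfit (poor generalizability), very large sigma washes out the structure, and a fairly wide middle range behaves reasonably, as the slide notes.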

  14. Support Vector Machines Kernel space projection, window σ = 0.1:

  15. Support Vector Machines Kernel space projection, window σ = 1:

  16. Support Vector Machines Kernel space projection, window σ = 10:

  17. Support Vector Machines Kernel space projection, window σ = 100:

  18. Support Vector Machines Kernel space projection, window σ = 100:

  19. Support Vector Machines Notes on Kernel space projection: • Too small → great separation, but recall, poor generalizability • Too big → no longer separable • As above: • Classical lessons from kernel smoothing • Surprisingly large “reasonable region” • I.e. parameter less critical (sometimes?) Also explore projections (in kernel space)
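
The kernel-space projections can be sketched the same way: project the explicitly embedded data onto the unit normal of the fitted linear SVM and compare the two classes in one dimension. Again this assumes scikit-learn and a stand-in toy data set, not the original code:

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import LinearSVC

    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
    centers = X[::10]

    def gaussian_embed(X, centers, sigma):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    for sigma in [0.1, 1.0, 10.0, 100.0]:
        Z = gaussian_embed(X, centers, sigma)
        clf = LinearSVC(max_iter=10000).fit(Z, y)
        # Project embedded data onto the (unit) SVM normal direction and look at
        # how far apart the two classes sit in that one-dimensional view
        w = clf.coef_.ravel()
        proj = Z @ w / np.linalg.norm(w)
        gap = proj[y == 1].min() - proj[y == 0].max()
        print(f"sigma = {sigma:6.1f}   gap between projected classes = {gap:+.3f}")

A positive gap means the projected classes are fully separated; comparing the gap across sigma values mirrors the small vs. large window lessons above.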

  20. Support Vector Machines Implicit Embedding, window σ = 0.1:

  21. Support Vector Machines Implicit Embedding, window σ = 0.5:

  22. Support Vector Machines Implicit Embedding, window σ = 1:

  23. Support Vector Machines Implicit Embedding, window σ = 10:

  24. Support Vector Machines Notes on Implicit Embedding: • Similar Large vs. Small lessons • Range of “reasonable results” seems to be smaller (note different range of windows) • Much different “edge” behavior • Interesting topic for future work…

  25. Distance Weighted Discrim’n 2-d Visualization: • Pushes Plane Away From Data • All Points Have Some Influence
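
The contrast with SVM is easiest to see in the optimization criterion. As formulated in Marron, Todd and Ahn (2007) (restated here from memory, so consult the paper for the exact form), DWD minimizes a sum of inverse distances to the separating plane rather than focusing only on the closest points, which is why all points have some influence:

$$ \min_{w,\,b,\,\xi}\ \sum_i \frac{1}{r_i} + C \sum_i \xi_i \quad \text{s.t.} \quad r_i = y_i\,(w^\top x_i + b) + \xi_i, \quad r_i \ge 0, \quad \xi_i \ge 0, \quad \|w\| \le 1. $$

This is solved as a second-order cone program, which is where the SDPT3 software cited on the next slide comes in.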

  26. Distance Weighted Discrim’n References for more on DWD: • Current paper: Marron, Todd and Ahn (2007) • Links to more papers: Ahn (2007) • JAVA Implementation of DWD: caBIG (2006) • SDPT3 Software: Toh (2007)

  27. Batch and Source Adjustment Recall from Class Notes 8/28/07 • For Stanford Breast Cancer Data (C. Perou) • Analysis in Benito et al. (2004), Bioinformatics, 20, 105-114. https://genome.unc.edu/pubsup/dwd/ • Adjust for Source Effects • Different sources of mRNA • Adjust for Batch Effects • Arrays fabricated at different times
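
A minimal sketch of the adjustment idea (synthetic data; a generic linear direction stands in for the DWD direction, since no DWD solver is assumed here): find a direction separating the two sources, project onto it, and rigidly shift each source so the projected means agree.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    # Two synthetic "sources" of expression profiles with an artificial source offset
    X = np.vstack([rng.normal(0.0, 1.0, (50, 200)),
                   rng.normal(1.0, 1.0, (50, 200))])
    source = np.repeat([0, 1], 50)

    # Stand-in for the DWD direction: any linear direction separating the sources
    d = LinearSVC(max_iter=10000).fit(X, source).coef_.ravel()
    d /= np.linalg.norm(d)

    # Rigid shift: move each source along d so that its projected mean is zero
    proj = X @ d
    X_adj = X.copy()
    for s in (0, 1):
        X_adj[source == s] -= proj[source == s].mean() * d

    for s in (0, 1):
        print(f"source {s}: mean projection {proj[source == s].mean():+.2f} -> "
              f"{(X_adj[source == s] @ d).mean():+.2f}")

The analyses cited above use the DWD direction for this step (e.g. via the caBIG JAVA implementation); the rigid-shift idea is the same.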

  28. Source Batch Adj: Biological Class Colors & Symbols

  29. Source Batch Adj: Source Colors

  30. Source Batch Adj: PC 1-3 & DWD direction

  31. Source Batch Adj: DWD Source Adjustment

  32. Source Batch Adj: Source Adj’d, PCA view

  33. Source Batch Adj: Source & Batch Adj’d, Adj’d PCA

  34. Why not adjust using SVM? • Major Problem: Proj’d Distrib’al Shape • Triangular Dist’ns (oppositely skewed) • Does not allow sensible rigid shift

  35. Why not adjust using SVM? • Nicely Fixed by DWD • Projected Dist’ns near Gaussian • Sensible to shift

  36. Why not adjust by means? • DWD is complicated: value added? • Xuxin Liu example… • Key is sizes of biological subtypes • Differing ratio trips up mean • But DWD more robust (although still not perfect)
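
A toy illustration of the point above (invented numbers, not the Xuxin Liu example): suppose subtype A cases have expression level 0 and subtype B cases level 10, with no batch effect at all. A batch that is 70% A and 30% B has mean 3, while a batch that is 30% A and 70% B has mean 7. Mean-centering therefore shifts the two batches by different amounts, and afterwards the same subtype sits 4 units apart across batches: the adjustment has removed biology and manufactured an artifact. The claim on this slide is that the DWD direction-plus-shift adjustment is more robust to such unequal subtype ratios, though still not perfect.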

  37. Why not adjust by means? Next time: Work in before and after, slides like 138-141 from DWDnormPreso.ppt In Research/Bioinf/caBIG

  38. Twiddle ratios of subtypes

  39. Why not adjust by means? DWD robust against non-proportional subtypes… Mathematical Statistical Question: Is there mathematics behind this? (will answer next time…)

  40. DWD in Face Recognition • Face Images as Data (with M. Benito & D. Peña) • Male – Female Difference? • Discrimination Rule? • Represented as long vector of pixel gray levels • Registration is critical
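
A minimal sketch of the "image as long vector" representation (synthetic arrays standing in for the registered face photographs):

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-ins for n registered gray-level face images of size h x w
    n, h, w = 40, 64, 48
    images = rng.random((n, h, w))
    sex = np.repeat([-1, 1], n // 2)   # -1 = female, +1 = male

    # Each image becomes one long vector of pixel gray levels, giving an HDLSS
    # data set: d = h * w = 3072 dimensions but only n = 40 observations
    X = images.reshape(n, h * w)

    # Mean difference between the two classes (a simple baseline direction)
    mean_diff = X[sex == 1].mean(axis=0) - X[sex == -1].mean(axis=0)
    print(X.shape, mean_diff.shape)    # (40, 3072) (3072,)

Registration matters because each coordinate of the long vector must correspond to the same facial location in every image.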

  41. DWD in Face Recognition, (cont.) • Registered Data • Shifts and scale • Manually chosen • To align eyes and mouth • Still large variation • See males vs. females???

  42. DWD in Face Recognition, (cont.) • DWD Direction • Good separation • Images “make sense” • Garbage at ends? (extrapolation effects?)

  43. DWD in Face Recognition, (cont.) • Unregistered Version • Much blurrier • Since features don’t properly line up • Nonlinear Variation • But DWD still works • Can see M-F differ’ce?

  44. DWD in Face Recognition, (cont.) • Interesting summary: • Jump between means (in DWD direction) • Clear separation of Maleness vs. Femaleness

  45. DWD in Face Recognition, (cont.) • Fun Comparison: • Jump between means (in SVM direction) • Also distinguishes Maleness vs. Femaleness • But not as well as DWD

  46. DWD in Face Recognition, (cont.) Analysis of difference: Project onto normals • SVM has “small gap” (feels noise artifacts?) • DWD “more informative” (feels real structure?)

  47. DWD in Face Recognition, (cont.) • Current Work: • Focus on “drivers”: (regions of interest) • Relation to Discr’n? • Which is “best”? • Lessons for human perception?

  48. Outcomes Data Breast Cancer Study (C. M. Perou): • Outcome of interest = death or survival • Connection with gene expression? Approach: • Treat death vs. survival during study as “classes” • Find “direction that best separates the classes”

  49. Outcomes Data Find “direction that best separates the classes”

  50. Outcomes Data Find “direction that best separates the classes”
