Object Orie’d Data Analysis, Last Time

  • Kernel Embedding

    • Embed data in higher dimensional manifold

    • Gives greater flexibility to linear methods

  • Support Vector Machines

    • Aimed at very non-Gaussian Data

    • E.g. from Kernel Embedding

  • Distance Weighted Discrimination

    • HDLSS Improvement of SVM


Support Vector Machines

Graphical View, using Toy Example:

  • Find separating plane

  • To maximize distances from data to plane

  • In particular smallest distance

  • Data points closest are called support vectors

  • Gap between them is called the margin
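
In symbols (a hedged reconstruction of the criterion sketched above; the slide itself showed only the picture), for separable training data (x_i, y_i) with y_i in {±1}, the SVM chooses the hyperplane

\[
\max_{w,\,b}\;\; \min_i \;\frac{y_i\,(w^\top x_i + b)}{\|w\|},
\]

so the points attaining the minimum distance are the support vectors, and twice that minimized distance is the margin.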


Support Vector Machines

Graphical View, using Toy Example:


Support Vector Machines

Forgotten last time,

Important Extension:

Multi-Class SVMs

Hsu & Lin (2002)

Lee, Lin, & Wahba (2002)

  • Defined for “implicit” version

  • “Direction Based” variation???


Support Vector Machines

Also forgotten last time: toy examples illustrating explicit vs. implicit kernel embedding, as well as the effect of the window width σ on the Gaussian kernel embedding.


SVMs, Comput’n & Embedding

For an “Embedding Map” (e.g. a polynomial or Gaussian kernel embedding):

Explicit Embedding:

Maximize:

Get classification function:

  • Straightforward application of embedding

  • But loses inner product advantage
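
The “Maximize” and “classification function” formulas on this slide were images in the original deck. As a hedged reconstruction, the standard SVM dual with an explicit embedding map Φ (not necessarily the exact notation used on the slide) is

\[
\max_{\alpha}\; \sum_{i=1}^n \alpha_i \;-\; \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j \,\langle \Phi(x_i), \Phi(x_j)\rangle
\qquad \text{s.t. } 0 \le \alpha_i \le C,\;\; \sum_i \alpha_i y_i = 0,
\]

with classification function

\[
f(x) \;=\; \operatorname{sign}\!\Big(\sum_i \alpha_i y_i \,\langle \Phi(x_i), \Phi(x)\rangle + b\Big).
\]

Working with the vectors Φ(x_i) directly is the “straightforward application of embedding” above, but it gives up the chance to evaluate the inner products through a kernel.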


SVMs, Comput’n & Embedding

Implicit Embedding:

Maximize:

Get classification function:

  • Still defined only via inner products

  • Retains optimization advantage

  • Thus used very commonly

  • Comparison to explicit embedding?

  • Which is “better”???
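
Replacing every inner product ⟨Φ(x_i), Φ(x_j)⟩ above by a kernel evaluation K(x_i, x_j) gives the implicit version. A minimal sketch of the explicit-vs-implicit comparison in scikit-learn follows; the toy data, the Nystroem approximation of the Gaussian feature map, and all parameter values are stand-ins for illustration, not the code behind the slides.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import LinearSVC, SVC

# Toy two-class data (stand-in for the slides' "target" toy example)
X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

sigma = 1.0                      # Gaussian window width
gamma = 1.0 / (2.0 * sigma**2)   # sklearn parameterizes exp(-gamma * ||x - x'||^2)

# Explicit embedding: build (approximate) feature vectors Phi(x_i), then a linear SVM
phi = Nystroem(kernel="rbf", gamma=gamma, n_components=100, random_state=0)
X_phi = phi.fit_transform(X)
explicit_svm = LinearSVC(C=1.0, max_iter=10000).fit(X_phi, y)

# Implicit embedding: the kernel trick, only inner products K(x_i, x_j) are used
implicit_svm = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

print("explicit training accuracy:", explicit_svm.score(X_phi, y))
print("implicit training accuracy:", implicit_svm.score(X, y))
```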


Support Vector Machines

Target Toy Data set:


Support Vector Machines

Explicit Embedding, window σ = 0.1:


Support Vector Machines

Explicit Embedding, window σ = 1:


Support Vector Machines

Explicit Embedding, window σ = 10:


Support Vector Machines

Explicit Embedding, window σ = 100:


Support Vector Machines

Notes on Explicit Embedding:

  • Too small → poor generalizability

  • Too big → miss important regions

  • Classical lessons from kernel smoothing

  • Surprisingly large “reasonable region”

  • I.e. parameter less critical (sometimes?)

    Also explore projections (in kernel space)
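
A hedged illustration of the window-width lesson (illustrative code, not the slides' own; it reuses the toy data X, y from the sketch above and, for brevity, the implicit form, since the small-vs-large σ lessons are the same):

```python
from sklearn.svm import SVC

for sigma in (0.1, 1.0, 10.0, 100.0):
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=1.0).fit(X, y)
    print(f"sigma = {sigma:>5}: training accuracy = {clf.score(X, y):.3f}, "
          f"support vectors = {len(clf.support_)}")
```

Very small σ tends to separate the training data perfectly (poor generalizability); very large σ smooths away the important regions, exactly the classical kernel-smoothing lessons noted above.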


Support Vector Machines

Kernel space projection, window σ = 0.1:


Support Vector Machines

Kernel space projection, window σ = 1:


Support Vector Machines

Kernel space projection, window σ = 10:


Support Vector Machines

Kernel space projection, window σ = 100:


Support Vector Machines

Kernel space projection, window σ = 100:


Support Vector Machines

Notes on Kernel space projection:

  • Too small →

    • Great separation

    • But recall, poor generalizability

  • Too big → no longer separable

  • As above:

    • Classical lessons from kernel smoothing

    • Surprisingly large “reasonable region”

    • I.e. parameter less critical (sometimes?)

      Also explore projections (in kernel space)
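
One way to reproduce this kind of kernel-space projection view, under the assumption that the plots show the data projected onto the SVM normal direction in the embedded space (reusing X, y from the earlier sketch): the signed decision-function value of each point is, up to the offset, exactly that projection, so per-class histograms of the scores give the 1-d projected distributions. Illustrative code only:

```python
import matplotlib.pyplot as plt
from sklearn.svm import SVC

fig, axes = plt.subplots(1, 4, figsize=(14, 3), sharey=True)
for ax, sigma in zip(axes, (0.1, 1.0, 10.0, 100.0)):
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=1.0).fit(X, y)
    scores = clf.decision_function(X)        # projection onto the normal direction
    ax.hist(scores[y == 0], bins=20, alpha=0.6, label="class 0")
    ax.hist(scores[y == 1], bins=20, alpha=0.6, label="class 1")
    ax.set_title(f"sigma = {sigma}")
axes[0].legend()
plt.tight_layout()
plt.show()
```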


Support Vector Machines

Implicit Embedding, window σ = 0.1:


Support Vector Machines

Implicit Embedding, window σ = 0.5:


Support Vector Machines

Implicit Embedding, window σ = 1:


Support Vector Machines

Implicit Embedding, window σ = 10:


Support Vector Machines

Notes on Implicit Embedding:

  • Similar Large vs. Small lessons

  • Range of “reasonable results”

    Seems to be smaller

    (note different range of windows)

  • Much different “edge” behavior

    Interesting topic for future work…


Distance Weighted Discrim’n

2-d Visualization: Pushes plane away from data; all points have some influence.
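
For reference, a hedged statement of the optimization behind this picture, following the formulation in Marron, Todd and Ahn (2007): instead of maximizing the minimum distance (the SVM margin), DWD minimizes the sum of reciprocal distances, so every point retains some influence on the direction:

\[
\min_{w,\,\beta,\,\xi}\;\; \sum_{i=1}^n \left(\frac{1}{r_i} + C\,\xi_i\right)
\qquad \text{s.t.}\quad r_i = y_i\,(w^\top x_i + \beta) + \xi_i > 0,\;\; \xi_i \ge 0,\;\; \|w\| \le 1,
\]

a second-order cone program, solvable with the SDPT3 software cited below.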


Distance Weighted Discrim’n

References for more on DWD:

  • Current paper:

    Marron, Todd and Ahn (2007)

  • Links to more papers:

    Ahn (2007)

  • JAVA Implementation of DWD:

    caBIG (2006)

  • SDPT3 Software:

    Toh (2007)


Batch and Source Adjustment

Recall from Class Notes 8/28/07

  • For Stanford Breast Cancer Data (C. Perou)

  • Analysis in Benito et al. (2004), Bioinformatics, 20, 105-114.

    https://genome.unc.edu/pubsup/dwd/

  • Adjust for Source Effects

    • Different sources of mRNA

  • Adjust for Batch Effects

    • Arrays fabricated at different times
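
A minimal sketch of the direction-based adjustment idea (hypothetical code, not the caBIG DWD implementation referenced above): find a unit direction w that separates the batches (DWD in the actual analysis), project every case onto w, and rigidly shift each batch along w so the projected batch means coincide.

```python
import numpy as np

def adjust_along_direction(X, batch, w):
    """Shift each batch along the unit direction w so projected batch means agree.
    X: (n_cases, n_genes) matrix; batch: length-n labels; w: direction (n_genes,)."""
    w = w / np.linalg.norm(w)
    X_adj = X.astype(float).copy()
    target = (X @ w).mean()                    # overall mean projection
    for b in np.unique(batch):
        idx = (batch == b)
        shift = target - (X[idx] @ w).mean()   # how far this batch sits from the target
        X_adj[idx] += shift * w                # rigid shift of the whole batch along w
    return X_adj
```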








Why not adjust using SVM?

  • Major Problem: Proj’d Distrib’al Shape

    Triangular Dist’ns (oppositely skewed)

  • Does not allow sensible rigid shift


Why not adjust using SVM?

  • Nicely Fixed by DWD

  • Projected Dist’ns near Gaussian

  • Sensible to shift


Why not adjust by means?

  • DWD is complicated: value added?

  • Xuxin Liu example…

  • Key is sizes of biological subtypes

  • Differing ratios trip up the mean

  • But DWD more robust

    (although still not perfect)
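
A tiny numeric illustration of the subtype-proportion point (hypothetical numbers, in the spirit of the Xuxin Liu example): with no batch effect at all, two batches whose subtype mix differs already have different means, so matching batch means introduces a spurious shift.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two biological subtypes with different expression levels; no batch effect at all
subtype_A = rng.normal(loc=0.0, scale=1.0, size=80)
subtype_B = rng.normal(loc=5.0, scale=1.0, size=20)

batch1 = np.concatenate([subtype_A[:60], subtype_B[:5]])   # mostly subtype A
batch2 = np.concatenate([subtype_A[60:], subtype_B[5:]])   # much richer in subtype B

# Mean adjustment would "correct" a difference that is purely biological:
print("batch mean difference:", batch1.mean() - batch2.mean())
```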


Why not adjust by means?

Next time:

Work in before and after, slides like 138-141 from DWDnormPreso.ppt

In Research/Bioinf/caBIG



Why not adjust by means?

DWD robust against non-proportional subtypes…

Mathematical Statistical Question:

Are there mathematics behind this?

(will answer next time…)


DWD in Face Recognition

  • Face Images as Data

    (with M. Benito & D. Peña)

  • Male – Female Difference?

  • Discrimination Rule?

  • Represented as long vector of pixel gray levels

  • Registration is critical


DWD in Face Recognition, (cont.)

  • Registered Data

  • Shifts and scale

  • Manually chosen

  • To align eyes and mouth

  • Still large variation

  • See males vs. females???


DWD in Face Recognition, (cont.)

  • DWD Direction

  • Good separation

  • Images “make sense”

  • Garbage at ends?

    (extrapolation effects?)


DWD in Face Recognition, (cont.)

  • Unregistered Version

  • Much blurrier

  • Since features don’t properly line up

  • Nonlinear Variation

  • But DWD still works

  • Can see M-F differ’ce?


DWD in Face Recognition, (cont.)

  • Interesting summary:

  • Jump between means

    (in DWD direction)

  • Clear separation of

    Maleness vs. Femaleness


DWD in Face Recognition, (cont.)

  • Fun Comparison:

  • Jump between means

    (in SVM direction)

  • Also distinguishes

    Maleness vs. Femaleness

  • But not as well as DWD


DWD in Face Recognition, (cont.)

Analysis of difference: Project onto normals

  • SVM has “small gap” (feels noise artifacts?)

  • DWD “more informative” (feels real structure?)


DWD in Face Recognition, (cont.)

  • Current Work:

  • Focus on “drivers”:

    (regions of interest)

  • Relation to Discr’n?

  • Which is “best”?

  • Lessons for human perception?


Outcomes Data

Breast Cancer Study (C. M. Perou):

  • Outcome of interest = death or survival

  • Connection with gene expression?

    Approach:

  • Treat death vs. survival during study as “classes”

  • Find “direction that best separates the classes”


Outcomes Data

Find “direction that best separates the classes”


Outcomes Data

Find “direction that best separates the classes”


Outcomes Data

Find “direction that best separates classes”

  • DWD Projection

  • SVM Projection

    Notes:

  • SVM is “better separated”?

    (recall “data piling” problems….)

  • DWD gives “more spread between sub-populations”???

    (perhaps “more stable”?)


Outcomes Data

Which is “better”?

Approach:

  • Find “genes of interest”

  • To maximize loadings of direction vectors

    (reflects pointing in gene direction)

  • Show intensity plot (of gene expression)

  • Using top 20 genes in each direction


Outcomes Data

Which is “better”?

  • Study with gene intensity plot

  • Order cases by DWD score (proj’n)

  • Order genes by DWD loading (vec. entry)

  • Reduce to top & bottom 20

  • Color map shows gene expression

  • Shows genes that drive classification

  • Gene names also available
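
A sketch of that intensity-plot construction (hypothetical variable names; assumes an expression matrix X with cases in rows and genes in columns, and a unit DWD direction vector w): cases are ordered by their DWD score, genes by their DWD loading, and only the 20 most positive and 20 most negative genes are kept for the color map.

```python
import numpy as np
import matplotlib.pyplot as plt

# X: (n_cases, n_genes) expression matrix; w: DWD direction vector (n_genes,)
scores = X @ w                                   # DWD score (projection) of each case
case_order = np.argsort(scores)

gene_order = np.argsort(w)                       # DWD loading (vector entry) of each gene
top_bottom = np.concatenate([gene_order[:20], gene_order[-20:]])

plt.imshow(X[np.ix_(case_order, top_bottom)].T, aspect="auto", cmap="RdYlGn")
plt.xlabel("cases, ordered by DWD score")
plt.ylabel("bottom / top 20 genes by DWD loading")
plt.show()
```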


Outcomes Data

Which is “better”? DWD direction


Outcomes Data

Which is “better”? SVM direction


Outcomes Data

Which is “better”?

  • DWD finds genes showing better separation

  • SVM genes are less informative


Outcomes Data

How about Centroid (Mean Diff’nce) Method?


Outcomes Data

How about Centroid (Mean Diff’nce) Method?


Outcomes Data

Compare to DWD direction


Outcomes Data

How about Centroid (Mean Diff’nce) Method?

  • Best yet, in terms of red – green plot?

  • Projections unacceptably mixed?

  • These are two different goals…

  • Try for trade-off?

    Scale space approach???

  • Interesting philosophical point:

    Very simple things often “best”


Outcomes Data

Weakness of above analysis:

  • Some with “genes prone to disease” have not died yet

  • Perhaps can see in DWD expression plot?

    Better analysis:

  • More sophisticated survival methods

  • Work in progress w/ Brent Johnson, Danyu Li, Helen Zhang


Distance Weighted Discrim’n

2-d Visualization: Pushes plane away from data; all points have some influence.


Distance Weighted Discrim’n

Maximal Data Piling


HDLSS Discrim’n Simulations

Main idea:

Comparison of

  • SVM (Support Vector Machine)

  • DWD (Distance Weighted Discrimination)

  • MD (Mean Difference, a.k.a. Centroid)

    Linear versions, across dimensions
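
A sketch of the linear directions being compared (illustrative code; DWD itself requires a second-order cone solver such as the SDPT3 software cited earlier, so only MD and the linear SVM are written out here):

```python
import numpy as np
from sklearn.svm import LinearSVC

def mean_difference_direction(X, y):
    """MD / centroid direction: difference of the two class means, normalized."""
    w = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
    return w / np.linalg.norm(w)

def linear_svm_direction(X, y, C=1.0):
    """Normal vector of the linear SVM separating hyperplane."""
    w = LinearSVC(C=C, max_iter=10000).fit(X, y).coef_.ravel()
    return w / np.linalg.norm(w)
```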


HDLSS Discrim’n Simulations

Overall Approach:

  • Study different known phenomena

    • Spherical Gaussians

    • Outliers

    • Polynomial Embedding

  • Common Sample Sizes

  • But wide range of dimensions


HDLSS Discrim’n Simulations

Spherical Gaussians:


HDLSS Discrim’n Simulations

Spherical Gaussians:

  • Same setup as before

  • Means shifted in dim 1 only,

  • All methods pretty good

  • Harder problem for higher dimension

  • SVM noticeably worse

  • MD best (Likelihood method)

  • DWD very close to MD

  • Methods converge for higher dimension??
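
Why MD is the likelihood method in this setting: for two spherical Gaussians N(μ₊, σ²I) and N(μ₋, σ²I) with equal priors, the Bayes (likelihood-ratio) rule is linear,

\[
\text{classify } x \text{ as } +\;\; \Longleftrightarrow\;\; (\mu_+ - \mu_-)^\top\!\left(x - \frac{\mu_+ + \mu_-}{2}\right) > 0,
\]

i.e. exactly the Mean Difference (centroid) direction, estimated from the sample means.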


HDLSS Discrim’n Simulations

Outlier Mixture:


HDLSS Discrim’n Simulations

Outlier Mixture:

80% dim. 1 , other dims 0

20% dim. 1 ±100, dim. 2 ±500, others 0

  • MD is a disaster, driven by outliers

  • SVM & DWD are both very robust

  • SVM is best

  • DWD very close to SVM (insig’t difference)

  • Methods converge for higher dimension??

    Ignore RLR (a mistake)


HDLSS Discrim’n Simulations

Wobble Mixture:


HDLSS Discrim’n Simulations

Wobble Mixture:

80% dim. 1 , other dims 0

20% dim. 1 ±0.1, rand dim ±100, others 0

  • MD still very bad, driven by outliers

  • SVM & DWD are both very robust

  • SVM loses (affected by margin push)

  • DWD slightly better (by w’ted influence)

  • Methods converge for higher dimension??

    Ignore RLR (a mistake)


HDLSS Discrim’n Simulations

Nested Spheres:


HDLSS Discrim’n Simulations

Nested Spheres:

1st d/2 dim’s, Gaussian with var 1 or C

2nd d/2 dim’s, the squares of the 1st dim’s

(as for 2nd degree polynomial embedding)
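
A sketch of how such nested-spheres data can be generated (the variance ratio C and all sizes are placeholders): the first d/2 coordinates are Gaussian with variance 1 for one class and C for the other, and the last d/2 coordinates are the squares of the first, as in a degree-2 polynomial embedding.

```python
import numpy as np

def nested_spheres(n_per_class, d, C=4.0, seed=None):
    """First d/2 coords ~ Gaussian (variance 1 for class -1, C for class +1);
    last d/2 coords are the squares of the first (2nd degree polynomial embedding)."""
    rng = np.random.default_rng(seed)
    half = d // 2
    X_parts, y_parts = [], []
    for label, var in ((-1, 1.0), (+1, C)):
        Z = rng.normal(scale=np.sqrt(var), size=(n_per_class, half))
        X_parts.append(np.hstack([Z, Z**2]))
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)
```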

  • Each method best somewhere

  • MD best in highest d (data non-Gaussian)

  • Methods not comparable (realistic)

  • Methods converge for higher dimension??

  • HDLSS space is a strange place

    Ignore RLR (a mistake)


HDLSS Discrim’n Simulations

Conclusions:

  • Everything (sensible) is best sometimes

  • DWD often very near best

  • MD weak beyond Gaussian

    Caution about simulations (and examples):

  • Very easy to cherry pick best ones

  • Good practice in Machine Learning

    • “Ignore method proposed, but read paper for useful comparison of others”


HDLSS Discrim’n Simulations

Caution: There are additional players

E.g. Regularized Logistic Regression also looks very competitive

Interesting Phenomenon:

All methods come together

in very high dimensions???


HDLSS Discrim’n Simulations

Can we say more about:

All methods come together

in very high dimensions???

Mathematical Statistical Question:

Mathematics behind this???

(will answer next time)

