
Chapter 5 Part II


Presentation Transcript


  1. Chapter 5 Part II • 5.3 Spread of Data • 5.4 Fisher Discriminant

  2. Measuring the spread of data • Covariance of two random variables x and y: cov(x, y) = E[(x − E[x])(y − E[y])] • For zero-mean variables this is just the expectation of their product, E[xy] • x, y need to be standardized if they use different units of measurement

  3. Correlation • The covariance of x and y, once both are standardized, measures their correlation: corr(x, y) = cov(x, y) / (σx σy) • This treats coordinates independently • In a kernel-induced feature space we don't have access to the coordinates
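
(A quick aside, not part of the slides: a minimal NumPy sketch of both definitions on synthetic data; the variable names are illustrative only.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)   # y partly driven by x

# Covariance: expectation of the product of the centered variables.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: covariance of the standardized variables,
# i.e. the covariance divided by both standard deviations.
corr_xy = cov_xy / (x.std() * y.std())

print(cov_xy, corr_xy)
print(np.corrcoef(x, y)[0, 1])   # agrees with corr_xy
```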

  4. Spread in the Feature Space • Consider the l × N data matrix X whose rows are the training points • Assume zero mean; the covariance matrix is then C = (1/l) X'X

  5. Spread in the Feature Space • Observe that lC = X'X = Σi φ(xi)φ(xi)' • Consider a unit vector v; the value of the projection of φ(x) onto v is then Pv(φ(x)) = ⟨φ(x), v⟩ = v'φ(x)

  6. Spread in the Feature Space • Variance of the norms of the projections onto v: σv² = (1/l) Σi ⟨φ(xi), v⟩² = (1/l) ‖Xv‖² = v'Cv • Where C = (1/l) X'X

  7. Spread in the Feature Space • So the covariance matrix contains everything needed to calculate the variance of the data along any projection direction • If the data is not centered, subtract the square of the mean projection: σv² = (1/l) Σi ⟨φ(xi), v⟩² − ((1/l) Σi ⟨φ(xi), v⟩)²
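
(Again as an aside: a sketch, under the notation of slides 4-7, checking numerically that v'Cv equals the variance of the projections onto v, with the squared mean projection subtracted since this sample is not centered.)

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=0.3, size=(100, 3))   # l x N data matrix, rows are points
l = X.shape[0]

v = rng.normal(size=3)
v /= np.linalg.norm(v)                   # unit projection direction

Xc = X - X.mean(axis=0)                  # center the data
C = Xc.T @ Xc / l                        # covariance matrix C = (1/l) X'X

p = X @ v                                # projections onto v
var_direct = np.mean(p**2) - np.mean(p)**2   # subtract squared mean projection

print(np.isclose(var_direct, v @ C @ v))     # True: v'Cv gives the variance
```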

  8. Variance of Projections • Variance of projections onto a fixed direction v in feature space using only inner products • v is a linear combination of training points: v = X'α • Then: σv² = (1/l) α'K²α − ((1/l) j'Kα)², where K = XX' is the kernel matrix and j the all-1s vector
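
(A sketch of slide 8's claim, using a linear kernel so the coordinate-based result can be checked; in a genuine kernel-induced feature space only the kernel-side computation would be available.)

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))      # feature vectors, used here only for checking
alpha = rng.normal(size=50)       # v = X'alpha, a combination of training points

K = X @ X.T                       # kernel (Gram) matrix of inner products
l = len(alpha)
j = np.ones(l)

# Variance of the projections onto v from the kernel matrix alone.
var_kernel = alpha @ K @ K @ alpha / l - (j @ K @ alpha / l) ** 2

# Explicit check with the (normally inaccessible) coordinates.
p = X @ (X.T @ alpha)
print(np.isclose(var_kernel, np.mean(p**2) - np.mean(p)**2))   # True
```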

  9. Now that we can compute the variance of projections in feature space, we can implement a linear classifier • The Fisher discriminant

  10. Fisher Discriminant • Classification function: f(x) = sign(⟨w, φ(x)⟩ − b) • Where w is chosen to maximize the quotient J(w) = (μw+ − μw−)² / ((σw+)² + (σw−)²), the separation of the projected class means over the summed within-class variances
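
(To make the quotient concrete, here is a small helper, fisher_quotient, which is my own name rather than the book's, evaluating the criterion for a candidate direction w in input space.)

```python
import numpy as np

def fisher_quotient(w, X, y):
    """Squared separation of the projected class means, divided by the
    sum of the within-class variances of the projections."""
    p = X @ w
    p_pos, p_neg = p[y == +1], p[y == -1]
    return (p_pos.mean() - p_neg.mean()) ** 2 / (p_pos.var() + p_neg.var())

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+1, 1, size=(30, 2)),
               rng.normal(-1, 1, size=(30, 2))])
y = np.hstack([np.ones(30), -np.ones(30)])

print(fisher_quotient(np.array([1.0, 1.0]), X, y))   # good separating direction
print(fisher_quotient(np.array([1.0, -1.0]), X, y))  # poor direction, smaller J
```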

  11. Regularized Fisher discriminant • Choose w to solve max_w (μw+ − μw−)² / ((σw+)² + (σw−)² + λ‖w‖²) • The quotient is invariant to rescalings of w • So use a fixed value C for the denominator • Using a Lagrange multiplier ν, the solution is given by max_w (μw+ − μw−)² − ν((σw+)² + (σw−)² + λ‖w‖² − C)

  12. Regularized Fisher discriminant • We then have μw+ = (1/l+) j+'Xw, μw− = (1/l−) j−'Xw and (σw±)² = (1/l±) ‖I±Xw‖² − (μw±)² • Where • l+ (l−) is the number of positive (negative) examples • y is the vector of labels {−1, +1} • I+ (I−) is the identity matrix with only the columns of positive (negative) examples containing 1s • j+ (j−) is the all-1s vector zeroed the same way, so that y = j+ − j−

  13. Regularized Fisher discriminant • Furthermore, let B = D − C+ − C− • Where D is the diagonal matrix with entries Dii = 2l−/l if yi = +1 and Dii = 2l+/l if yi = −1 • And where C+, C− are given by (C+)ij = 2l−/(l·l+) if yi = yj = +1 (0 otherwise) and (C−)ij = 2l+/(l·l−) if yi = yj = −1 (0 otherwise)
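
(The entries of D, C+ and C− above are reconstructed from the textbook this deck follows, Shawe-Taylor & Cristianini Section 5.4, since the slide's formulas were images; treat them as an assumption. The sketch below checks that, with these definitions, w'X'BXw recovers the summed within-class variances up to the constant factor 2·l+·l−/l.)

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))
y = np.where(rng.random(40) < 0.4, 1, -1)    # deliberately unbalanced labels
w = rng.normal(size=3)

l = len(y)
pos, neg = (y == 1), (y == -1)
lp, ln = pos.sum(), neg.sum()

# Reconstructed definitions (an assumption, see lead-in above).
D = np.diag(np.where(pos, 2 * ln / l, 2 * lp / l))
Cp = np.outer(pos, pos) * (2 * ln / (l * lp))    # positive-positive block
Cn = np.outer(neg, neg) * (2 * lp / (l * ln))    # negative-negative block
B = D - Cp - Cn

p = X @ w
scatter = p[pos].var() + p[neg].var()            # (sigma_w^+)^2 + (sigma_w^-)^2
print(np.isclose(w @ X.T @ B @ X @ w, 2 * lp * ln / l * scatter))   # True
```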

  14. Regularized Fisher discriminant • Then the problem becomes max_w w'X'yy'Xw − ν(w'X'BXw + λ‖w‖² − C) • With appropriate redefinitions of ν, λ and C • Taking derivatives with respect to w produces X'yy'Xw = ν(X'BX + λI)w

  15. Dual expression of w • We can express w in feature space as a linear combination of training samples w = X'α, with α = (1/(νλ))(yy'Xw − νBXw) • Substituting w = X'α produces νλα = yy'Kα − νBKα • Giving (BK + λI)α = ((y'Kα)/ν) y • This is invariant to rescalings of w, so we can rescale α to absorb the scalar (y'Kα)/ν and obtain (BK + λI)α = y

  16. Regularized kernel Fisher discriminant • Solution given by α = (BK + λI)⁻¹y • Classification function is f(x) = sign(Σi αi k(x, xi) − b) = sign(α'k − b) • Where k is the vector with entries k(x, xi), i = 1, …, l • And b is chosen so that w'μ+ − b = b − w'μ−, i.e. the threshold lies halfway between the projected class means

  17. Regularized kernel Fisher discriminant • Taking w = X'α, we have w'μ+ = (1/l+) α'Kj+ and w'μ− = (1/l−) α'Kj− • so b = ½ α'K(j+/l+ + j−/l−) • where K is the kernel matrix and j+ (j−) as defined on slide 12
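
(Putting slides 11-17 together: a sketch of the whole procedure. rkfd_fit and rkfd_predict are my own names, and the B matrix uses the reconstructed definitions from slide 13, so read this as an illustration of the algorithm rather than the book's reference implementation.)

```python
import numpy as np

def rkfd_fit(K, y, lam=1e-3):
    """Regularized kernel Fisher discriminant: solve (BK + lam*I) alpha = y,
    then place the bias b halfway between the projected class means so that
    w'mu+ - b = b - w'mu-."""
    l = len(y)
    pos, neg = (y == +1), (y == -1)
    lp, ln = pos.sum(), neg.sum()

    # Reconstructed D, C+, C- from slide 13 (an assumption, see lead-in).
    D = np.diag(np.where(pos, 2 * ln / l, 2 * lp / l))
    B = D - np.outer(pos, pos) * (2 * ln / (l * lp)) \
          - np.outer(neg, neg) * (2 * lp / (l * ln))

    alpha = np.linalg.solve(B @ K + lam * np.eye(l), y.astype(float))
    b = 0.5 * alpha @ K @ (pos / lp + neg / ln)   # 0.5 * (w'mu+ + w'mu-)
    return alpha, b

def rkfd_predict(alpha, b, k):
    """k: vector of kernel evaluations k(x, x_i) for a new point x."""
    return np.sign(alpha @ k - b)

# Toy usage with a linear kernel and balanced classes.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(+1, 1, size=(20, 2)),
               rng.normal(-1, 1, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)]).astype(int)

alpha, b = rkfd_fit(X @ X.T, y)
print(rkfd_predict(alpha, b, X @ np.array([1.5, 0.5])))   # +1 side
```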
