Presented by: Shankar Bhargav

Canonical Correlation Analysis: An overview with application to learning methods. By David R. Hardoon, Sandor Szedmak, John Shawe-Taylor, School of Electronics and Computer Science, University of Southampton. Published in Neural Computation, 2004. Presented by: Shankar Bhargav.

Canonical Correlation Analysis
• Measuring the linear relationship between two multidimensional variables
• Finding two sets of basis vectors such that the correlation between the projections of the variables onto these basis vectors is maximized
• Determine Correlation Coefficients
Canonical Correlation Analysis
• More than one canonical correlation will be found, each corresponding to a different pair of basis vectors (canonical variates)
• The canonical correlations of successively extracted pairs of variates are smaller and smaller
• Correlation coefficients: proportion of correlation between the canonical variates accounted for by a particular variable
Differences with Correlation
• Not dependent on the coordinate system of variables
• Finds the directions that yield maximum correlation
Find basis vectors for two sets of variables x, y such that the correlations between the projections of the variables onto these basis vectors are mutually maximized.

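$$S_x = \langle x, w_x \rangle \quad \text{and} \quad S_y = \langle y, w_y \rangle$$

$$\rho = \frac{E[S_x S_y]}{\sqrt{E[S_x^2]\,E[S_y^2]}}$$

$$\rho = \frac{E[(x^T w_x)(y^T w_y)]}{\sqrt{E[(x^T w_x)(x^T w_x)]\,E[(y^T w_y)(y^T w_y)]}}$$

$$\rho = \max_{w_x, w_y} \frac{E[w_x^T x y^T w_y]}{\sqrt{E[w_x^T x x^T w_x]\,E[w_y^T y y^T w_y]}}$$

$$\rho = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \; w_y^T C_{yy} w_y}}$$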
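The covariance form of ρ can be checked numerically. The sketch below is an illustration in Python with NumPy and is not part of the slides: the function name `projection_correlation` and the synthetic data are assumptions, and the data are centered first because the expectations above assume zero-mean variables.

```python
import numpy as np

def projection_correlation(X, Y, wx, wy):
    """Correlation between the projections X @ wx and Y @ wy.

    Rows of X and Y are observations, columns are variables.
    """
    Xc = X - X.mean(axis=0)          # center each variable (assumption)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)        # within-set covariance of x
    Cyy = Yc.T @ Yc / (n - 1)        # within-set covariance of y
    Cxy = Xc.T @ Yc / (n - 1)        # between-set covariance
    return (wx @ Cxy @ wy) / np.sqrt((wx @ Cxx @ wx) * (wy @ Cyy @ wy))

# Synthetic data: the first column of Y is a noisy copy of the first column of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])

wx = np.array([1.0, 0.0, 0.0])
wy = np.array([1.0, 0.0])

rho = projection_correlation(X, Y, wx, wy)
# Same quantity computed directly from the projected samples.
direct = np.corrcoef(X @ wx, Y @ wy)[0, 1]
# Rescaling the basis vectors by positive factors leaves rho unchanged.
rho_scaled = projection_correlation(X, Y, 5.0 * wx, 0.1 * wy)
```

Because ρ does not change when the basis vectors are rescaled, their lengths can be fixed by a normalisation constraint without losing any solutions.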

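Solving this with the constraints

$$w_x^T C_{xx} w_x = 1, \qquad w_y^T C_{yy} w_y = 1$$

gives

$$C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} w_x = \rho^2 w_x$$

$$C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} w_y = \rho^2 w_y$$

$$C_{xy} w_y = \rho \lambda_x C_{xx} w_x$$

$$C_{yx} w_x = \rho \lambda_y C_{yy} w_y$$

$$\lambda_x = \lambda_y^{-1} = \sqrt{\frac{w_y^T C_{yy} w_y}{w_x^T C_{xx} w_x}}$$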
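A minimal sketch of solving the eigenproblem for w_x and recovering w_y from the coupled equation, written in Python with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the name `cca_eig` is hypothetical, sample covariances of centered data stand in for C_xx, C_yy and C_xy, both within-set covariances are assumed invertible, and the returned canonical correlations are assumed to be nonzero.

```python
import numpy as np

def cca_eig(X, Y):
    """Linear CCA via the eigenproblem Cxx^-1 Cxy Cyy^-1 Cyx wx = rho^2 wx."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)

    # M = Cxx^-1 Cxy Cyy^-1 Cyx, built with solves instead of explicit inverses.
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    evals, evecs = np.linalg.eig(M)
    evals, evecs = evals.real, evecs.real   # eigenvalues are real in theory

    # Keep the largest min(p, q) eigenvalues, in decreasing order.
    k = min(X.shape[1], Y.shape[1])
    order = np.argsort(evals)[::-1][:k]
    rho = np.sqrt(np.clip(evals[order], 0.0, 1.0))

    # Normalise so that wx^T Cxx wx = 1 for every column.
    Wx = evecs[:, order]
    Wx = Wx / np.sqrt(np.sum(Wx * (Cxx @ Wx), axis=0))

    # From Cyx wx = rho * lambda_y * Cyy wy: wy is proportional to Cyy^-1 Cyx wx.
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)
    Wy = Wy / np.sqrt(np.sum(Wy * (Cyy @ Wy), axis=0))   # wy^T Cyy wy = 1
    return Wx, Wy, rho

# Synthetic two-view data (hypothetical example).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=300),
                     X[:, 1] - X[:, 2] + rng.normal(size=300)])

Wx, Wy, rho = cca_eig(X, Y)
u = (X - X.mean(axis=0)) @ Wx   # canonical variates of x
v = (Y - Y.mean(axis=0)) @ Wy   # canonical variates of y
```

The correlation between matching columns of `u` and `v` should reproduce the entries of `rho`, and different columns of `u` should be uncorrelated with each other.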

CCA in Matlab

[ A, B, r, U, V ] = canoncorr(x, y)

x, y: sets of variables in the form of matrices

• Each row is an observation
• Each column is an attribute/feature

A, B: Matrices of canonical coefficients (the basis vectors, one per column) for x and y respectively

r: Vector containing the canonical correlations (successively decreasing)

U, V: Canonical variates (scores), i.e. the projections of the centered x and y onto the columns of A and B respectively

Interpretation of CCA
• Correlation coefficients represent the unique contribution of each variable to the relation
• Multicollinearity may obscure relationships
• Factor loadings: correlations between the canonical variates (projections onto the basis vectors) and the variables in each set
• The proportion of variance explained by the canonical variates can be inferred from the factor loadings
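The difference between a coefficient and a factor loading can be illustrated with a small Python/NumPy sketch. The function name `canonical_loadings`, the weight vector `w` and the synthetic data are all hypothetical; `w` stands in for one column of canonical coefficients rather than being computed by CCA here.

```python
import numpy as np

def canonical_loadings(X, w):
    """Correlation of each column of X with the canonical variate X @ w."""
    Xc = X - X.mean(axis=0)
    u = Xc @ w                      # canonical variate for this basis vector
    return np.array([np.corrcoef(Xc[:, j], u)[0, 1]
                     for j in range(X.shape[1])])

# Columns 0 and 1 are nearly collinear; column 2 is unrelated noise.
rng = np.random.default_rng(3)
z = rng.normal(size=500)
X = np.column_stack([z,
                     z + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])

w = np.array([1.0, 0.0, 0.0])       # hypothetical coefficients: only column 0 is weighted
loadings = canonical_loadings(X, w)
```

Column 1 has a coefficient of zero but still loads strongly on the variate because it is almost a copy of column 0. This is the sense in which multicollinearity can obscure relationships when only the coefficients are inspected.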
Redundancy Calculation

p – Number of variables in the first (left) set of variables

q – Number of variables in the second (right) set of variables

$R_c^2$ – Respective squared canonical correlation

Since successively extracted roots are uncorrelated, we can sum the redundancies across all correlations to get a single index of redundancy.
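The redundancy formula itself is not reproduced in the transcript. A common definition that uses exactly these symbols, assumed here and not confirmed by the slides, is the mean squared loading of a set multiplied by the squared canonical correlation: redundancy_left = (sum of squared left loadings / p) × Rc², and likewise with q for the right set. The Python sketch below follows that assumption; the function names and the example numbers are hypothetical.

```python
def redundancy(loadings, rc_squared):
    """Redundancy of one set for one canonical root (assumed definition).

    loadings   : factor loadings of the set's variables on the canonical variate
    rc_squared : squared canonical correlation of that root
    """
    p = len(loadings)                          # number of variables in the set
    mean_sq_loading = sum(l * l for l in loadings) / p
    return mean_sq_loading * rc_squared

def total_redundancy(loadings_per_root, rc_squared_per_root):
    """Sum the redundancies over all extracted roots."""
    return sum(redundancy(l, r)
               for l, r in zip(loadings_per_root, rc_squared_per_root))

# Two variables in the set, two canonical roots (made-up values).
first_root = redundancy([0.8, 0.6], 0.5)
overall = total_redundancy([[0.8, 0.6], [0.3, 0.4]], [0.5, 0.2])
```

With these made-up values the first root contributes about 0.25 and the second about 0.025, so the single index of redundancy is about 0.275.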

Application
• Kernel CCA can be used to find nonlinear relationships between multivariate data sets
• Two views of the same semantic object can be used to extract a representation of the semantics
• Speaker Recognition – Audio and Lip movement
• Image retrieval – Image features (HSV, Texture) and Associated text
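To make the first point concrete, the Python/NumPy sketch below finds the leading pair of kernel CCA directions for a toy nonlinear relationship. It uses one common regularised formulation, (Kx + κI)⁻¹ Ky (Ky + κI)⁻¹ Kx α = λ² α, which is an assumption here because the slides do not spell out the kernel version. The function names, the Gaussian kernel, and the values of `gamma` and `kappa` are likewise illustrative choices.

```python
import numpy as np

def rbf_kernel(z, gamma):
    """Gaussian kernel matrix for a 1-D sample z."""
    d2 = (z[:, None] - z[None, :]) ** 2
    return np.exp(-gamma * d2)

def center_kernel(K):
    """Center the kernel matrix in feature space."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca_first_pair(Kx, Ky, kappa):
    """Leading dual directions (alpha, beta) of regularised kernel CCA."""
    n = Kx.shape[0]
    I = np.eye(n)
    # (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx alpha = lambda^2 alpha
    M = np.linalg.solve(Kx + kappa * I, Ky) @ np.linalg.solve(Ky + kappa * I, Kx)
    evals, evecs = np.linalg.eig(M)
    i = np.argmax(evals.real)
    alpha = evecs[:, i].real
    # beta is proportional to (Ky + kappa I)^-1 Kx alpha
    beta = np.linalg.solve(Ky + kappa * I, Kx @ alpha)
    return alpha, beta

# Toy data: y depends on x only through x^2, so the linear correlation is near zero.
rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 150)
y = x ** 2 + 0.1 * rng.normal(size=150)

Kx = center_kernel(rbf_kernel(x, gamma=0.5))
Ky = center_kernel(rbf_kernel(y, gamma=0.5))
alpha, beta = kcca_first_pair(Kx, Ky, kappa=0.1)

linear_corr = np.corrcoef(x, y)[0, 1]
kernel_corr = np.corrcoef(Kx @ alpha, Ky @ beta)[0, 1]
```

On this toy example the correlation of the raw variables stays close to zero while the kernel projections are strongly correlated. The regularisation parameter matters in practice: with κ too small, kernel CCA can find near-perfect correlations on the training data that do not generalise.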
Use of KCCA in cross-modal retrieval
• 3 classes, each with 400 records of JPEG images and associated text
• Data was split randomly into two parts for training and testing
• Features
  • Image – HSV color, Gabor texture
  • Text – Term frequencies
• Results were averaged over 10 runs
Cross-modal retrieval
• Content-based retrieval: retrieve images in the same class as the text query
• Tested with retrieved sets of 10 and 30 images
• Success is measured using count_jk, where count_jk = 1 if image k in the set has the same label as the text query, else count_jk = 0
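A hedged Python sketch of how such a 0/1 indicator could be turned into a success rate. The exact normalisation used in the slides is not in the transcript, so averaging over all queries and all retrieved images and reporting a percentage is an assumption, as are the function name and the class labels "a", "b", "c".

```python
def content_based_success(retrieved_labels, query_labels):
    """Percentage of retrieved images that share the label of their text query.

    retrieved_labels : one list of image labels per query (the retrieved set)
    query_labels     : the label of each text query
    """
    hits = 0
    total = 0
    for labels, query_label in zip(retrieved_labels, query_labels):
        for label in labels:
            hits += 1 if label == query_label else 0   # the 0/1 indicator
            total += 1
    return 100.0 * hits / total

# Two queries, three retrieved images each (toy sizes instead of 10 or 30).
queries = ["a", "b"]
retrieved = [["a", "a", "c"],
             ["b", "a", "b"]]
score = content_based_success(retrieved, queries)
```

Here four of the six retrieved images match the class of their query.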
Mate based retrieval
• Match the exact image among the retrieved images
• Tested with retrieved sets of 10 and 30 images
• Success is measured using count_j, where count_j = 1 if the exact matching image is present in the set, else count_j = 0
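The mate-based indicator can be sketched the same way in Python. As before, reporting the result as a percentage over all queries is an assumption, and the function name and image identifiers are hypothetical.

```python
def mate_based_success(retrieved_ids, mate_ids):
    """Percentage of queries whose exact matching image is in the retrieved set.

    retrieved_ids : one list of retrieved image ids per text query
    mate_ids      : the id of the image that belongs to each text query
    """
    hits = sum(1 for ids, mate in zip(retrieved_ids, mate_ids) if mate in ids)
    return 100.0 * hits / len(mate_ids)

# Three text queries, each with a retrieved set of three image ids.
mates = [7, 12, 30]
retrieved = [[7, 3, 9],      # mate found
             [1, 2, 4],      # mate missing
             [30, 31, 32]]   # mate found
score = mate_based_success(retrieved, mates)
```

Mate-based retrieval is the stricter of the two tests, since an image of the right class that is not the query's own image does not count.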