9. Canonical Correlation


  1. 9. Canonical Correlation

  2. Review: Linear Regression In regression, our goal is to use the information available from the independent variables X to make our best guess about the value of the dependent variable Y. The regression model can be written as y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + … + b_p x_{ip} + e_i or, in matrix terms, y = Xb + e

  3. Review: Linear Regression The estimate of b is chosen to minimize the sum of squared deviations between the actual values (y) and the fitted values (Xb). Thus, the objective function is min_b (y - Xb)'(y - Xb) The first-order condition is X'(y - Xb) = 0 The solution, the so-called least-squares estimator, is b = (X'X)^{-1}X'y
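The least-squares formula above can be sketched numerically; the design matrix and true coefficients below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # made-up design matrix
beta_true = np.array([1.0, -2.0, 0.5])              # made-up coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# Least-squares estimator b = (X'X)^{-1} X'y, computed via a linear solve
b = np.linalg.solve(X.T @ X, X.T @ y)
```

Solving the normal equations directly, as here, mirrors the formula on the slide; in practice a QR- or SVD-based routine such as `np.linalg.lstsq` is numerically safer.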

  4. A View of Linear Regression Instead of minimizing the sum of squared deviations, we can look for the linear combination of the independent variables X that maximizes the correlation with the single dependent variable y.

  5. A View of Linear Regression Assume that y and X have been standardized. Then:
    • var(y) = y'y/(n-1) = 1
    • var(Xb) = b'X'Xb/(n-1)
    • cov(y, Xb) = y'Xb/(n-1)
  This yields the following constrained maximization problem: max_b y'Xb/(n-1) subject to b'X'Xb/(n-1) = 1

  6. A View of Linear Regression The Lagrangian objective function is L = y'Xb/(n-1) - λ(b'X'Xb/(n-1) - 1), where λ is a Lagrange multiplier. The first-order condition with respect to b is X'y - 2λX'Xb = 0

  7. A View of Linear Regression Rearranging terms and solving for b yields: b = (2λ)^{-1}(X'X)^{-1}X'y Because b must be rescaled to satisfy the scaling constraint anyway, we can ignore the constant and write the expression as: b ∝ (X'X)^{-1}X'y Note that this is exactly the ordinary least-squares estimator for b.
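This equivalence can be checked numerically: the correlation of y with Xb is invariant to rescaling b, and no other direction beats the least-squares one. The data below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([0.5, 1.0, -1.5]) + rng.normal(size=n)
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize as in the text
y = (y - y.mean()) / y.std()

b_ols = np.linalg.solve(X.T @ X, X.T @ y)

def corr_with_y(b):
    """Correlation between y and the linear combination Xb (data are centered)."""
    u = X @ b
    return (y @ u) / np.sqrt((y @ y) * (u @ u))

# Rescaling b leaves the correlation unchanged ...
same = np.isclose(corr_with_y(b_ols), corr_with_y(3.7 * b_ols))
# ... and random directions never beat the OLS direction
best_rand = max(corr_with_y(rng.normal(size=3)) for _ in range(1000))
```

The maximum attained by `b_ols` is the multiple correlation between y and X, which is why the constant in b ∝ (X'X)^{-1}X'y can be ignored.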

  8. Problem? The benefit of this approach is that it extends easily to more than one dependent variable. With more than one dependent variable Y, we now pose the problem as follows: what linear combination of X and what linear combination of Y produce the highest correlation?

  9. Canonical Correlation Letting u = Xb and t = Ya, the problem can be written as follows: max_{a,b} r(t, u) = a'Y'Xb/(n-1) subject to a'Y'Ya/(n-1) = 1 and b'X'Xb/(n-1) = 1
    • canonical variates: the new variables t and u
    • canonical correlation: the correlation between t and u

  10. A Problem with (X1, X2) and (Y1, Y2) Correlation matrix for the 4 variables:
           Y1      Y2      X1      X2
      Y1  1.000  -0.307   0.221   0.445
      Y2 -0.307   1.000   0.316   0.168
      X1  0.221   0.316   1.000  -0.176
      X2  0.445   0.168  -0.176   1.000

  11. A Problem with (X1, X2) and (Y1, Y2) The strongest pairwise relationship is between X2 and Y1, yet R²(X2, Y1) = 0.445² ≈ 0.198 < 20%

  12. A Problem with (X1, X2) and (Y1, Y2) With such a simple problem, we can find the solution by an exhaustive numerical search: we simply consider all possible combinations of Y and all possible combinations of X, and then choose the pair of combinations with the highest correlation. Because there are infinitely many possible combinations, we start with a rough grid. We can later refine our search, depending on how accurate we want our result to be.
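The grid search can be sketched with the correlation blocks from the table, parameterizing each weight vector by an angle so that every direction is covered (the grid resolution is an arbitrary choice for illustration):

```python
import numpy as np

# Blocks of the correlation matrix from the slides (rows/cols Y1, Y2 and X1, X2)
Ryy = np.array([[1.0, -0.307], [-0.307, 1.0]])
Rxx = np.array([[1.0, -0.176], [-0.176, 1.0]])
Ryx = np.array([[0.221, 0.445], [0.316, 0.168]])

def corr(theta, phi):
    """Correlation between t = Ya and u = Xb for angle-parameterized weights."""
    a = np.array([np.cos(theta), np.sin(theta)])    # weights on Y1, Y2
    b = np.array([np.cos(phi), np.sin(phi)])        # weights on X1, X2
    a = a / np.sqrt(a @ Ryy @ a)                    # enforce a'Ryy a = 1
    b = b / np.sqrt(b @ Rxx @ b)                    # enforce b'Rxx b = 1
    return a @ Ryx @ b

# Rough grid over all directions (a direction repeats every pi, up to sign)
grid = np.linspace(0.0, np.pi, 181)
best = max(abs(corr(t, p)) for t in grid for p in grid)
```

Even on this rough grid, `best**2` comes out above 0.5, in line with the R² > 50% claimed for the weighted combinations on slide 14.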

  13. A Problem with (X1, X2) and (Y1, Y2) [Figure: correlations over a grid of weights on Y1/Y2 against weights on X1/X2]

  14. A Problem with (X1, X2) and (Y1, Y2) R²(X2, Y1) < 20%, but R²(0.4·X1 + 0.6·X2, 0.6·Y1 + 0.4·Y2) > 50%

  15. A Problem with (X1, X2) and (Y1, Y2)

  16. Correlation matrix: RXX, RYY, RXY These blocks are read off the full correlation matrix: RYY is the upper-left 2×2 block, RXX the lower-right 2×2 block, and RXY = RYX' the off-diagonal block.
           Y1      Y2      X1      X2
      Y1  1.000  -0.307   0.221   0.445
      Y2 -0.307   1.000   0.316   0.168
      X1  0.221   0.316   1.000  -0.176
      X2  0.445   0.168  -0.176   1.000

  17. Mechanics Let u = Xb and t = Ya. Goal: find a and b so as to maximize r(t, u).

  18. Mechanics By standardizing t and u, we effectively eliminate the denominator from the objective function. Imposing these constraints, the problem becomes: max a'RYXb subject to a'RYYa = 1 and b'RXXb = 1

  19. Mechanics The Lagrangian objective function is L = a'RYXb - (c/2)(a'RYYa - 1) - (d/2)(b'RXXb - 1) Differentiating with respect to a and b and setting the results equal to zero gives the first-order conditions: RYXb - cRYYa = 0 RXYa - dRXXb = 0

  20. Mechanics Premultiplying these two equations by a' and b' respectively yields a'RYXb - c·a'RYYa = 0 b'RXYa - d·b'RXXb = 0 which, given the unit-variance constraints, implies that c = d = a'RYXb = r(t, u), the canonical correlation.

  21. Mechanics From RYXb - rRYYa = 0 and RXYa - rRXXb = 0 we obtain a = r^{-1}RYY^{-1}RYXb b = r^{-1}RXX^{-1}RXYa = r^{-2}RXX^{-1}RXYRYY^{-1}RYXb

  22. Mechanics The vector b is the leading eigenvector of the eigenvalue problem RXX^{-1}RXYRYY^{-1}RYXb = r²b Although the eigenvalues of a nonsymmetric matrix are not necessarily real valued, the structure of the canonical correlation problem is such that the eigenvalues here are both real valued and non-negative.

  23. Mechanics The vector a is the leading eigenvector of the eigenvalue problem RYY^{-1}RYXRXX^{-1}RXYa = r²a The first eigenvalue is again the squared canonical correlation.
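The eigenvalue problems of slides 22 and 23 can be carried out directly on the example correlation blocks. This is a sketch using a plain eigendecomposition; a symmetrized or SVD-based formulation is numerically preferable in practice:

```python
import numpy as np

Ryy = np.array([[1.0, -0.307], [-0.307, 1.0]])
Rxx = np.array([[1.0, -0.176], [-0.176, 1.0]])
Ryx = np.array([[0.221, 0.445], [0.316, 0.168]])
Rxy = Ryx.T

# b solves Rxx^{-1} Rxy Ryy^{-1} Ryx b = r^2 b
M = np.linalg.inv(Rxx) @ Rxy @ np.linalg.inv(Ryy) @ Ryx
vals, vecs = np.linalg.eig(M)
order = np.argsort(vals.real)[::-1]
r2 = vals.real[order]                           # squared canonical correlations
b = vecs.real[:, order[0]]
b = b / np.sqrt(b @ Rxx @ b)                    # rescale so b'Rxx b = 1
r = np.sqrt(r2[0])                              # first canonical correlation
a = np.linalg.inv(Ryy) @ Ryx @ b / r            # a = r^{-1} Ryy^{-1} Ryx b
```

On this example the squared canonical correlations come out at about 0.59 and 0.02, matching the later observation that little covariation remains beyond the first pair of variates. The rescaled a automatically satisfies a'RYYa = 1, as the derivation on slide 21 implies.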

  24. Let … We obtain … From this, what conclusion can we draw?

  25. A Simple Problem Again Correlation matrix for the 4 variables:
           Y1      Y2      X1      X2
      Y1  1.000  -0.307   0.221   0.445
      Y2 -0.307   1.000   0.316   0.168
      X1  0.221   0.316   1.000  -0.176
      X2  0.445   0.168  -0.176   1.000

  26. A Simple Problem Again [Figure: canonical weights on Y1/Y2 and X1/X2]

  27. A Simple Problem Again • X2 is weighted slightly more than X1 • Y1 is weighted slightly more than Y2

  28. Canonical Loadings Because the variables are standardized and t and u have unit variance, these covariances equal correlations. The correlations between X and u = Xb are given by cov(X, u) = X'u/(n-1) = RXXb The correlations between Y and t = Ya are given by cov(Y, t) = Y't/(n-1) = RYYa
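As a sketch on the example data, the loadings follow directly from the canonical weights. Eigenvector signs are arbitrary, so the overall sign is fixed here to make the loadings positive, as quoted on slide 29:

```python
import numpy as np

Ryy = np.array([[1.0, -0.307], [-0.307, 1.0]])
Rxx = np.array([[1.0, -0.176], [-0.176, 1.0]])
Ryx = np.array([[0.221, 0.445], [0.316, 0.168]])

# First canonical weights: leading eigenvector of Rxx^{-1} Rxy Ryy^{-1} Ryx
M = np.linalg.inv(Rxx) @ Ryx.T @ np.linalg.inv(Ryy) @ Ryx
vals, vecs = np.linalg.eig(M)
k = np.argmax(vals.real)
b = vecs.real[:, k]
b = b / np.sqrt(b @ Rxx @ b)                    # scale so b'Rxx b = 1
r = np.sqrt(vals.real[k])
a = np.linalg.inv(Ryy) @ Ryx @ b / r

x_loadings = Rxx @ b                            # corr(X, u)
y_loadings = Ryy @ a                            # corr(Y, t)
if x_loadings.sum() < 0:                        # fix the arbitrary sign
    x_loadings, y_loadings = -x_loadings, -y_loadings
```

The X loadings come out near (0.58, 0.70) and the Y loadings near (0.69, 0.48), the values reported on slide 29.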

  29. A Simple Problem Again • Both variables X1 and X2 load positively on u (0.58 and 0.70, respectively) • Both variables Y1 and Y2 load positively on t (0.69 and 0.48, respectively)

  30. The 2nd pair of canonical variates

  31. A Simple Problem Again This suggests that beyond the 1st pair of variates there is relatively little remaining to be accounted for.

  32. Redundancies Note that the squared canonical correlation r²(t, u) does not tell us how much of the variance in Y is explained by X (or, for that matter, how much of the variance in X is explained by Y). It only tells us how much of the variance in t (a linear combination of Y) is explained by u (a linear combination of X), and t may account for only a small proportion of the variance in Y. To answer how much of the variance in Y is explained by X, we can use the measure of redundancy developed by Stewart and Love (1968).
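For the first pair of variates, the Stewart–Love redundancy of Y given X is the average squared loading of the Y variables on t, multiplied by the squared canonical correlation. Sketched on the example data:

```python
import numpy as np

Ryy = np.array([[1.0, -0.307], [-0.307, 1.0]])
Rxx = np.array([[1.0, -0.176], [-0.176, 1.0]])
Ryx = np.array([[0.221, 0.445], [0.316, 0.168]])

M = np.linalg.inv(Rxx) @ Ryx.T @ np.linalg.inv(Ryy) @ Ryx
vals, vecs = np.linalg.eig(M)
k = np.argmax(vals.real)
r2 = vals.real[k]                               # squared canonical correlation
b = vecs.real[:, k]
b = b / np.sqrt(b @ Rxx @ b)
a = np.linalg.inv(Ryy) @ Ryx @ b / np.sqrt(r2)

y_loadings = Ryy @ a                            # corr(Y, t); sign cancels below
# share of Y's variance carried by t, times share of t's variance explained by u
redundancy_Y_given_X = np.mean(y_loadings**2) * r2
```

On this example the redundancy comes out near 0.21: although r² ≈ 0.59, only about a fifth of the total variance in Y is explained by X through the first pair of variates.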

  33. Questions Regarding the Application of Canonical Correlation • Is the relationship between the X’s and the Y’s significant? • How many pairs of canonical variates are significant? • How do I assess the validity of the results from a canonical correlation analysis?
