The Multiple Correlation Coefficient

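Definition. Suppose the vector $(y, \mathbf{x}')'$, with $\mathbf{x} = (x_1, \dots, x_p)'$, has a (p + 1)-variate Normal distribution with mean vector $\boldsymbol{\mu} = (\mu_y, \boldsymbol{\mu}_x')'$ and covariance matrix $\Sigma = \begin{pmatrix} \sigma_{yy} & \boldsymbol{\sigma}_{xy}' \\ \boldsymbol{\sigma}_{xy} & \Sigma_{xx} \end{pmatrix}$. We are interested in whether the variable y is independent of the vector $\mathbf{x}$. The multiple correlation coefficient is the maximum correlation between y and a linear combination $\mathbf{a}'\mathbf{x}$ of the components of $\mathbf{x}$.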

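Derivation. The vector $(y, \mathbf{a}'\mathbf{x})'$ has a bivariate Normal distribution with mean vector $(\mu_y, \mathbf{a}'\boldsymbol{\mu}_x)'$ and covariance matrix $\begin{pmatrix} \sigma_{yy} & \mathbf{a}'\boldsymbol{\sigma}_{xy} \\ \mathbf{a}'\boldsymbol{\sigma}_{xy} & \mathbf{a}'\Sigma_{xx}\mathbf{a} \end{pmatrix}$, where $\sigma_{yy} = \mathrm{Var}(y)$, $\boldsymbol{\sigma}_{xy} = \mathrm{Cov}(\mathbf{x}, y)$ and $\Sigma_{xx} = \mathrm{Cov}(\mathbf{x})$.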

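The multiple correlation coefficient is the maximum correlation between y and $\mathbf{a}'\mathbf{x}$. The correlation between y and $\mathbf{a}'\mathbf{x}$ is $\rho(\mathbf{a}) = \dfrac{\mathbf{a}'\boldsymbol{\sigma}_{xy}}{\sqrt{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}}$. Thus we want to choose $\mathbf{a}$ to maximize $\rho(\mathbf{a})$. Equivalently, we maximize $\rho^2(\mathbf{a}) = \dfrac{(\mathbf{a}'\boldsymbol{\sigma}_{xy})^2}{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}$.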

Note:

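The maximum is attained at $\mathbf{a} = k\,\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}$ for any constant k > 0. The multiple correlation coefficient, $\rho_{y \cdot x} = \sqrt{\boldsymbol{\sigma}_{xy}'\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy} / \sigma_{yy}}$, is independent of the value of k.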
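The invariance to k can be checked numerically. The following is a minimal sketch, assuming the standard maximizer $\mathbf{a} = k\,\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}$; the covariance values and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def multiple_correlation(sigma_yy, sigma_xy, sigma_xx):
    """Return (rho, a), where a = Sigma_xx^{-1} sigma_xy maximizes corr(y, a'x)."""
    a = np.linalg.solve(sigma_xx, sigma_xy)
    rho = np.sqrt(sigma_xy @ a / sigma_yy)
    return rho, a

def corr_with_combination(a, sigma_yy, sigma_xy, sigma_xx):
    """Correlation between y and the linear combination a'x."""
    return (a @ sigma_xy) / np.sqrt(sigma_yy * (a @ sigma_xx @ a))

# Hypothetical population covariance values (not taken from the slides).
sigma_yy = 4.0
sigma_xy = np.array([1.2, 0.8])
sigma_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])

rho, a = multiple_correlation(sigma_yy, sigma_xy, sigma_xx)

# Rescaling a by k = 3 leaves the correlation unchanged.
rho_scaled = corr_with_combination(3.0 * a, sigma_yy, sigma_xy, sigma_xx)

# Any other direction gives a correlation no larger than rho.
rho_other = corr_with_combination(np.array([1.0, 0.0]), sigma_yy, sigma_xy, sigma_xx)

print(rho, rho_scaled, rho_other)
```

Only the direction of $\mathbf{a}$ matters, which is why any positive k gives the same coefficient.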

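The sample multiple correlation coefficient. We are interested in whether the variable y is independent of the vector $\mathbf{x}$. Let $S = \begin{pmatrix} s_{yy} & \mathbf{s}_{xy}' \\ \mathbf{s}_{xy} & S_{xx} \end{pmatrix}$ denote the sample covariance matrix of $(y, \mathbf{x}')'$. Then the sample multiple correlation coefficient is $R = \sqrt{\dfrac{\mathbf{s}_{xy}' S_{xx}^{-1} \mathbf{s}_{xy}}{s_{yy}}}$.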
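As an illustration, the sample coefficient can be computed from a sample covariance matrix and cross-checked against the correlation between y and its least-squares fitted values, which is a standard equivalent form. This is a sketch on simulated data; the sample size, the coefficients and the variable names are assumptions made for the example.

```python
import numpy as np

# Simulated data, purely hypothetical: n observations on y and p predictors.
rng = np.random.default_rng(1)
n, p = 65, 3
x = rng.normal(size=(n, p))
y = x @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

# Partition the sample covariance matrix of (y, x')'.
s = np.cov(np.column_stack([y, x]), rowvar=False)
s_yy = s[0, 0]
s_xy = s[1:, 0]
s_xx = s[1:, 1:]

# Sample multiple correlation: R = sqrt(s_xy' S_xx^{-1} s_xy / s_yy).
r_multiple = np.sqrt(s_xy @ np.linalg.solve(s_xx, s_xy) / s_yy)

# Cross-check: correlation between y and the fitted values of a
# least-squares regression of y on x with an intercept.
design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
r_fitted = np.corrcoef(y, design @ beta)[0, 1]

print(r_multiple, r_fitted)
```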

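Testing for independence between y and $\mathbf{x}$. The test statistic is $F = \dfrac{R^2 / p}{(1 - R^2)/(n - p - 1)}$, where n is the sample size. If independence is true, then the test statistic F has an F-distribution with $\nu_1 = p$ degrees of freedom in the numerator and $\nu_2 = n - p - 1$ degrees of freedom in the denominator (for p = 1 this agrees with the n - 2 degrees of freedom of the t test for zero correlation). The test is to reject independence if $F > F_\alpha(p, n - p - 1)$.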
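The statistic is simple to compute once R² is known. The sketch below assumes that n is the sample size and that the denominator degrees of freedom are n - p - 1, the usual regression convention; the R² value is hypothetical, and the function name is ours.

```python
def f_statistic(r_squared, n, p):
    """F statistic for testing independence between y and x.

    Returns (F, numerator df, denominator df), with df2 = n - p - 1 (assumed).
    """
    df1 = p
    df2 = n - p - 1
    if df2 <= 0:
        raise ValueError("need n > p + 1 observations")
    f_value = (r_squared / df1) / ((1.0 - r_squared) / df2)
    return f_value, df1, df2

# Hypothetical example: R^2 = 0.35 from n = 65 cases and p = 2 predictors.
f_value, df1, df2 = f_statistic(0.35, n=65, p=2)
print(f_value, df1, df2)
```

The computed value would then be compared with the upper α critical value of the F-distribution with (df1, df2) degrees of freedom.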

Other tests for independence. Test for zero correlation (independence between two variables). The test statistic is $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$, where r is the sample correlation coefficient. If independence is true, then the test statistic t has a t-distribution with $\nu = n - 2$ degrees of freedom. The test is to reject independence if $|t| > t_{\alpha/2}(n - 2)$.
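A short sketch of the t statistic follows, together with a check that its square equals the F statistic for a single predictor (p = 1, assuming denominator degrees of freedom n - p - 1). The values of r and n are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic and degrees of freedom for testing zero correlation."""
    df = n - 2
    t_value = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t_value, df

# Hypothetical sample correlation and sample size.
r, n = 0.3, 65
t_value, df = correlation_t_statistic(r, n)

# With one predictor, R^2 = r^2 and F = (R^2 / 1) / ((1 - R^2) / (n - 2)).
f_value = (r * r) / ((1.0 - r * r) / (n - 2))

print(t_value, df, f_value)
```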

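Test for non-zero correlation ($H_0: \rho = \rho_0$). The test statistic is $z = \sqrt{n - 3}\left[\tfrac{1}{2}\ln\dfrac{1 + r}{1 - r} - \tfrac{1}{2}\ln\dfrac{1 + \rho_0}{1 - \rho_0}\right]$ (Fisher's z transformation). If $H_0$ is true, the test statistic z has approximately a standard Normal distribution. We then reject $H_0$ if $|z| > z_{\alpha/2}$.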
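A minimal sketch of this test, assuming the Fisher z form of the statistic (the inverse hyperbolic tangent is the same as ½ ln((1 + r)/(1 - r))). The inputs are hypothetical and the function name is ours.

```python
import math

def fisher_z_test(r, rho0, n):
    """Approximate z test of H0: rho = rho0 using Fisher's transformation.

    Returns (z, two-sided p-value under the standard Normal approximation).
    """
    z = math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))
    # P(|Z| > |z|) for a standard Normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical example: r = 0.6 observed in n = 65 cases, testing rho0 = 0.4.
z_value, p_value = fisher_z_test(r=0.6, rho0=0.4, n=65)
print(z_value, p_value)
```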

Test for zero partial correlation (conditional independence between two variables given a set of p independent variables). The test statistic is $t = \dfrac{r_{ij \cdot x_1, \dots, x_p}\sqrt{n - p - 2}}{\sqrt{1 - r_{ij \cdot x_1, \dots, x_p}^2}}$, where $r_{ij \cdot x_1, \dots, x_p}$ = the partial correlation between $y_i$ and $y_j$ given $x_1, \dots, x_p$. If independence is true, then the test statistic t has a t-distribution with $\nu = n - p - 2$ degrees of freedom. The test is to reject independence if $|t| > t_{\alpha/2}(n - p - 2)$.
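The partial correlation can be obtained from the conditional covariance matrix $S_{yy} - S_{yx}S_{xx}^{-1}S_{xy}$. The sketch below assumes that standard construction; the covariance matrix is a hypothetical one in which $y_1$ and $y_2$ are related only through a single x, so the partial correlation is exactly zero even though the ordinary correlation is 0.5.

```python
import numpy as np

def partial_correlation(cov, i, j, given):
    """Partial correlation between variables i and j given the variables in `given`."""
    cov = np.asarray(cov, dtype=float)
    idx = [i, j]
    s_yy = cov[np.ix_(idx, idx)]
    s_yx = cov[np.ix_(idx, given)]
    s_xx = cov[np.ix_(given, given)]
    # Conditional covariance of (y_i, y_j) given x.
    cond = s_yy - s_yx @ np.linalg.solve(s_xx, s_yx.T)
    return cond[0, 1] / np.sqrt(cond[0, 0] * cond[1, 1])

def partial_t_statistic(r, n, p):
    """t statistic with n - p - 2 degrees of freedom for zero partial correlation."""
    df = n - p - 2
    return r * np.sqrt(df) / np.sqrt(1.0 - r ** 2), df

# Hypothetical covariance of (y1, y2, x) with y1 = x + e1, y2 = x + e2,
# where x, e1, e2 are independent with unit variance.
cov = np.array([[2.0, 1.0, 1.0],
                [1.0, 2.0, 1.0],
                [1.0, 1.0, 1.0]])

r_plain = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
r_partial = partial_correlation(cov, 0, 1, given=[2])

# Test statistic for a hypothetical sample partial correlation of 0.3.
t_value, df = partial_t_statistic(0.3, n=65, p=1)
print(r_plain, r_partial, t_value, df)
```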

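Test for non-zero partial correlation ($H_0: \rho_{ij \cdot x_1, \dots, x_p} = \rho_0$). The test statistic is $z = \sqrt{n - p - 3}\left[\tfrac{1}{2}\ln\dfrac{1 + r_{ij \cdot x_1, \dots, x_p}}{1 - r_{ij \cdot x_1, \dots, x_p}} - \tfrac{1}{2}\ln\dfrac{1 + \rho_0}{1 - \rho_0}\right]$. If $H_0$ is true, the test statistic z has approximately a standard Normal distribution. We then reject $H_0$ if $|z| > z_{\alpha/2}$.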

Canonical Correlation Analysis

The problem. Quite often, when one has collected data on several variables, the variables are grouped into two (or more) sets and the researcher is interested in whether one set of variables is independent of the other set. In addition, if it is found that the two sets of variates are dependent, it is then important to describe and understand the nature of this dependence. The appropriate statistical procedure in this case is called Canonical Correlation Analysis.

Canonical Correlation: An Example. In the following study the researcher was interested in whether specific instructions on how to relax when taking tests, and on how to increase motivation, would affect performance on standardized achievement tests in
• Reading,
• Language and
• Mathematics.

A group of 65 third- and fourth-grade students were rated after the instruction, and immediately prior to taking the Scholastic Achievement tests, on:
• how relaxed they were (X1) and
• how motivated they were (X2).
In addition, data were collected on the three achievement tests:
• Reading (Y1),
• Language (Y2) and
• Mathematics (Y3).
The data are tabulated on the next page.

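Definition: (Canonical variates and Canonical correlations) Let $\mathbf{x}_1$ and $\mathbf{x}_2$ be random vectors such that $(\mathbf{x}_1', \mathbf{x}_2')'$ has a p-variate Normal distribution with mean vector $\boldsymbol{\mu} = (\boldsymbol{\mu}_1', \boldsymbol{\mu}_2')'$ and covariance matrix $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$. Let $U_1 = \mathbf{a}_1'\mathbf{x}_1$ and $V_1 = \mathbf{b}_1'\mathbf{x}_2$ be such that $U_1$ and $V_1$ have achieved the maximum correlation $\phi_1$. Then $U_1$ and $V_1$ are called the first pair of canonical variates and $\phi_1$ is called the first canonical correlation coefficient.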

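Derivation: (1st pair of Canonical variates and Canonical correlation) Now write $U_1 = \mathbf{a}'\mathbf{x}_1$ and $V_1 = \mathbf{b}'\mathbf{x}_2$, where $\Sigma_{11}$ and $\Sigma_{22}$ are the covariance matrices of $\mathbf{x}_1$ and $\mathbf{x}_2$ and $\Sigma_{12} = \Sigma_{21}'$ is their cross-covariance. Thus $(U_1, V_1)'$ has covariance matrix $\begin{pmatrix} \mathbf{a}'\Sigma_{11}\mathbf{a} & \mathbf{a}'\Sigma_{12}\mathbf{b} \\ \mathbf{b}'\Sigma_{21}\mathbf{a} & \mathbf{b}'\Sigma_{22}\mathbf{b} \end{pmatrix}$, hence the correlation between $U_1$ and $V_1$ is $\dfrac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\sqrt{(\mathbf{a}'\Sigma_{11}\mathbf{a})(\mathbf{b}'\Sigma_{22}\mathbf{b})}}$.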

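Thus we want to choose $\mathbf{a}$ and $\mathbf{b}$ so that $\phi = \dfrac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\sqrt{(\mathbf{a}'\Sigma_{11}\mathbf{a})(\mathbf{b}'\Sigma_{22}\mathbf{b})}}$ is at a maximum, or so that $\phi^2 = \dfrac{(\mathbf{a}'\Sigma_{12}\mathbf{b})^2}{(\mathbf{a}'\Sigma_{11}\mathbf{a})(\mathbf{b}'\Sigma_{22}\mathbf{b})}$ is at a maximum. Let $W = \phi^2$ denote this squared correlation as a function of $\mathbf{a}$ and $\mathbf{b}$.

Computing derivatives and setting them to zero, $\dfrac{\partial W}{\partial \mathbf{a}} = \mathbf{0}$ gives $\Sigma_{11}^{-1}\Sigma_{12}\mathbf{b} = \dfrac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\mathbf{a}'\Sigma_{11}\mathbf{a}}\,\mathbf{a}$ and $\dfrac{\partial W}{\partial \mathbf{b}} = \mathbf{0}$ gives $\Sigma_{22}^{-1}\Sigma_{21}\mathbf{a} = \dfrac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\mathbf{b}'\Sigma_{22}\mathbf{b}}\,\mathbf{b}$.

Thus, substituting the second equation into the first, $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\mathbf{a} = k\,\mathbf{a}$, where $k = \dfrac{(\mathbf{a}'\Sigma_{12}\mathbf{b})^2}{(\mathbf{a}'\Sigma_{11}\mathbf{a})(\mathbf{b}'\Sigma_{22}\mathbf{b})} = \phi^2$. This shows that $\mathbf{a}$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. Since $k = \phi^2$ is to be maximized, k is the largest eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\mathbf{a}$ is the eigenvector associated with the largest eigenvalue.

Also $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\mathbf{b} = k\,\mathbf{b}$ and $\mathbf{b}$ is proportional to $\Sigma_{22}^{-1}\Sigma_{21}\mathbf{a}$.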

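Summary: The first pair of canonical variates, $U_1 = \mathbf{a}_1'\mathbf{x}_1$ and $V_1 = \mathbf{b}_1'\mathbf{x}_2$, are found by finding $\mathbf{a}_1$ and $\mathbf{b}_1$, eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ respectively, associated with the largest eigenvalue (same for both matrices). The largest eigenvalue of the two matrices is the square of the first canonical correlation coefficient $\phi_1$.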
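The summary can be sketched numerically as follows, assuming the two matrices are $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. The covariance blocks are hypothetical values (not the study data), and the function name is ours.

```python
import numpy as np

def first_canonical_pair(s11, s12, s22):
    """Return (phi1, a1, b1) from the eigen-decomposition of S11^-1 S12 S22^-1 S21."""
    s21 = s12.T
    m1 = np.linalg.solve(s11, s12) @ np.linalg.solve(s22, s21)
    eigvals, eigvecs = np.linalg.eig(m1)
    top = np.argmax(eigvals.real)
    k = eigvals.real[top]              # largest eigenvalue = phi1 squared
    a = eigvecs[:, top].real
    b = np.linalg.solve(s22, s21 @ a)  # b is proportional to S22^-1 S21 a
    # Scale so that U1 = a'x1 and V1 = b'x2 both have unit variance.
    a = a / np.sqrt(a @ s11 @ a)
    b = b / np.sqrt(b @ s22 @ b)
    return np.sqrt(k), a, b

# Hypothetical covariance blocks for two sets of two variables each.
s11 = np.array([[1.0, 0.4], [0.4, 1.0]])
s22 = np.array([[1.0, 0.2], [0.2, 1.0]])
s12 = np.array([[0.5, 0.3], [0.2, 0.4]])

phi1, a1, b1 = first_canonical_pair(s11, s12, s22)

# The correlation between U1 and V1 should equal phi1, and b1 should be an
# eigenvector of the second matrix with the same eigenvalue phi1 squared.
corr_u1_v1 = a1 @ s12 @ b1
m2 = np.linalg.solve(s22, s12.T) @ np.linalg.solve(s11, s12)
print(phi1, corr_u1_v1)
```

Because eigenvectors are only defined up to scale, the unit-variance scaling used here is one convention among several.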

The remaining canonical variates and canonical correlation coefficients. The second pair of canonical variates, $U_2 = \mathbf{a}_2'\mathbf{x}_1$ and $V_2 = \mathbf{b}_2'\mathbf{x}_2$, are found by finding $\mathbf{a}_2$ and $\mathbf{b}_2$ so that
1. $(U_2, V_2)$ are independent of $(U_1, V_1)$.
2. The correlation between $U_2$ and $V_2$ is maximized.
The correlation, $\phi_2$, between $U_2$ and $V_2$ is called the second canonical correlation coefficient.

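The ith pair of canonical variates, $U_i = \mathbf{a}_i'\mathbf{x}_1$ and $V_i = \mathbf{b}_i'\mathbf{x}_2$, are found by finding $\mathbf{a}_i$ and $\mathbf{b}_i$ so that
1. $(U_i, V_i)$ are independent of $(U_1, V_1), \dots, (U_{i-1}, V_{i-1})$.
2. The correlation between $U_i$ and $V_i$ is maximized.
The correlation, $\phi_i$, between $U_i$ and $V_i$ is called the ith canonical correlation coefficient.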

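Derivation: (2nd pair of Canonical variates and Canonical correlation) Now $(U_1, V_1, U_2, V_2)' = (\mathbf{a}_1'\mathbf{x}_1, \mathbf{b}_1'\mathbf{x}_2, \mathbf{a}_2'\mathbf{x}_1, \mathbf{b}_2'\mathbf{x}_2)'$ has covariance matrix
$\begin{pmatrix} \mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 \\ \mathbf{b}_1'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_2 \\ \mathbf{a}_2'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2 \\ \mathbf{b}_2'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_2'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 \end{pmatrix}$

Now independence of $(U_2, V_2)$ and $(U_1, V_1)$ requires $\mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 = 0$, $\mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 = 0$, $\mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 = 0$ and $\mathbf{b}_1'\Sigma_{22}\mathbf{b}_2 = 0$, and maximizing $\dfrac{\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2}{\sqrt{(\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2)(\mathbf{b}_2'\Sigma_{22}\mathbf{b}_2)}}$ is equivalent to maximizing $\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2$ subject to $\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 = 1$ and $\mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 = 1$, together with the four restrictions above. Using the Lagrange multiplier technique, we form
$\Lambda = \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2 - \tfrac{\lambda_1}{2}(\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 - 1) - \tfrac{\lambda_2}{2}(\mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 - 1) - \gamma_1\,\mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 - \gamma_2\,\mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 - \gamma_3\,\mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 - \gamma_4\,\mathbf{b}_1'\Sigma_{22}\mathbf{b}_2$.

Now we set $\dfrac{\partial \Lambda}{\partial \mathbf{a}_2} = \mathbf{0}$ and $\dfrac{\partial \Lambda}{\partial \mathbf{b}_2} = \mathbf{0}$; also, setting the derivatives with respect to the multipliers $\lambda_1, \lambda_2, \gamma_1, \dots, \gamma_4$ to zero gives the restrictions.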

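These equations can be used to show that $\mathbf{a}_2$ and $\mathbf{b}_2$ are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ respectively, associated with the 2nd largest eigenvalue (same for both matrices). The 2nd largest eigenvalue of the two matrices is the square of the 2nd canonical correlation coefficient $\phi_2$.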

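Continuing, the coefficients for the ith pair of canonical variates, $\mathbf{a}_i$ and $\mathbf{b}_i$, are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ respectively, associated with the ith largest eigenvalue (same for both matrices). The ith largest eigenvalue of the two matrices is the square of the ith canonical correlation coefficient $\phi_i$.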
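All pairs can be obtained from a single eigen-decomposition. The sketch below is run on simulated data with two variables in the first set and three in the second (mirroring the layout of the example, but not its data); it checks that different pairs are uncorrelated, which for Normal data is equivalent to independence, and that both matrices share their nonzero eigenvalues. The function name and the simulated values are assumptions.

```python
import numpy as np

def canonical_analysis(s11, s12, s22):
    """Return (phi, A, B): canonical correlations and coefficient columns a_i, b_i."""
    s21 = s12.T
    m1 = np.linalg.solve(s11, s12) @ np.linalg.solve(s22, s21)
    eigvals, eigvecs = np.linalg.eig(m1)
    order = np.argsort(eigvals.real)[::-1]       # largest eigenvalue first
    k = np.clip(eigvals.real[order], 0.0, 1.0)   # guard against rounding error
    a = eigvecs.real[:, order]
    b = np.linalg.solve(s22, s21 @ a)            # b_i proportional to S22^-1 S21 a_i
    # Scale every canonical variate to unit variance.
    a = a / np.sqrt(np.sum(a * (s11 @ a), axis=0))
    b = b / np.sqrt(np.sum(b * (s22 @ b), axis=0))
    return np.sqrt(k), a, b

# Simulated data (hypothetical): columns 0-1 form set 1, columns 2-4 form set 2.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 5))
z[:, 2] += 0.8 * z[:, 0]
z[:, 3] += 0.5 * z[:, 1]
cov = np.cov(z, rowvar=False)
s11, s12, s22 = cov[:2, :2], cov[:2, 2:], cov[2:, 2:]

phi, a, b = canonical_analysis(s11, s12, s22)

# Cov(U_i, V_j) should be phi_i when i = j and zero otherwise.
cross = a.T @ s12 @ b

# The two largest eigenvalues of the second matrix should equal phi squared.
m2 = np.linalg.solve(s22, s12.T) @ np.linalg.solve(s11, s12)
k2 = np.sort(np.linalg.eigvals(m2).real)[::-1][:2]
print(phi, k2)
```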

Example. Variables:
• Relaxation score (X1)
• Motivation score (X2)
• Reading (Y1)
• Language (Y2)
• Mathematics (Y3)

Summary Statistics

Canonical Correlation Statistics


Summary
U1 = 0.197 Relax + 0.979 Mot
V1 = 0.504 Read + 0.900 Lang + 0.565 Math
$\phi_1$ = 0.592
U2 = 0.980 Relax + 0.203 Mot
V2 = 0.391 Math - 0.361 Read - 0.354 Lang
$\phi_2$ = 0.159
