1 / 16

Ch 17 Correlation

Ch 17 Correlation. 2010/3/10. We now investigate the relationships that can exist among continuous variable. Correlation analysis: Correlation is defined as the quantification of the degree to which two random variables are related, provided that the relationship is linear.

taran
Download Presentation

Ch 17 Correlation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ch 17 Correlation 2010/3/10

  2. We now investigate the relationships that can exist among continuous variable. • Correlation analysis: • Correlation is defined as the quantification of the degree to which two random variables are related, provided that the relationship is linear.

  3. 17.1 Two-Way Scatter Plot • Suppose that we are interested in a pair of continuous random variables. Example, relationship between the percentage of children who have been immunized against the infectious DPT and mortality rate. • Data for a random sample of 20 countries are show the figure 17.1. (Table 17.1) • X: the percentage of children immunized by age on year • Y: the under-five mortality rate • Before we do any analysis, we should create a two-way scatter plot of the data. (relationship exists between x and y??) • The mortality rate tends to decrease as the percentage of children immunized increase.

  4. 17.1 Two-Way Scatter Plot

  5. 17.2 Pearson’s Correlation Coefficient • In the underlying population form which the sample of points (xi,yi) is selected, the population correlation between the variables X and Y. (Greek letter: r; read rho) • The quantifies the strength of the linear relationship between the outcomes x and y. • The estimator of r is known as Pearson’s coefficient of correlation or correlation coefficient (r).

  6. 17.2 Pearson’s Correlation Coefficient • The sample correlation coefficient is denoted by r. • sx and sy are the sample standard deviations of the x and y values.

  7. The correlation coefficient is dimensionless number; it has no nuits of measurement. • -1 ≤ r ≤ 1 • The value r=1 and r=-1 occur when there is an exact linear relationship between x and y. (Figure 17.2 (a)(b)) • If y tends to increase in magnitude as x increases, r is greater than 0; x any y are said to be positively correlated. (r >0) • If y decreases as x increases, r is less than 0 and the two variables are negatively correlated. (r <0) • If r=0, there is no linear relationship between x and y and the variables are uncorrelated. (r =0) (Figure 17.2 (c)(d)) • Page 401

  8. 17.2 Pearson’s Correlation Coefficient

  9. In this sample: • Strong linear relationship • Negative association: mortality rate decreases in magnitude as percentage of immunization increases • The correlation coefficient merely tells us that a linear relationship exists between two variables; it does not specify whether the relationship is cause-and-effect. • We would also like to be able to draw conclusions about the unknown population correlation  using the sample correlation coefficient r. 17.2 Pearson’s Correlation Coefficient

  10. H0: =0 (No association between X and Y) H1: ≠0 (association between X and Y) • The estimated standard error of r : • The statistic (under H0): • If we assume that the pairs of observations were obtained randomly and both X and Y are normally distribution. • If  is equal to some other value, represented by 0, the sampling distribution is skewed, and the test statistic no longer follow at t distribution. 17.2 Pearson’s Correlation Coefficient

  11. The coefficient of correlation r has several limitations: • It quantifies only the strength of the linear relationship between two variables. • Care must be taken when the data contain any outliers, or pairs of observations that lie considerably outside the range of the other data points. • The estimated correlation should never be extrapolated beyond the observed ranges of the variables; the relationship between X and Y may change outside of this region. • A high correlation between two variables does not imply a cause-and-effect relationship. 17.2 Pearson’s Correlation Coefficient

  12. 17.3Spearman’s Rank Correlation Coefficient • Pearson’s correlation coefficient is very sensitive to outlying values. We may be interested in calculating a measure of association that is more robust. • One approach is to rank the two sets of outcomes x and y separately and known as Spearman’s rank correlation coefficient.(non-parametric method) • Spearman’s rank correlation coefficient: • Where xri and yri are the rank associated the ith subject rather than the actual observations.

  13. An equivalent method for computing rs is provided by • n: the number of data points in the sample • di is the different between the rank of xi and the rank of yi • -1 ≤ rs ≤ 1 • High degree of correlation between x any y: rs =-1 or 1 • A lack of linear association between two variables: rs= 0 • When type of data is ordinal or the conditions do not hold, we should used rs . 17.3 Spearman’s Rank Correlation Coefficient

  14. Spearman’s rank correlation coefficient may also be thought of as a measure of the concordance(一致性) of the ranks for the outcomes x and y. • Case I 17.3 Spearman’s Rank Correlation Coefficient

  15. Case II 17.3Spearman’s Rank Correlation Coefficient

  16. If n is not too small and if we can assume that pairs of ranks are chosen randomly, we can test null hypothesis: H0: =0. The test statistic is • This testing procedure does not require that X and Y be normally distributed. • About rs : • It is much less sensitive to outlying values than Pearson’s correlation coefficient. • It can be used when one or both of the relevant variables are ordinal. • It relies on ranks rather than on actual observations. 17.3 Spearman’s Rank Correlation Coefficient

More Related