
Multivariate data

Multivariate data. Graphical techniques: the scatter plot; the two-dimensional histogram. Some scatter patterns. Non-linear patterns. Measures of strength of a relationship (correlation): Pearson’s correlation coefficient (r); Spearman’s rank correlation coefficient (rho, ρ).





  1. Multivariate data

  2. Graphical Techniques • The scatter plot • The two dimensional Histogram

  3. Some Scatter Patterns

  4. Non-Linear Patterns

  5. Measures of strength of a relationship (Correlation) • Pearson’s correlation coefficient (r) • Spearman’s rank correlation coefficient (rho, ρ)

  6. Pearson’s correlation coefficient is defined as follows: r = Sxy / √(Sxx · Syy)

  7. where: Sxx = Σ(xi – x̄)², Syy = Σ(yi – ȳ)², Sxy = Σ(xi – x̄)(yi – ȳ), and x̄, ȳ are the sample means of X and Y.
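The definition above translates directly into code. A minimal sketch in plain Python (the function name pearson_r is illustrative, not from the notes):

```python
import math

def pearson_r(x, y):
    """Pearson's r from the definition: r = Sxy / sqrt(Sxx * Syy),
    where Sxx, Syy, Sxy are sums of products of deviations from the means."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Points on a straight line with positive slope give r = +1
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))  # -> 1.0
```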

  8. Properties of Pearson’s correlation coefficient r • The value of r is always between –1 and +1. • If the relationship between X and Y is positive, then r will be positive. • If the relationship between X and Y is negative, then r will be negative. • If there is no linear relationship between X and Y, then r will be zero. • The value of r will be +1 if the points (xi, yi) lie on a straight line with positive slope. • The value of r will be –1 if the points (xi, yi) lie on a straight line with negative slope.

  9. r =1

  10. r = 0.95

  11. r = 0.7

  12. r = 0.4

  13. r = 0

  14. r = -0.4

  15. r = -0.7

  16. r = -0.8

  17. r = -0.95

  18. r = -1

  19. Computing formulae for the statistics: Sxx = Σxi² – (Σxi)²/n, Syy = Σyi² – (Σyi)²/n, Sxy = Σxiyi – (Σxi)(Σyi)/n, and r = Sxy / √(Sxx · Syy).
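The computing formulae (Sxx = Σx² – (Σx)²/n, and similarly for Syy and Sxy) need only the running sums, as this sketch shows (pearson_r_cf is an illustrative name; the data are from the worked example later in these notes):

```python
import math

def pearson_r_cf(x, y):
    """Pearson's r via the computing formulae, using only the raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sx * sx / n             # Sxx = Σx² − (Σx)²/n
    syy = sum(v * v for v in y) - sy * sy / n             # Syy = Σy² − (Σy)²/n
    sxy = sum(a * b for a, b in zip(x, y)) - sx * sy / n  # Sxy = Σxy − ΣxΣy/n
    return sxy / math.sqrt(sxx * syy)

# Data from the worked example later in the notes
x = [25.0, 33.9, 16.7, 37.4, 24.6, 17.3, 40.2]
y = [24.3, 38.7, 13.4, 32.1, 28.0, 12.5, 44.9]
print(round(pearson_r_cf(x, y), 3))  # ≈ 0.945
```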

  20. Spearman’s rank correlation coefficient ρ (rho)

  21. Spearman’s rank correlation coefficient ρ (rho) Spearman’s rank correlation coefficient is computed as follows: • Arrange the observations on X in increasing order and assign them the ranks 1, 2, 3, …, n. • Arrange the observations on Y in increasing order and assign them the ranks 1, 2, 3, …, n. • For any case (i) let (xi, yi) denote the observations on X and Y and let (ri, si) denote the ranks on X and Y.

  22. If the variables X and Y are strongly positively correlated, the ranks on X should generally agree with the ranks on Y. (The largest X should be the largest Y; the smallest X should be the smallest Y.) • If the variables X and Y are strongly negatively correlated, the ranks on X should be in the reverse order to the ranks on Y. (The largest X should be the smallest Y; the smallest X should be the largest Y.) • If the variables X and Y are uncorrelated, the ranks on X should be randomly distributed with respect to the ranks on Y.

  23. Spearman’s rank correlation coefficient is defined as follows: For each case let di = ri – si = difference in the two ranks. Then Spearman’s rank correlation coefficient (ρ) is defined as: ρ = 1 – 6Σdi² / (n(n² – 1))
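The two-step procedure (rank each variable, then combine the rank differences) can be sketched as follows; the helper names ranks and spearman_rho are illustrative, and the formula assumes no tied values:

```python
def ranks(v):
    """Rank 1 for the smallest value up to n for the largest (no ties assumed)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def spearman_rho(x, y):
    """Spearman's rho = 1 - 6*sum(di^2) / (n(n^2 - 1)), di = rank difference."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# The worked example from the next slide
x = [25.0, 33.9, 16.7, 37.4, 24.6, 17.3, 40.2]
y = [24.3, 38.7, 13.4, 32.1, 28.0, 12.5, 44.9]
print(ranks(x))                      # -> [4, 5, 1, 6, 3, 2, 7]
print(round(spearman_rho(x, y), 4))  # ≈ 0.8929
```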

  24. Properties of Spearman’s rank correlation coefficient ρ • The value of ρ is always between –1 and +1. • If the relationship between X and Y is positive, then ρ will be positive. • If the relationship between X and Y is negative, then ρ will be negative. • If there is no relationship between X and Y, then ρ will be zero. • The value of ρ will be +1 if the ranks of X completely agree with the ranks of Y. • The value of ρ will be –1 if the ranks of X are in reverse order to the ranks of Y.

  25. Example
      xi: 25.0  33.9  16.7  37.4  24.6  17.3  40.2
      yi: 24.3  38.7  13.4  32.1  28.0  12.5  44.9
      Ranking the X’s and the Y’s we get:
      ri:  4     5     1     6     3     2     7
      si:  3     6     2     5     4     1     7
      Computing the differences in ranks gives us:
      di:  1    –1    –1     1    –1     1     0
      so Σdi² = 6 and ρ = 1 – 6(6)/(7(49 – 1)) = 1 – 36/336 ≈ 0.893.

  26. Computing Pearson’s correlation coefficient, r, for the same problem:

  27. To compute r, first compute the sums: Σx = 195.1, Σy = 193.9, Σx² = 5972.35, Σy² = 6254.41, Σxy = 6053.78.

  28. Then Sxx = 5972.35 – (195.1)²/7 = 534.63, Syy = 6254.41 – (193.9)²/7 = 883.38, Sxy = 6053.78 – (195.1)(193.9)/7 = 649.51.

  29. and r = 649.51 / √(534.63 × 883.38) ≈ 0.945. Compare with Spearman’s ρ ≈ 0.893.

  30. Comments: Spearman’s rank correlation coefficient ρ and Pearson’s correlation coefficient r • The value of ρ can also be computed by applying Pearson’s formula to the ranks (ri, si). • Spearman’s ρ is Pearson’s r computed from the ranks.
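That equivalence is easy to check numerically: with no tied values, Pearson’s formula applied to the ranks gives exactly the value of Spearman’s formula. A sketch (function names illustrative):

```python
import math

def ranks(v):  # rank 1 = smallest ... n = largest (no ties assumed)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [25.0, 33.9, 16.7, 37.4, 24.6, 17.3, 40.2]
y = [24.3, 38.7, 13.4, 32.1, 28.0, 12.5, 44.9]

rx, ry = ranks(x), ranks(y)
n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho_formula = 1 - 6 * d2 / (n * (n * n - 1))  # Spearman's formula
rho_as_pearson = pearson_r(rx, ry)            # Pearson's r on the ranks
print(round(rho_formula, 4), round(rho_as_pearson, 4))  # both ≈ 0.8929
```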

  31. Spearman’s ρ is less sensitive to extreme observations (outliers). • The value of Pearson’s r is much more sensitive to extreme outliers. This is similar to the comparison between the median and the mean, and between the standard deviation and the pseudo-standard deviation: the mean and standard deviation are more sensitive to outliers than the median and pseudo-standard deviation.
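The difference in outlier sensitivity is easy to demonstrate on made-up data: a perfect positive trend with one wild value in Y can drag Pearson’s r below zero while Spearman’s ρ stays clearly positive (the data and function names here are illustrative):

```python
import math

def ranks(v):  # rank 1 = smallest ... n = largest (no ties assumed)
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - xbar) ** 2 for a in x) *
                           sum((b - ybar) ** 2 for b in y))

def spearman_rho(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# A perfect positive trend spoiled by one extreme outlier in the last y
x = list(range(1, 11))
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, -50]

print(round(pearson_r(x, y), 3))     # negative: the single outlier dominates
print(round(spearman_rho(x, y), 3))  # still clearly positive
```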

  32. Simple Linear Regression Fitting straight lines to data

  33. The Least Squares Line The Regression Line • When the data are correlated, the points fall roughly about a straight line.

  34. In this situation we want to: • Find the equation of the straight line through the data that yields the best fit. The equation of any straight line is of the form: Y = a + bX, where b = the slope of the line and a = the intercept of the line.

  35. Rise = y2 – y1, Run = x2 – x1, so b = Rise / Run = (y2 – y1) / (x2 – x1), and a is the intercept.

  36. a is the value of Y when X is zero. • b is the rate that Y increases per unit increase in X. • For a straight line this rate is constant. • For non-linear curves the rate that Y increases per unit increase in X varies with X.

  37. Linear

  38. Non-linear

  39. Example: In the following example both blood pressure and age were measured for each female subject. Subjects were grouped into age classes and the median blood pressure measurement was computed for each age class. The data are summarized below:

  40. Graph:

  41. Interpretation of the slope and intercept • Intercept – value of Y at X = 0. • Predicted blood pressure of a newborn (65.1). • This interpretation remains valid only if linearity is true down to X = 0. • Slope – rate of increase in Y per unit increase in X. • Blood pressure increases 1.38 units each year.
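Using the fitted intercept (65.1) and slope (1.38) quoted on this slide, prediction is just the line evaluated at an age; a minimal sketch (the function name predicted_bp is illustrative):

```python
def predicted_bp(age):
    """Fitted line from the notes: blood pressure = 65.1 + 1.38 * age."""
    return 65.1 + 1.38 * age

print(predicted_bp(0))   # 65.1: the intercept, valid only if linearity holds at X = 0
print(predicted_bp(50))  # about 134.1
```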

  42. The Least Squares Line Fitting the best straight line to “linear” data

  43. Reasons for fitting a straight line to data • It provides a precise description of the relationship between Y and X. • The interpretation of the parameters of the line (slope and intercept) leads to an improved understanding of the phenomenon that is under study. • The equation of the line is useful for prediction of the dependent variable (Y) from the independent variable (X).

  44. Assume that we have collected data on two variables X and Y. Let (x1, y1), (x2, y2), (x3, y3), …, (xn, yn) denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population).

  45. Let Y = a + bX denote an arbitrary equation of a straight line, where a and b are known values. This equation can be used to predict, for each value of X, the value of Y. For example, if X = xi (as for the ith case) then the predicted value of Y is: ŷi = a + b·xi

  46. For example, if Y = a + bX = 25.2 + 2.0X is the equation of the straight line, and if X = xi = 20 (for the ith case), then the predicted value of Y is: ŷ = 25.2 + 2.0(20) = 65.2.
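The best-fitting values of a and b are derived later in these notes; assuming the standard least-squares estimates (b = Sxy/Sxx and a = ȳ – b·x̄), a sketch that also recovers the slide’s example line (the function name least_squares_line is illustrative):

```python
def least_squares_line(x, y):
    """Standard least-squares estimates: b = Sxy/Sxx, a = ybar - b*xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Exactly linear data recovers the slide's line Y = 25.2 + 2.0 X
x = [0.0, 10.0, 20.0, 30.0]
y = [25.2 + 2.0 * v for v in x]
a, b = least_squares_line(x, y)
print(round(a, 1), round(b, 1))  # 25.2 2.0
print(round(a + b * 20, 1))      # predicted Y at X = 20: 65.2
```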
