Multivariate Data Summary

Presentation Transcript


  1. Multivariate Data Summary

  2. Linear Regression and Correlation

  3. Pearson’s correlation coefficient r.

  4. Slope and Intercept of the Least Squares line

  5. Scatter Plot Patterns: r = 0.0, r = +0.7, r = +0.9, r = +1.0

  6. r = -0.7, r = -0.9, r = -1.0

  7. Non-Linear Patterns: r can take any value between -1 and +1 when the pattern is non-linear, depending on how well a straight line can be fitted to the pattern.
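To illustrate the point about non-linear patterns, here is a minimal sketch that computes Pearson's r from its usual definition for two curved patterns. The data are made up for illustration and the function name `pearson_r` is an assumption, not something taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a full, symmetric parabola.
xs_sym = [-3, -2, -1, 0, 1, 2, 3]
ys_sym = [x ** 2 for x in xs_sym]

# Hypothetical data: only the rising half of the same parabola.
xs_half = [0, 1, 2, 3, 4, 5, 6]
ys_half = [x ** 2 for x in xs_half]

r_sym = pearson_r(xs_sym, ys_sym)     # a straight line fits poorly
r_half = pearson_r(xs_half, ys_half)  # a straight line fits fairly well

print(f"symmetric parabola: r = {r_sym:.3f}")
print(f"half parabola:      r = {r_half:.3f}")
```

Both patterns follow exactly the same curve, yet r is near 0 for the symmetric case and close to +1 for the rising half, because r only measures how well a straight line fits.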

  8. The Coefficient of Determination

  9. An important Identity in Statistics: (Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X)

  10. It can also be shown: $r^2 = \dfrac{\text{variability in } Y \text{ explained by } X}{\text{total variability in } Y}$ = proportion of variability in Y explained by X. $r^2$ = the coefficient of determination.
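The identity and the coefficient of determination can be checked numerically. The sketch below fits a least squares line to made-up data, splits the total variability in Y into explained and unexplained sums of squares, and compares the explained proportion with the square of Pearson's r. The data and all variable names are illustrative assumptions.

```python
# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Slope and intercept of the least squares line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in xs]

# Decomposition of the variability in Y.
total = syy
explained = sum((f - mean_y) ** 2 for f in fitted)
unexplained = sum((y - f) ** 2 for y, f in zip(ys, fitted))

# Coefficient of determination, computed two ways.
r_squared_from_ss = explained / total
r_squared_from_r = sxy ** 2 / (sxx * syy)

print(f"total = {total:.4f}, explained + unexplained = {explained + unexplained:.4f}")
print(f"r^2 = {r_squared_from_ss:.4f}")
```

The two routes to r² agree up to floating-point rounding, which is the content of the "it can also be shown" statement.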

  11. Categorical Data: Techniques for summarizing, displaying and graphing

  12. The frequency table and the bar graph. Suppose we have collected data on a categorical variable X having k categories: 1, 2, …, k. To construct the frequency table we simply count, for each category i of X, the number of cases falling in that category ($f_i$). To plot the bar graph we simply draw a bar of height $f_i$ above each category i of X.
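The counting step can be sketched in a few lines. The observations below are invented, and the text "bar graph" is only a stand-in for a real plot; `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: frequency f_i}, with categories in sorted order."""
    counts = Counter(values)
    return {category: counts[category] for category in sorted(counts)}

# Hypothetical observations of a categorical variable X with k = 4 categories.
observations = [1, 3, 2, 3, 3, 1, 2, 3, 4, 3]
table = frequency_table(observations)

# A crude text bar graph: one '#' per case in each category.
for category, f in table.items():
    print(f"{category} | {'#' * f} ({f})")
```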

  13. Example: In this example data were collected for n = 34,188 subjects. • The purpose of the study was to determine the relationship between the use of Antidepressants, Mood medication, Anxiety medication, Stimulants and Sleeping pills. • In addition the study was interested in examining the effects of the independent variables (gender, age, income, education and role) on both individual use of the medications and multiple use of the medications.

  14. The variables were: • Antidepressant use, • Mood medication use, • Anxiety medication use, • Stimulant use and • Sleeping pills use. • Gender, • age, • income, • education and • Role, with categories: Parent, worker, partner; Parent, partner; Parent, worker; Worker, partner; Worker only; Parent only; Partner only; No roles

  15. Frequency Table for Age

  16. Bar Graph for Age

  17. Frequency Table for Role

  18. Bar Graph for Role

  19. The pie chart • An alternative to the bar chart • Draw a circle (a pie) • Divide the circle into segments, with the area of each segment proportional to $f_i$ or $p_i = f_i / n$
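Since the area of a circular segment is proportional to its angle, the pie chart construction reduces to computing $p_i = f_i / n$ and an angle of $360 \cdot p_i$ degrees per category. A minimal sketch with made-up frequencies (the category labels and the helper name `pie_segments` are assumptions):

```python
def pie_segments(frequencies):
    """Map each category to (proportion p_i, segment angle in degrees)."""
    n = sum(frequencies.values())
    return {
        category: (f / n, 360.0 * f / n)
        for category, f in frequencies.items()
    }

# Hypothetical frequency table.
frequencies = {"A": 50, "B": 30, "C": 20}
segments = pie_segments(frequencies)

for category, (p, angle) in segments.items():
    print(f"{category}: p = {p:.2f}, angle = {angle:.1f} degrees")
```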

  20. Example • In this study the population consists of individuals who received a head injury (n = 22540). • The variable is the mechanism that caused the head injury (InjMech) with categories: • MVA (Motor vehicle accident) • Falls • Violence • Other VA (Other vehicle accidents) • Accidents (industrial accident) • Other (all other mechanisms for head injury)

  21. Graphical and Tabular Display of Categorical Data. • The frequency table • The bar graph • The pie chart

  22. The frequency table

  23. The bar graph

  24. The pie chart

  25. Multivariate Categorical Data

  26. The two-way frequency table and the χ² statistic: techniques for examining dependence between two categorical variables

  27. Situation • We have two categorical variables R and C. • The number of categories of R is r. • The number of categories of C is c. • We observe n subjects from the population and count $x_{ij}$ = the number of subjects for which R = i and C = j. • R = rows, C = columns
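The counts $x_{ij}$ can be tabulated directly from the (R, C) pair recorded for each subject. The sketch below uses invented pairs with categories numbered from 1; the helper name `two_way_table` is an assumption.

```python
def two_way_table(pairs, r, c):
    """Count x_ij = number of subjects with R = i and C = j (1-based categories)."""
    table = [[0] * c for _ in range(r)]
    for i, j in pairs:
        table[i - 1][j - 1] += 1
    return table

# Hypothetical (R, C) observations for n = 6 subjects, with r = 2 and c = 3.
pairs = [(1, 1), (1, 2), (2, 2), (2, 2), (1, 1), (2, 3)]
x = two_way_table(pairs, r=2, c=3)

for row in x:
    print(row)
```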

  28. Example: Both Systolic Blood Pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects. The categories for Blood Pressure are: <126, 127-146, 147-166, 167+. The categories for Cholesterol are: <200, 200-219, 220-259, 260+.

  29. Table: two-way frequency

  30. Example This comes from the drug use data. The two variables are: • Age (C) and • Antidepressant Use (R) • measured for a sample of n = 33,957 subjects.

  31. Two-way Frequency Table: Percentage antidepressant use vs Age

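  32. The χ² statistic for measuring dependence between two categorical variables. Define $E_{ij} = \dfrac{R_i C_j}{n}$ = expected frequency in the (i,j)th cell in the case of independence, where $R_i = \sum_j x_{ij}$ is the total for row i and $C_j = \sum_i x_{ij}$ is the total for column j.

  33. Justification: $\dfrac{E_{ij}}{R_i}$ (proportion in column j for row i) $= \dfrac{C_j}{n}$ (overall proportion in column j)

  34. and $\dfrac{E_{ij}}{C_j}$ (proportion in row i for column j) $= \dfrac{R_i}{n}$ (overall proportion in row i).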

  35. The χ² statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(x_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij}$ = expected frequency in the (i,j)th cell in the case of independence and $x_{ij}$ = observed frequency in the (i,j)th cell.
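As a rough sketch of the calculation, the code below computes the usual Pearson χ² statistic, taking the expected frequency of each cell under independence as (row total × column total) / n and summing (observed − expected)² / expected over all cells. The 2 × 2 table is invented and `chi_square` is a hypothetical helper name.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a two-way frequency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, x_ij in enumerate(row):
            # Expected frequency in cell (i, j) under independence.
            e_ij = row_totals[i] * col_totals[j] / n
            chi2 += (x_ij - e_ij) ** 2 / e_ij
    return chi2

# Hypothetical observed frequencies (rows = R, columns = C).
observed = [[10, 20],
            [30, 40]]
chi2 = chi_square(observed)
print(f"chi-square = {chi2:.4f}")
```

A value near zero indicates observed counts close to what independence predicts; larger values indicate stronger departure from independence.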

  36. Example: studying the relationship between Systolic Blood Pressure and Serum Cholesterol. In this example we are interested in whether Systolic Blood Pressure and Serum Cholesterol are related or whether they are independent. Both were measured for a sample of n = 1237 cases.

  37. Observed frequencies

  38. Expected frequencies: In the case of independence the distribution across a row is the same for each row. The distribution down a column is the same for each column.
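This property of the expected frequencies can be checked on a small made-up table. The sketch assumes the standard expected frequency (row total × column total) / n and then normalizes each row and each column of the expected table; the names are illustrative.

```python
def expected_frequencies(observed):
    """Expected cell frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r_i * c_j / n for c_j in col_totals] for r_i in row_totals]

# Hypothetical observed frequencies with unequal row totals.
observed = [[10, 20, 30],
            [20, 50, 70]]
expected = expected_frequencies(observed)

# Distribution across each row, and down each column, of the expected table.
row_profiles = [[e / sum(row) for e in row] for row in expected]
col_profiles = [[e / sum(col) for e in col] for col in zip(*expected)]

for profile in row_profiles:
    print([round(p, 3) for p in profile])
```

Every row of `row_profiles` comes out the same, and likewise every entry of `col_profiles`, even though the expected counts themselves differ between rows.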

  39. Standardized residuals. The χ² statistic

  40. Example This comes from the drug use data. The two variables are: • Role (C) and • Antidepressant Use (R) • measured for a sample of n = 33,957 subjects.

  41. Two-way Frequency Table: Percentage antidepressant use vs Role

  42. Calculation of χ²: the raw data and the expected frequencies

  43. The residuals and the calculation of χ²

  44. Example • In this example there are n = 57407 individuals who had been victimized twice by crimes • Rows = crime of first victimization • Cols = crime of second victimization
