
Introduction to Multivariate Analysis

Introduction to Multivariate Analysis. Epidemiological Applications in Health Services Research. Dr. Ibrahim Awad Ibrahim. Areas to be addressed today: introduction to variables and data, simple linear regression, correlation, population covariance, multiple regression, canonical correlation.


Presentation Transcript


  1. Introduction to Multivariate Analysis. Epidemiological Applications in Health Services Research. Dr. Ibrahim Awad Ibrahim.

  2. Areas to be addressed today • Introduction to variables and data • Simple linear regression • Correlation • Population covariance • Multiple regression • Canonical correlation • Discriminant analysis • Logistic regression • Survival analysis • Principal component analysis • Factor analysis • Cluster analysis

  3. Types of variables (Stevens’ classification, 1951) • Nominal: distinct categories (race, religion, county, sex) • Ordinal: rankings (education, health status, smoking level) • Interval: equal differences between levels (time, temperature, blood glucose level) • Ratio: interval with a natural zero (bone density, weight, height)

  4. Variables used in data analysis • Dependent: result, outcome (e.g., developing CHD) • Independent: explanatory (age, sex, diet, exercise) • Latent constructs: SES, satisfaction, health status • Measurable indicators: education, employment, revisit, miles walked

  5. Variables in data example

  6. Data • Data screening and transformation • Normality • Independence • Correlation (or lack of independence)

  7. Variable types and measures of central tendency • Nominal: mode • Ordinal: median • Interval: mean • Ratio: geometric mean and harmonic mean
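
The pairing of variable type and summary measure on this slide can be sketched with Python's standard `statistics` module. All sample values below are hypothetical and made up for illustration; only the type-to-measure mapping comes from the slide.

```python
import statistics

# Hypothetical sample values, invented for illustration only.
blood_types = ["A", "O", "O", "B", "O", "AB"]   # nominal
health_status = [1, 2, 2, 3, 4, 5, 5]           # ordinal ranks (1 = poor, 5 = excellent)
temperatures_c = [36.5, 37.0, 37.5, 38.0]       # interval
weights_kg = [50.0, 60.0, 72.0]                 # ratio

mode_blood = statistics.mode(blood_types)               # most frequent category
median_health = statistics.median(health_status)        # middle rank
mean_temp = statistics.mean(temperatures_c)             # arithmetic mean
gmean_weight = statistics.geometric_mean(weights_kg)    # needs positive values
hmean_weight = statistics.harmonic_mean(weights_kg)     # needs positive values

print(mode_blood, median_health, mean_temp, gmean_weight, hmean_weight)
```

The geometric and harmonic means only make sense with a natural zero and positive values, which is why the slide reserves them for ratio variables.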

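  8. Simple linear regression • Y = A + BX, where A is the intercept and B is the slope • [Figure: a straight line on X and Y axes, with A and B marked]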
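
A minimal sketch of estimating A and B in Y = A + BX, assuming ordinary least squares as the fitting method (the slide does not name one). The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (A, B) for the least-squares line Y = A + B*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
A, B = fit_line(xs, ys)
print(A, B)
```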

  9. Correlation • Mean = $\mu$ • Variance (SD)$^2$ = $\sigma^2$ • Population covariance $\sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)]$ • Product moment coefficient $\rho = \sigma_{xy} / (\sigma_x \sigma_y)$ • It lies between -1 and 1
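
The covariance and product moment coefficient can be computed directly from their definitions. This sketch treats a small hypothetical data set as the whole population (dividing by n rather than n - 1); the function names are illustrative.

```python
import math

def covariance(xs, ys):
    """Population covariance: the mean of (X - mu_x) * (Y - mu_y)."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Product moment coefficient: covariance divided by both SDs."""
    sigma_x = math.sqrt(covariance(xs, xs))  # covariance of X with itself is its variance
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

# Hypothetical values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_up = [2.0, 4.0, 6.0, 8.0, 10.0]     # perfectly increasing with X
ys_down = [10.0, 8.0, 6.0, 4.0, 2.0]   # perfectly decreasing with X
ys_mixed = [2.0, 1.0, 4.0, 3.0, 5.0]   # positive but imperfect association

for ys in (ys_up, ys_down, ys_mixed):
    print(correlation(xs, ys))
```

The perfectly increasing and decreasing series sit at the two ends of the allowed range, and the mixed series falls in between.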

  10. Example physical and mental health indicators

  11. Negative correlation

  12. Population covariance • [Figure: four panels with $\rho$ = 0.00, $\rho$ = 0.33, $\rho$ = 0.6, and $\rho$ = 0.88]

  13. Multiple regression and correlation • Simple linear: $Y = \alpha + \beta X$ • Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$ • [Figure: axes labeled EF (ejection fraction), Exercise, and Body fat]
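
A minimal sketch of fitting the multiple regression equation, assuming ordinary least squares solved through the normal equations (the slide does not specify an estimation method). The helper names and the exercise and body fat values are hypothetical; the outcome is generated from known coefficients so the fit can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_multiple_regression(rows, ys):
    """Return [alpha, beta_1, ..., beta_p] from the normal equations X'X b = X'y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical (exercise, body fat) rows; Y is built from Y = 50 + 2*X1 - 0.5*X2.
rows = [(1.0, 20.0), (2.0, 25.0), (3.0, 18.0), (4.0, 30.0), (5.0, 22.0)]
ys = [50.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in rows]
alpha, beta1, beta2 = fit_multiple_regression(rows, ys)
print(alpha, beta1, beta2)
```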

  14. Issues with regression • Missing values: random or patterned; mean substitution and maximum likelihood (ML) • Dummy variables: equal intervals! • Multicollinearity: independent variables are highly correlated • Garbage can method
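
One reading of the dummy-variable bullet is that coding ordered categories as 1, 2, 3 assumes equal intervals between levels, which indicator (dummy) variables avoid. A minimal sketch of that coding, assuming a hypothetical smoking-status variable; the helper name `dummy_code` is not from the slides.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return levels, rows

# Hypothetical categorical predictor; "never" is the reference category.
smoking = ["never", "former", "current", "never"]
levels, coded = dummy_code(smoking, reference="never")
print(levels)
print(coded)
```

Each non-reference level gets its own coefficient in the regression, so no spacing between levels is imposed.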

  15. Canonical correlation • An extension of multiple regression • Multiple Y variables and multiple X variables • Finds several linear combinations of the X variables and the same number of linear combinations of the Y variables. • These combinations are called canonical variables, and the correlations between the corresponding pairs of canonical variables are called canonical correlations.

  16. Correlation matrix • Data screening and transformation • Normality • Independence • Correlation (or lack of independence)

  17. Discriminant analysis • A method used to classify an individual into one of two or more groups based on a set of measurements • Examples: at risk for heart disease, cancer, diabetes, etc. • It can be used for prediction and description

  18. Discriminant analysis • In the figure, points a and b are wrongly classified • The discriminant function describes the probability of being classified in the right group. • [Figure: groups A and B, with misclassified points a and b]
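
A deliberately simplified sketch of the idea, assuming a single measurement, two groups with equal spread and equal prior probability, and group A lying below group B. Under those assumptions the discriminant rule reduces to a cutoff halfway between the group means. The data and function names are hypothetical.

```python
import statistics

def fit_cutoff(group_a, group_b):
    """Midpoint between the two group means (equal variances and priors assumed)."""
    return (statistics.mean(group_a) + statistics.mean(group_b)) / 2

def classify(value, cutoff):
    """Assign to A below the cutoff, otherwise to B (assumes mean of A < mean of B)."""
    return "A" if value < cutoff else "B"

# Hypothetical measurements for two known groups.
group_a = [1.0, 2.0, 3.0, 4.0, 6.0]
group_b = [5.0, 7.0, 8.0, 9.0, 10.0]

cutoff = fit_cutoff(group_a, group_b)
misclassified_a = [v for v in group_a if classify(v, cutoff) != "A"]
misclassified_b = [v for v in group_b if classify(v, cutoff) != "B"]
print(cutoff, misclassified_a, misclassified_b)
```

Because the groups overlap, one member of each group lands on the wrong side of the cutoff, which is the situation the slide describes for points a and b.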

  19. Logistic regression • An alternative to discriminant analysis for classifying an individual into one of two populations based on a set of criteria. • It is appropriate for any combination of discrete or continuous variables. • It uses maximum likelihood estimation to classify individuals based on the independent variable list.
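
A minimal sketch of maximum likelihood estimation for a one-predictor logistic model, assuming plain gradient ascent on the log-likelihood (statistical packages typically use faster iterative schemes). The risk scores, outcomes, learning rate, and function names are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate (a, b) in P(Y=1 | X) = sigmoid(a + b*X) by gradient ascent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the average log-likelihood: residual, and residual times X.
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        grad_a = sum(residuals) / n
        grad_b = sum(r * x for r, x in zip(residuals, xs)) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b
    return a, b

def predict(a, b, x):
    """Classify as 1 when the fitted probability is at least 0.5."""
    return 1 if sigmoid(a + b * x) >= 0.5 else 0

# Hypothetical risk scores (X) and disease indicator (Y); the groups overlap,
# so the maximum likelihood estimate is finite.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -0.2]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]

a, b = fit_logistic(xs, ys)
print(a, b, predict(a, b, 2.0), predict(a, b, -2.0))
```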

  20. Survival analysis (event history analysis) • Analyzes the length of time it takes for a specific event to occur. • Time to death, organ failure, retirement, etc. • Length of time = function of explanatory variables (covariates)

  21. Survival data example • [Figure: follow-up timelines on an axis marked 1980, 1985, 1990, with outcomes labeled died, lost, and surviving]

  22. Log-linear regression • A regression model in which the dependent variable is the log of survival time (t) and the independent variables are the explanatory variables. • Multiple regression: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p$ • Log-linear: $\log(t) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_p X_p + e$
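
A minimal sketch of the log-linear model with one explanatory variable, assuming every survival time is fully observed (no lost or still-surviving cases) and ordinary least squares on log(t). Censored observations need methods this sketch does not cover. The covariate, times, and function name are hypothetical.

```python
import math

def fit_log_linear(xs, times):
    """Least-squares fit of log(t) = alpha + beta * x on uncensored times."""
    log_times = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log_t = sum(log_times) / n
    sxy = sum((x - mean_x) * (lt - mean_log_t) for x, lt in zip(xs, log_times))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_log_t - beta * mean_x
    return alpha, beta

# Hypothetical covariate values; times generated from log(t) = 3 - 0.5 * x with e = 0.
covariate = [0.0, 1.0, 2.0, 3.0, 4.0]
times = [math.exp(3.0 - 0.5 * x) for x in covariate]

alpha, beta = fit_log_linear(covariate, times)
predicted_time = math.exp(alpha + beta * 2.0)  # back-transform to the time scale
print(alpha, beta, predicted_time)
```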

  23. Cox proportional hazards model • Another method to model the relationship between survival time and a set of explanatory variables. • The proportion of the population who die up to time (t) is the lined area. • [Figure: curve over a time axis marked 1980, 1985, 1990, with time t indicated]

  24. Cox proportional hazards model • The hazard functions h(t) of groups 1 and 2 are proportional, so that • h1(t)/h2(t) is constant at every time t.
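
The proportionality property can be checked numerically. This sketch uses the standard Cox form h(t | x) = h0(t) * exp(beta * x); the Weibull-shaped baseline hazard, the value of beta, and the function names are hypothetical choices for illustration, since the Cox model itself leaves h0(t) unspecified.

```python
import math

def baseline_hazard(t, shape=1.5, scale=10.0):
    """Hypothetical Weibull-shaped baseline hazard h0(t); it changes with time."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard(t, x, beta=0.7):
    """Proportional hazards form: h(t | x) = h0(t) * exp(beta * x)."""
    return baseline_hazard(t) * math.exp(beta * x)

# Group 1 has x = 1 and group 2 has x = 0 (hypothetical group indicator).
times = [1.0, 5.0, 10.0]
ratios = [hazard(t, 1) / hazard(t, 0) for t in times]
print(ratios)
```

Both hazards rise over time, yet their ratio stays at exp(beta) at every time point, which is what "proportional" means here.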

  25. Principal component analysis • Aimed at simplifying the description of a set of interrelated variables. • All variables are treated equally. • You end up with uncorrelated new variables called principal components (PCs). • Each one is a linear combination of the original variables. • The measure of the information conveyed by each is its variance. • The PCs are arranged in descending order of the variance explained.

  26. Principal component analysis • A general rule is to select PCs that each explain at least 5% of the variance, but you can set the cutoff higher for parsimony. • Theory should guide the selection of this cutoff point. • Sometimes it is used to alleviate multicollinearity.
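
A minimal sketch of principal components for two variables, assuming the components are taken from the sample covariance matrix and using the closed-form eigen-decomposition of a 2 by 2 symmetric matrix. The data and function names are hypothetical; real analyses with more variables would rely on a linear algebra library.

```python
import math

def covariance_2d(xs, ys):
    """Sample variances and covariance (dividing by n - 1) of two variables."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def principal_components_2d(sxx, sxy, syy):
    """Return [(variance, unit_vector), ...] in descending order of variance."""
    if sxy == 0:
        # Already uncorrelated: the original axes are the components.
        axes = [(sxx, (1.0, 0.0)), (syy, (0.0, 1.0))]
        return sorted(axes, key=lambda c: c[0], reverse=True)
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    components = []
    for variance in (half_trace + root, half_trace - root):
        vx, vy = sxy, variance - sxx          # eigenvector for this eigenvalue
        norm = math.hypot(vx, vy)
        components.append((variance, (vx / norm, vy / norm)))
    return components

# Hypothetical paired measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

components = principal_components_2d(*covariance_2d(xs, ys))
total = sum(variance for variance, _ in components)
explained = [variance / total for variance, _ in components]
kept = [c for c, share in zip(components, explained) if share >= 0.05]  # 5% rule
print(explained, len(kept))
```

Each unit vector holds the weights of the linear combination, and the two vectors are orthogonal, so the resulting components are uncorrelated.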

  27. Factor analysis • The objective is to understand the underlying structure explaining the relationships among the original variables. • We use the factor loading of each variable on the generated factors to determine the usability of that variable. • Theory again guides the interpretation of the structures depicted by the common factors encompassing the selected variables.

  28. Factor analysis

  29. Factor analysis

  30. Cluster analysis • A method for classifying individuals into previously unknown groups • It proceeds from the most general to the most specific: • Kingdom: Animalia • Phylum: Chordata • Subphylum: Vertebrata • Class: Mammalia • Order: Primates • Family: Hominidae • Genus: Homo • Species: sapiens

  31. Patient clustering • Major: patients • Types: medical • Subtype: neurological • Class: genetic • Order: late onset • Disease: Guillain-Barré syndrome • Hierarchical: divisive or agglomerative
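
A minimal sketch of the agglomerative (bottom-up) variant of hierarchical clustering, assuming one numeric measurement per patient and single linkage (distance between the closest members) as the merging rule. The slide names neither a linkage nor a distance, so both are assumptions, as are the ages and the function name.

```python
def agglomerative_clusters(points, target_count):
    """Single-linkage agglomerative clustering of 1-D points (target_count >= 1)."""
    clusters = [[p] for p in points]  # start with every point in its own cluster
    while len(clusters) > target_count:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical patient ages with no group labels known in advance.
ages = [21.0, 23.0, 22.0, 60.0, 64.0, 40.0]
groups = sorted(agglomerative_clusters(ages, target_count=3))
print(groups)
```

A divisive method would run the other way, starting from one cluster containing everyone and splitting it repeatedly.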

  32. Conclusions

  33. Presentation Schedule • 4 each on 4/22 and 4/27 • 5 on 4/29 • Each presentation should be maximum of 10 minutes and 5 minutes for discussion • E-mail me your requirements of software and hardware for your presentation. • Final projects due 5/7/99 by 5:00 pm in my office.

  34. Presentation Schedule 1

  35. Presentation Schedule 2

  36. Presentation Schedule 3
