Discrete Multivariate Analysis

Presentation Transcript

  1. Discrete Multivariate Analysis: Analysis of Multivariate Categorical Data

  2. References • Fienberg, S. (1980), The Analysis of Cross-Classified Categorical Data, MIT Press, Cambridge, Mass. • Fingleton, B. (1984), Models of Category Counts, Cambridge University Press, Cambridge. • Agresti, A. (1990), Categorical Data Analysis, Wiley, New York.

  3. Example 1: In this study we examine n = 1237 individuals, measuring X, Systolic Blood Pressure, and Y, Serum Cholesterol.

  4. Example 2: The following data were taken from a study of parole success involving 5587 parolees in Ohio between 1965 and 1972 (a ten percent sample of all parolees during this period).

  5. The study involved a dichotomous response Y, based on a one-year follow-up:
     • Success (no major parole violation), or
     • Failure (returned to prison either as a technical violator or with a new conviction).
     The predictors of parole success included are:
     • Type of committed offense (Person offense or Other offense),
     • Age (25 or older or Under 25),
     • Prior record (No prior sentence or Prior sentence), and
     • Drug or alcohol dependency (No drug or alcohol dependency or Drug and/or alcohol dependency).

  6. The data were randomly split into two parts. The counts for each part are displayed in the table, with those for the second part in parentheses. • The second part of the data was set aside for a validation study of the model fitted to the first part.

  7. Table

  8. Multiway Frequency Tables • Two-way (variables A and B)

  9. Three-way (variables A, B and C)

  10. Three-way (variables A, B and C)

  11. Four-way (variables A, B, C and D)

  12. Analysis of a Two-way Frequency Table:

  13. Frequency Distribution (Serum Cholesterol and Systolic Blood Pressure)

  14. Joint and Marginal Distributions (Serum Cholesterol and Systolic Blood Pressure). The marginal distributions allow you to look at the effect of one variable, ignoring the other. The joint distribution allows you to look at the two variables simultaneously.
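
The relationship between the joint and marginal distributions can be sketched in a few lines of Python. The 2 x 3 table of counts below is hypothetical (it is not the blood pressure and cholesterol data from the slides), and the variable names are chosen only for illustration.

```python
# Hypothetical 2x3 table of counts (illustrative only, not the study data).
counts = [
    [20, 30, 50],
    [10, 40, 50],
]

n = sum(sum(row) for row in counts)  # overall total N

# Joint distribution: proportion of all cases falling in each cell.
joint = [[x / n for x in row] for row in counts]

# Marginal distributions: sum the joint distribution over the other variable.
row_marginal = [sum(row) for row in joint]
col_marginal = [sum(col) for col in zip(*joint)]

print(row_marginal)
print(col_marginal)
```

Each marginal distribution sums to one, because it is obtained by collapsing the joint distribution over the variable being ignored.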

  15. Conditional Distributions (Systolic Blood Pressure given Serum Cholesterol). The conditional distribution allows you to look at the effect of one variable when the other variable is held fixed or known.
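
As a minimal sketch of how conditional distributions are obtained from a table of counts, the Python below divides each cell by its row total (or column total). The counts are hypothetical and which variable plays the role of rows or columns is an assumption made for illustration.

```python
# Hypothetical 2x3 table of counts (illustrative only, not the study data).
counts = [
    [20, 30, 50],
    [10, 40, 50],
]

# Conditional distribution of the column variable given each row:
# divide every cell by its row total, so each row sums to 1.
col_given_row = [[x / sum(row) for x in row] for row in counts]

# Conditional distribution of the row variable given each column:
# divide every cell by its column total, so each column sums to 1.
col_totals = [sum(col) for col in zip(*counts)]
row_given_col = [[x / t for x, t in zip(row, col_totals)] for row in counts]

print(col_given_row)
print(row_given_col)
```

If the rows of `col_given_row` are (nearly) identical, knowing the row category tells you little about the column variable, which is what independence looks like in a table.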

  16. Conditional Distributions (Serum Cholesterol given Systolic Blood Pressure)

  17. GRAPH: Conditional distributions of Systolic Blood Pressure given Serum Cholesterol

  18. Notation: Let $x_{ij}$ denote the frequency (number of cases) where X (row variable) is i and Y (column variable) is j.

  19. Different Models. The Multinomial Model: Here the total number of cases N is fixed and the $x_{ij}$ follow a multinomial distribution with parameters $p_{ij}$.

  20. The Product Multinomial Model: Here the row (or column) totals $R_i$ are fixed and, for a given row i, the $x_{ij}$ follow a multinomial distribution with parameters $p_{j|i}$.

  21. The Poisson Model: In this case we observe over a fixed period of time and all counts in the table (including row, column and overall totals) follow a Poisson distribution. Let $m_{ij}$ denote the mean of $x_{ij}$.
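
For comparison, the probability functions of the three sampling models can be written side by side. This is a sketch in the usual textbook form, assuming (for the Poisson model) that the cell counts are independent; the symbols $p_{ij}$, $p_{j|i}$, $R_i$ and $m_{ij}$ are those used on the slides above.

$$
\begin{aligned}
\text{Multinomial:} \quad & P(\{x_{ij}\}) = \frac{N!}{\prod_{i,j} x_{ij}!} \prod_{i,j} p_{ij}^{\,x_{ij}}, & \sum_{i,j} p_{ij} &= 1, \\
\text{Product multinomial:} \quad & P(\{x_{ij}\}) = \prod_{i} \left[ \frac{R_i!}{\prod_{j} x_{ij}!} \prod_{j} p_{j|i}^{\,x_{ij}} \right], & \sum_{j} p_{j|i} &= 1 \text{ for each } i, \\
\text{Poisson:} \quad & P(\{x_{ij}\}) = \prod_{i,j} \frac{e^{-m_{ij}}\, m_{ij}^{\,x_{ij}}}{x_{ij}!}.
\end{aligned}
$$

The models differ only in what is treated as fixed by the sampling design: the overall total, the row (or column) totals, or nothing at all.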

  22. Independence

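  23. Multinomial Model: if X and Y are independent, then $p_{ij} = p_{i+}\,p_{+j}$ and $m_{ij} = N p_{ij} = N p_{i+}\,p_{+j}$, where $p_{i+}$ and $p_{+j}$ are the marginal probabilities. The estimated expected frequency in cell (i,j) in the case of independence is: $\hat{m}_{ij} = R_i C_j / N$, where $R_i$ and $C_j$ are the observed row and column totals.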

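  24. The same can be shown for the other two models (the Product Multinomial model and the Poisson model); namely, the estimated expected frequency in cell (i,j) in the case of independence is: $\hat{m}_{ij} = R_i C_j / N$, where $R_i$ and $C_j$ are the row and column totals and N is the overall total. Standardized residuals are defined for each cell: $r_{ij} = (x_{ij} - \hat{m}_{ij}) / \sqrt{\hat{m}_{ij}}$.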
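
A minimal Python sketch of these two calculations follows: the expected frequency is (row total x column total) / N, and the standardized residual is (observed - expected) / sqrt(expected). The table of counts is hypothetical and is used only to show the arithmetic.

```python
import math

# Hypothetical 2x3 table of observed counts (illustrative only).
counts = [
    [20, 30, 50],
    [10, 40, 50],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Estimated expected frequency under independence: R_i * C_j / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Standardized residual for each cell: (observed - expected) / sqrt(expected).
residuals = [
    [(x - e) / math.sqrt(e) for x, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(counts, expected)
]

print(expected)
print(residuals)
```

Cells with large positive or negative residuals are the ones that contribute most to any departure from independence.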

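  25. The Chi-Square Statistic: $\chi^2 = \sum_i \sum_j (x_{ij} - \hat{m}_{ij})^2 / \hat{m}_{ij}$, where $\hat{m}_{ij}$ is the estimated expected frequency under independence. The Chi-Square test for independence: Reject $H_0$: independence if $\chi^2 > \chi^2_{\alpha}$, the upper $\alpha$ critical value of the chi-square distribution with $(r-1)(c-1)$ degrees of freedom, where r and c are the numbers of rows and columns.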
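
The test can be sketched in Python as below. The counts are hypothetical, and the 5% critical value for 2 degrees of freedom (5.991) is taken from standard chi-square tables rather than computed.

```python
# Hypothetical 2x3 table of observed counts (illustrative only).
counts = [
    [20, 30, 50],
    [10, 40, 50],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Expected frequencies under independence: R_i * C_j / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (x - e) ** 2 / e
    for obs_row, exp_row in zip(counts, expected)
    for x, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(counts) - 1) * (len(counts[0]) - 1)

critical_value = 5.991  # upper 5% point of chi-square with 2 df (standard tables)
reject_independence = chi_square > critical_value

print(chi_square, df, reject_independence)
```

For this hypothetical table the statistic is 100/21 (about 4.76) on 2 degrees of freedom, so independence would not be rejected at the 5% level.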

  26. Table: Expected frequencies, Observed frequencies, Standardized Residuals. $\chi^2 = 20.85$ (p = 0.0133)
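
The quoted p-value can be checked with a short, dependency-free Python sketch. It assumes the table is 4 x 4 (four blood pressure categories by four cholesterol categories, as in the Framingham example later in the slides), giving (4-1)(4-1) = 9 degrees of freedom; the helper `chi2_sf` is written here for illustration and is not a library function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom (x > 0)."""
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_{k>=0} z^k / (a (a+1) ... (a+k)).
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

# Assumption: 4x4 table, hence 9 degrees of freedom.
p_value = chi2_sf(20.85, 9)
print(round(p_value, 4))
```

Under that assumption the result agrees with the p-value shown on the slide to four decimal places.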

  27. Example: N = 57,407 cases in which an individual was victimized twice by crime were studied. The crime of the first victimization (X) and the crime of the second victimization (Y) were noted. The data are tabulated on the following slide.

  28. Table 1: Frequencies

  29. Table 2: Expected Frequencies (assuming independence)

  30. Table 3: Standardized residuals

  31. Table 4: Conditional distribution of second victimization given the first victimization (%)

  32. Log Linear Model

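  33. Recall: if the two variables, rows (X) and columns (Y), are independent, then $m_{ij} = m_{i+}\,m_{+j} / m_{++}$ and $\ln m_{ij} = \ln m_{i+} + \ln m_{+j} - \ln m_{++}$, where a "+" subscript denotes summation over that index.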

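  34. In general, for a table with r rows and c columns, let $u = \frac{1}{rc}\sum_i \sum_j \ln m_{ij}$, $u_{1(i)} = \frac{1}{c}\sum_j \ln m_{ij} - u$, $u_{2(j)} = \frac{1}{r}\sum_i \ln m_{ij} - u$ and $u_{12(i,j)} = \ln m_{ij} - u - u_{1(i)} - u_{2(j)}$; then $\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$ (1), where $\sum_i u_{1(i)} = \sum_j u_{2(j)} = \sum_i u_{12(i,j)} = \sum_j u_{12(i,j)} = 0$. Equation (1) is called the log-linear model for the frequencies $x_{ij}$.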

  35. Note: X and Y are independent if $u_{12(i,j)} = 0$ for all i and j. In this case the log-linear model becomes $\ln m_{ij} = u + u_{1(i)} + u_{2(j)}$.
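
The decomposition can be illustrated with a short Python sketch. It assumes the usual sum-to-zero parameterization (u is the grand mean of the log expected frequencies, the main effects are row and column mean deviations, and the interaction is what remains); the function name `u_terms` and both tables of expected frequencies are hypothetical.

```python
import math

def u_terms(m):
    """Decompose ln m_ij into u, u1(i), u2(j), u12(i,j) with sum-to-zero constraints."""
    r, c = len(m), len(m[0])
    log_m = [[math.log(v) for v in row] for row in m]
    u = sum(sum(row) for row in log_m) / (r * c)           # grand mean
    u1 = [sum(row) / c - u for row in log_m]                # row effects
    u2 = [sum(log_m[i][j] for i in range(r)) / r - u for j in range(c)]  # column effects
    u12 = [[log_m[i][j] - u - u1[i] - u2[j] for j in range(c)] for i in range(r)]
    return u, u1, u2, u12

# Hypothetical expected frequencies satisfying independence: m_ij = R_i * C_j / N
# with row totals (100, 200), column totals (45, 105, 150) and N = 300.
independent = [[15.0, 35.0, 50.0], [30.0, 70.0, 100.0]]

# Hypothetical expected frequencies with association between rows and columns.
associated = [[20.0, 30.0, 50.0], [10.0, 40.0, 50.0]]

_, u1_ind, u2_ind, u12_ind = u_terms(independent)
_, _, _, u12_assoc = u_terms(associated)

print(u12_ind)    # all entries are zero up to rounding error
print(u12_assoc)  # non-zero interaction terms
```

For the independent table every interaction term vanishes (up to floating-point error), while the associated table yields non-zero interaction terms.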

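  36. Comment: The log-linear model for a two-way frequency table, $\ln m_{ij} = u + u_{1(i)} + u_{2(j)} + u_{12(i,j)}$, is similar to the model for a two-factor experiment, $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$.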

  37. Three-way Frequency Tables

  38. Example: Data from the Framingham Longitudinal Study of Coronary Heart Disease (Cornfield [1962]). Variables:
     • Systolic Blood Pressure (X): < 127, 127-146, 147-166, 167+
     • Serum Cholesterol: < 200, 200-219, 220-259, 260+
     • Heart Disease: Present, Absent
     The data are tabulated on the next slide.

  39. Three-way Frequency Table

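  40. Log-Linear model for three-way tables: Let $m_{ijk}$ denote the expected frequency in cell (i,j,k) of the table; then in general $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$, where each u-term sums to zero over each of its indices.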

  41. Hierarchical Log-linear Models for Categorical Data. The hierarchical principle: if an interaction is in the model, also keep the lower-order interactions and main effects associated with that interaction. For three-way tables, the hierarchical models are as follows.

  42. 1. Model (all main effects model): $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)}$, i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [1][2][3]. Description: Mutual independence between all three variables.

  43. 2. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)}$, i.e. $u_{13(i,k)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [12][3]. Description: Variable 3 is independent of variables 1 and 2.

  44. 3. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)}$, i.e. $u_{12(i,j)} = u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [13][2]. Description: Variable 2 is independent of variables 1 and 3.

  45. 4. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{23(j,k)}$, i.e. $u_{12(i,j)} = u_{13(i,k)} = u_{123(i,j,k)} = 0$. Notation: [23][1]. Description: Variable 1 is independent of variables 2 and 3.

  46. 5. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)}$, i.e. $u_{23(j,k)} = u_{123(i,j,k)} = 0$. Notation: [12][13]. Description: Conditional independence between variables 2 and 3 given variable 1.

  47. 6. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{23(j,k)}$, i.e. $u_{13(i,k)} = u_{123(i,j,k)} = 0$. Notation: [12][23]. Description: Conditional independence between variables 1 and 3 given variable 2.

  48. 7. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{13(i,k)} + u_{23(j,k)}$, i.e. $u_{12(i,j)} = u_{123(i,j,k)} = 0$. Notation: [13][23]. Description: Conditional independence between variables 1 and 2 given variable 3.

  49. 8. Model: $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)}$, i.e. $u_{123(i,j,k)} = 0$. Notation: [12][13][23]. Description: Pairwise relations among all three variables, with each two-variable interaction unaffected by the value of the third variable.

  50. 9. Model (the saturated model): $\ln m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(i,j)} + u_{13(i,k)} + u_{23(j,k)} + u_{123(i,j,k)}$. Notation: [123]. Description: No simplifying dependence structure.
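
The bracket notation and the hierarchical principle can be connected with a small Python sketch: each bracket names a highest-order term, and the model contains that term plus every lower-order term built from the same variables. The function `model_terms` and the string encoding of brackets (for example "12" for [12]) are illustrative choices, not part of the original slides.

```python
from itertools import combinations

def model_terms(generators):
    """Expand bracket notation such as ["12", "3"] into all u-terms of the hierarchical model."""
    terms = set()
    for generator in generators:
        variables = sorted(generator)
        # Hierarchical principle: include every non-empty subset of the bracket's variables.
        for size in range(1, len(variables) + 1):
            for subset in combinations(variables, size):
                terms.add("u" + "".join(subset))
    # The overall mean u is always present; order the rest by interaction order.
    return ["u"] + sorted(terms, key=lambda t: (len(t), t))

print(model_terms(["12", "3"]))         # model 2: [12][3]
print(model_terms(["12", "13", "23"]))  # model 8: [12][13][23]
print(model_terms(["123"]))             # model 9: saturated
```

For [12][3] this yields u, u1, u2, u3 and u12, matching model 2 above; for [123] it yields all eight terms of the saturated model.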