1 / 59

Multilevel Modeling

Multilevel Modeling. Multilevel Question. Turns out the Simple Random Sampling is very expensive Travel to Moscow, Idaho to give survey to a single student.

gorej
Download Presentation

Multilevel Modeling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multilevel Modeling Raul Cruz-Cano, HLTH653 Spring 2013

  2. Multilevel Question • Turns out the Simple Random Sampling is very expensive • Travel to Moscow, Idaho to give survey to a single student. • The subsets are conventionally called primary sampling units or psu's. In a two-stage sample, rst a sample is drawn from the primary sampling units (the rst-stage sample), and within each psu included in the rst-stage sample, a sample of population elements is drawn (the second-stage sample). • This can be extended to situations with more than two levels, e.g., individuals within households within municipalities, and then is called a multistage sample. Raul Cruz-Cano, HLTH653 Spring 2013

  3. These are examples of two-level data structures, but extensions to multiple levels • are possible: 10 cities ->In each city: 5 schools ->In each school: 2 classes ->In each class: 5 students ->Each student given the test twice Raul Cruz-Cano, HLTH653 Spring 2013

  4. What is Multilevel or Hierarchical Linear Modeling? Nested Data Structures Raul Cruz-Cano, HLTH653 Spring 2013

  5. Individuals Undivided Unit of Analysis = Individuals Raul Cruz-Cano, HLTH653 Spring 2013

  6. Individuals Nested Within Groups Unit of Analysis = Individuals + Classes Raul Cruz-Cano, HLTH653 Spring 2013

  7. … and Further Nested Unit of Analysis = Individuals + Classes + Schools Raul Cruz-Cano, HLTH653 Spring 2013

  8. Examples of Multilevel Data Structures • Neighborhoods are nested within communities • Families are nested within neighborhoods • Children are nested within families Raul Cruz-Cano, HLTH653 Spring 2013

  9. Examples of Multilevel Data Structures • Schools are nested within districts • Classes are nested within schools • Students are nested within classes Raul Cruz-Cano, HLTH653 Spring 2013

  10. Multilevel Data Structures Level 4 District (l) Level 3 School (k) Level 2 Class (j) Level 1 Student (i) Raul Cruz-Cano, HLTH653 Spring 2013

  11. 2nd Type of Nesting • Repeated Measures Nested Within Individuals Focus = Change or Growth Raul Cruz-Cano, HLTH653 Spring 2013

  12. Time Points Nested Within Individuals Raul Cruz-Cano, HLTH653 Spring 2013

  13. Nested Data • Data nested within a group tend to be more alike than data from individuals selected at random. • Nature of group dynamics will tend to exert an effect on individuals. Raul Cruz-Cano, HLTH653 Spring 2013

  14. Multilevel Modeling Seems New But…. Extension of General Linear Modeling Simple Linear Regression Multiple Linear Regression ANOVA ANCOVA Repeated Measures ANOVA Raul Cruz-Cano, HLTH653 Spring 2013

  15. Why Multilevel Modelingvs. Traditional Approaches? Traditional Approaches – 1-Level • Individual level analysis (ignore group) • Group level analysis (aggregate data and ignore individuals) Raul Cruz-Cano, HLTH653 Spring 2013

  16. Problems withTraditional Approaches • Individual level analysis (ignore group) Violation of independence of data assumption leading to misestimated standard errors (standard errors are smaller than they should be). Raul Cruz-Cano, HLTH653 Spring 2013

  17. Problems withTraditional Approaches • Group level analysis (aggregate data and ignore individuals) Aggregation bias = the meaning of a variable at Level-1 (e.g., individual level SES) may not be the same as the meaning at Level-2 (e.g., school level SES) Raul Cruz-Cano, HLTH653 Spring 2013

  18. Example: Paired t-test: the average change in DBP is significantly different from zero (p = 0.000951) Unpaired t-test: the average change in DBP is significantly different from zero (p = 0.036) Raul Cruz-Cano, HLTH653 Spring 2013

  19. “Multilevel” Approach • 2 or more levels can be considered simultaneously • Can analyze within- and between-group variability Raul Cruz-Cano, HLTH653 Spring 2013

  20. How Many Levels Are Usually Examined? 2 or 3 levels very common 15 students x 10 classes x 10 schools = 1,500 Raul Cruz-Cano, HLTH653 Spring 2013

  21. Types of Outcomes • Continuous Scale (Achievement, Attitudes) • Binary (pass/fail) • Categorical with 3 + categories Raul Cruz-Cano, HLTH653 Spring 2013

  22. Effect for estimation of a mean • if the sample is a two-stage sample using random sampling with replacement at either stage or if the sampling fractions are so low that the difference between sampling with and sampling without replacement is negligible. Raul Cruz-Cano, HLTH653 Spring 2013

  23. Effect for estimation of a mean • Since considerations for the choice of a design always are of an approximate nature, only those designs are considered here where each level-two unit contains the same number of level-one units. • Level-two units will sometimes be referred to as clusters. The number of level-two units is denoted N • The number of level-one units within each level-two unit is denoted n • These numbers are called the level-two sample size and the cluster size, respectively • The total sample size is Nn. • If in reality the number of level-one units fluctuates between level-two units, it will almost always be a reasonable approximation to use for n the average number of sampled level-one units per level-two unit. Raul Cruz-Cano, HLTH653 Spring 2013

  24. Effect for estimation of a mean • Suppose that the mean is to be estimated of some variable Y in a population which has a two-level structure. As an example, Y could be the duration of hospital stay after a certain operation under the condition that there are no complications or additional health problems. Random Intercept Raul Cruz-Cano, HLTH653 Spring 2013

  25. Effect for estimation of a mean This increase in complexity permeates to regression, etc This is a relatively simple model, more complex models lead to more complex calculations that require the calculation of large covariance matrices

  26. Easier Case The effect of each level-2 unit is a constant (fixed), not a random variable Raul Cruz-Cano, HLTH653 Spring 2013

  27. Fixed Effects An equivalent to this operation is to add a dummy variable for each uj Actually a constants is a random variable with no variation hence fixed effects is special case of random effects

  28. Software to do Multilevel Modeling SAS Users PROC MIXED Extension of General Linear Modeling Simple Linear Regression Multiple Linear Regression ANOVA ANCOVA Repeated Measures ANOVA PROC REG PROC GLM PROC ANOVA Raul Cruz-Cano, HLTH653 Spring 2013

  29. Example: Family and Gender • The response variable Height measures the heights (in inches) of 18 individuals. • The individuals are classified according to Family and Gender data heights; input Family Gender$ Height @@; datalines; 1 F 67 1 F 66 1 F 64 1 M 71 1 M 72 2 F 63 2 F 63 2 F 67 2 M 69 2 M 68 2 M 70 3 F 63 3 M 64 4 F 67 4 F 66 4 M 67 4 M 67 4 M 69 ; run; Different than “Effects…” because now we have more cluster levels, but no random intercepts Raul Cruz-Cano, HLTH653 Spring 2013

  30. Example: Family and Gender • The PROC MIXED statement invokes the procedure. The CLASS statement instructs PROC MIXED to consider both Family and Gender as classification variables. • Dummy (indicator) variables are, as a result, created corresponding to all of the distinct levels of Family and Gender. • For these data, Family has four levels and Gender has two levels. proc mixed data=heights; class Family Gender; model Height = Gender Family Family*Gender/s; run; s : requests that a solution for the fixed-effects parameters be produced along with their approximate standard errors

  31. Family and Gender • Run program simple-proc_mixed2.sas What happens when you try to use the statement CLASS in a PROC REG? Raul Cruz-Cano, HLTH653 Spring 2013

  32. Dorsal shells in lizards Two-sample t-test: the small observed difference is not significant (p = 0.1024). Raul Cruz-Cano, HLTH653 Spring 2013

  33. Mother effect • We have 102 lizards from 29 mothers • Mother effects might be present • Hence a comparison between male and female animals should be based on within-mother comparisons. Raul Cruz-Cano, HLTH653 Spring 2013

  34. Mother effect # of dorsal shells Mother Raul Cruz-Cano, HLTH653 Spring 2013

  35. First Choice Β can be interpreted as the average difference between males and females for each mother Test for a ‘sex’ effect, correcting for ‘mother’ effects, More complex example than “Effect…” because now we have a variable xij for each observation Raul Cruz-Cano, HLTH653 Spring 2013

  36. SAS program proc mixed data = lizard; class mothc; model dors = sex mothc; run; Source F Value Pr > F SEX 7.19 0.0091 MOTHC 3.95 <.0001 Highly significant mother effect. Significant gender effect. Many degrees of freedom are spent to the estimation of the mother effect, which is not even of interest

  37. Later in this semester… • Note the different nature of the two factors: • SEX: defines 2 groups of interest • MOTHER: defines 29 groups not of real interest. A new sample would imply other mothers. • In practice, one therefore considers the factor ‘mother’ as a random factor. • The factor ‘sex’ is a fixed effect. • Thus the model is a mixed model. • In general, models can contain multiple fixed and/or random factors. Fixed Effects Model Random Effects Model Raul Cruz-Cano, HLTH653 Spring 2013 As in the slides of “Effect…”

  38. Later in this semester… • Note the different nature of the two factors: • SEX: defines 2 groups of interest • MOTHER: defines 29 groups not of real interest. A new sample would imply other mothers. • In practice, one therefore considers the factor ‘mother’ as a random factor. • The factor ‘sex’ is a fixed effect. • Thus the model is a mixed model. • In general, models can contain multiple fixed and/or random factors. proc mixed data = lizard; class mothc; model dors = sex / solution; random mothc; run; Raul Cruz-Cano, HLTH653 Spring 2013

  39. Is a variable random or fixed effect? LaMotte 1983, pp. 138–139 • Treatment levels used are the only ones about which inferences are sought => fixed Effect • Inferences are sought about a broader collection of treatment effects than those used in the experiment, or if the treatment levels are not selected purposefully => Random Effect Raul Cruz-Cano, HLTH653 Spring 2013

  40. More terminology • Balanced design • Equal number of observations per unit • Unbalanced design • Unequal number of observation per unit • Unconditional model • Simplest level 2 model; no predictors of the level 1 parameters (e.g., intercept and slope) • Conditional model • Level 2 model contains predictors of level 1 parameters Raul Cruz-Cano, HLTH653 Spring 2013

  41. Weighted Data Problem: Pct. of Voting Population Pct. of People who have a phone Minority Voters Minority Voters White Voters White Voters Solution: Give more “weight” to the minority people with telephone

  42. Weighted Data Not limited to 2 categories Pct. of Voting Population Pct. of People who have a phone Minority/Dem. Minority/Dem. Minority/Rep. Minority/Rep. White /Dem White /Dem White /Rep White /Rep How many categories? As many as there are significant

  43. Proportion Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with phone A sampling weight for a given data point is the number of receipts in the target population which that sample point represents. Needless to say that in reality this is a much more complex issue Raul Cruz-Cano, HLTH653 Spring 2013

  44. Which weight we need to use? • Oversimplified example (don’t take seriously) Pct. of Voting Population in 2008 Pct. of People who have a phone Minority Voters O Minority Voters White Voters Pct. of Voting Population in 2010 White Voters Minority Voters M White Voters

  45. Proportion Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with phone • 100 minority + 500 white answer the phone survey • 75 Minority will vote for candidate X • 250 White will votes for candidate X • Non-Weighted Conclusion: 325/600 =54.16% of the voters will vote for candidate X • Weighted Conclusion: • 75 minority = 75% of minority with phone=>(.75)*(1/6)=12.5% of people with phone * 2 weight= 25% pct of voting population • 250 white = 50% of white people with phone =>(.5)*(5/6)= 41.66% of people with phone * .8 weight =>33.33% • 25% +33.33%=58.33% Raul Cruz-Cano, HLTH653 Spring 2013

  46. SAS Weighted Mean proc means data=sashelp.class; var height; run; proc means data=sashelp.class; weight weight; var height; run; Raul Cruz-Cano, HLTH653 Spring 2013

  47. Fish Measurement Data data Fish1 (drop=HtPct WidthPct); title 'Fish Measurement Data'; input Weight Length3 HtPct WidthPct @@; Weight3= Weight**(1/3); Height=HtPct*Length3/100; Width=WidthPct*Length3/100; datalines; 242.0 30.0 38.4 13.4 290.0 31.2 40.0 13.8 340.0 31.1 39.8 15.1 363.0 33.5 38.0 13.3 430.0 34.0 36.6 15.1 450.0 34.7 39.2 14.2 500.0 34.5 41.1 15.3 390.0 35.0 36.2 13.4 450.0 35.1 39.9 13.8 500.0 36.2 39.3 13.7 475.0 36.2 39.4 14.1 500.0 36.2 39.7 13.3 500.0 36.4 37.8 12.0 . 37.3 37.3 13.6 600.0 37.2 40.2 13.9 600.0 37.2 41.5 15.0 700.0 38.3 38.8 13.8 700.0 38.5 38.8 13.5 610.0 38.6 40.5 13.3 650.0 38.7 37.4 14.8 575.0 39.5 38.3 14.1 685.0 39.2 40.8 13.7 620.0 39.7 39.1 13.3 680.0 40.6 38.1 15.1 700.0 40.5 40.1 13.8 725.0 40.9 40.0 14.8 720.0 40.6 40.3 15.0 714.0 41.5 39.8 14.1 850.0 41.6 40.6 14.9 1000.0 42.6 44.5 15.5 920.0 44.1 40.9 14.3 955.0 44.0 41.1 14.3 925.0 45.3 41.4 14.9 975.0 45.9 40.6 14.7 950.0 46.5 37.9 13.7 ; run; • The data set contains 35 fish from the species Bream caught in Finland's lake Laengelmavesi with the following measurements: • Weight (in grams) • Length3 (length from the nose to the end of its tail, in cm) • HtPct (max height, as percentage of Length3) • WidthPct (max width, as percentage of Length3) title 'Fish Measurement Data'; proc corr data=fish1 nomiss plots=matrix(histogram); var Height Width Length3 Weight3; run; The statement weight can be used by many different PROC’s

  48. Weighted PROC MIXED proc mixed data=sashelp.class covtest; class Sex; model height=Sex Age/solution; weight weight; run; proc mixed data=sashelp.class covtest; class Sex; model height=Sex Age/solution; weight weight; run; Notice the difference (kind of small) in let’s say the coefficients of the model (Solution for Fixed Effects/Estimates) Raul Cruz-Cano, HLTH653 Spring 2013

  49. Farms Example • It's stratified by regions within Iowa and Nebraska. • Regress on farm area, with separate intercept and slope for each state • Farms.sas The population is first partitioned into disjoint classes (the strata) which together are exhaustive. Thus each population element should be within one and only one stratum. The main difference between stratified and cluster sampling is that in stratified sampling all the strata need to be sampled. In cluster sampling one proceeds by first selecting a number of clusters at random and then sampling each cluster or conduct a census of each cluster. But usually not all clusters would be included. Raul Cruz-Cano, HLTH653 Spring 2013

  50. Another (better?) approach for weighted data • Experimental design data have all the properties that we learned about in statistics classes. • The data are going to be independent • Identically-distributed observations with some known error distribution • there is an underlying assumption that the data come to use as a finite number of observations from a conceptually infinite population • Simple random sampling without replacement for the sample data • Sample survey data, • Does not come from a finite target population • The sample survey data do not have independent errors. The sample survey data do not come from a conceptually infinite population. • The sample survey data may cover many small sub-populations, so we do not expect that the errors are identically distributed. Raul Cruz-Cano, HLTH653 Spring 2013

More Related