1 / 19

Scottish Social Survey Network: Master Class 1 Data Analysis with Stata

Scottish Social Survey Network: Master Class 1 Data Analysis with Stata. Dr Vernon Gayle and Dr Paul Lambert 23 rd January 2008, University of Stirling The SSSN is funded under Phase II of the ESRC Research Development Initiative. Multilevel data and analysis with Stata (in 15 minutes).

oshin
Download Presentation

Scottish Social Survey Network: Master Class 1 Data Analysis with Stata

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Scottish Social Survey Network: Master Class 1Data Analysis with Stata Dr Vernon Gayle and Dr Paul Lambert 23rd January 2008, University of Stirling The SSSN is funded under Phase II of the ESRC Research Development Initiative

  2. Multilevel data and analysis with Stata (in 15 minutes)

  3. Generalised linear model • Y = BX + e • Y = outcome variable(s) • X = explanatory variables • e = error term for each individual response Generalised linear mixed models • Adding complexity to the GLM, such as by disaggregating the error structures

  4. The work of statistical modelling • Yi = BXi + ei • Most of the time: • we have a single Y • we ignore e • we concentrate on what goes into B

  5. Example • Data: British Household Panel Survey 2005 adult interviews (7k adults in work) • Y = GHQ scale score for adults in employment (General Health Questionnaire, higher = worse subjective well-being) • X = various possible measures, including gender, age, marital status, occupational advantage, education, partner’s GHQ • You can run this example, the files are at:

  6. Results from four linear models

  7. Some regression assumptions • All variables are measured without errors • All relevant predictors of the independent variable are included in the analysis • Expected value of the error is zero • Heteroscedasticity of the error • No autocorrelation (no relation between error terms for different cases) • [above using: Menard, S. 1995. Applied Logistic Regression Analysis, London: Sage.]

  8. Multilevel modelling • What if there was some connection between some of the cases within the dataset? • This occurs by design in certain projects • e.g. educational research, sample includes multiple children from the same school • Some connections (‘hierarchical clusters’) are standard in most social surveys

  9. How to account for hierarchy / clustering in individual data? • We could try a unique dummy var. for every cluster • Country: Y = BX + scot + wal + Nir + e • ‘areg’ in Stata allows several hundred variables like this • often called a ‘hierarchical fixed effect’ • but many hierarchies have too many clusters for this to be satisfactory • We could use higher level explanatory variables • e.g. average unemployment rate in local authority district • these are also ‘hierarchical fixed effects’ • We could try telling the model that we expect the error terms to be related • these are ‘hierarchical random effects’ = multilevel models

  10. Creating a multilevel model • Linear model: Yi = BXi + ei • Multilevel model (‘random intercepts’) Yij = BXij + uj + eij • Multilevel model (‘random coefficients’) Yij = BXij + UBj + uj + eij

  11. How to implement multilevel models? • In SPSS and Stata, there are extension specifications which can be made in order to specify the simplest random intercepts model

  12. Stata examples • regress ghq fem age age2 cohab • regress ghq fem age age2 cohab, robust cluster(ohid) • xtmixed ghq fem age age2 cohab ||ohid:

  13. Comments • Models which ignore clustering should be unbiassed but inefficient • The simplest multilevel model: • Shouldn’t change coefficent estimates (unbiased) • Should change confidence intervals (inefficient)

  14. 3-level model in Stata (xtmixed)

  15. The same model in MLwiN

  16. A controversial claim about Stata • Stata is the best package to use for multilevel modelling, because: • It is integrated with data management capacity: easy to change variables; change cases; add higher level explanatory variables; etc • It has a wide range of hierarchical model estimators • It allows easy comparison between long-standing hierarchical estimators (from economics) and new random effects models • By constrast: • Other mainstream packages don’t have adequate range of model estimators • Specialist packages (e.g. MLwiN; HLM) do have more advanced modelling estimators, but they inhibit data manipulation / serious model building

More Related