

### MULTILEVEL ANALYSIS

### Back to unemployment example

Kate Pickett

Senior Lecturer in Epidemiology

Source: www-users.york.ac.uk/.../Multilevel%20Analysis.ppt, University of York

Perspective

- Health researchers:
  - Are interested in answering research questions (not maths)
  - Want to be able to apply statistical techniques
  - Want to be able to interpret results
  - Want to be able to communicate with consumers and statisticians


Aims for this session

- Understand the rationale for multilevel analysis
- Understand common terminology
- Interpret output from multilevel models
- Be able to read and critically appraise studies using multilevel models


Context and composition

- Studying populations (groups) and individuals

From Rose, G. Sick individuals and sick populations. Int J Epidemiol 1985;14:32-38


Levels of analysis

- Health researchers may collect and use data at the level of:
  - Individuals, patients
  - Families or other social groupings
  - Clinics or hospitals
  - Small areas, neighbourhoods
  - Large populations


[Figure: Population A and Population B]

How is Population A different from Population B?


Ecological studies

- Data are aggregated and represent a group rather than an individual, e.g.:
  - the incidence rate of an illness
  - the uptake of a particular health service
- We don't know which particular individuals within the group were ill or received the service
- These group-based outcome measures are analysed by correlating them with determinants measured for the same groups


Source: Pickett KE, Kelly S, Brunner E, Lobstein T, Wilkinson RG. Wider income gaps, wider waistbands? An ecological study of obesity and income inequality. J Epidemiol Community Health 2005;59:670–674.


The ecological fallacy

- Associations at the group level may not hold at the individual level
  - E.g., we might see that obesity rates are correlated internationally with per capita calorie intake
  - But we don't know whether it is the obese individuals who are eating all the calories
- Many group-level variables are correlated with one another, so we may find spurious correlations
  - E.g., obesity rates may also be correlated with the number of zoos per capita or some other completely unrelated factor
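A tiny hypothetical example makes the fallacy concrete. The three groups and their (x, y) values below are invented for illustration, not taken from any study: within every group, individuals with higher x have lower y, yet the group averages rise together. The helper `pearson` is a plain-Python Pearson correlation written for this sketch.

```python
# Hypothetical toy data, invented for illustration only.
# Each group maps to a list of (x, y) pairs, one pair per individual.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Ecological analysis: aggregate to one mean per group, then correlate the means.
mean_x = [sum(x for x, _ in pts) / len(pts) for pts in groups.values()]
mean_y = [sum(y for _, y in pts) / len(pts) for pts in groups.values()]
ecological_r = pearson(mean_x, mean_y)

# Individual-level analysis: correlate x and y within each group separately.
within_r = {
    g: pearson([x for x, _ in pts], [y for _, y in pts])
    for g, pts in groups.items()
}

print(f"group-level r = {ecological_r:.2f}")
for g, r in within_r.items():
    print(f"within group {g}: r = {r:.2f}")
```

Running it prints a group-level correlation of 1.00 and a within-group correlation of -1.00 for each group: an analysis of the aggregated data alone would point in the opposite direction to what happens to individuals.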


The atomistic fallacy

- But the ecological fallacy has a flip side
  - Factors that affect outcomes in individuals may not operate in the same way at the population level
  - E.g., teenage births are more common among poorer individuals, yet teenage birth rates are very high in some very wealthy countries


Example of teenage births

Source: Pickett KE, Mookherjee S, Wilkinson RG. Adolescent Birth Rates, Total Homicides, and Income Inequality in Rich Countries. AJPH 2005;95:1181-1183.


Ecological variables

- Sometimes ecological studies are done because they are quick and easy
- Sometimes ecological studies are the best design for the research question

BECAUSE

- Some determinants are “ecological”:
  - Population density
  - Air quality/pollution
  - GNP
  - Income inequality
  - % unemployed
  - Ambient temperature

Context and composition

- But what if we are interested in both types of variables (individual and population) simultaneously?
  - E.g., we might want to know the effect of population-level unemployment on health, above and beyond the health impact of being unemployed for any given individual

Introduction to multilevel models

- Hierarchical models
- Mixed effects models
- Random effects models

Background

- Developed in education research
  - Observations of students in a single class are not independent of one another
  - “Standard” statistical models assume that observations are independent

- Two-level hierarchy
  - Students within classes
- Three-level hierarchy
  - Students within classes within schools
- Four-level hierarchy
  - Students within classes within schools within local authority areas

Health research context

- Patients within a medical practice
- Residents within neighbourhoods
- Subjects within trial clusters
- Hospitals within Primary Care Trusts (PCTs)

Examples for class

- Some examples are drawn from Twisk JWR, “Applied Multilevel Analysis”, Cambridge University Press, 2006
- Example data are available at: http://www.emgo.nl/researchtools
- Research question: what is the relationship between total cholesterol and age?
- Statistical software: Stata (note that MLwiN is free to UK academics: http://www.cmm.bristol.ac.uk/MLwiN/download/index.shtml)

Simple linear regression

$$\text{Total cholesterol} = \beta_0 + \beta_1 \times \text{age} + \varepsilon$$

Simple linear regression, adding a categorical variable

$$\text{Total cholesterol} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{gender} + \varepsilon$$

Simple linear regression, adding another variable (doctor)

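$$\text{Total cholesterol} = \beta_0 + \beta_1 \times \text{age} + \beta_2 \times \text{MD}_1 + \beta_3 \times \text{MD}_2 + \beta_4 \times \text{MD}_3 + \beta_5 \times \text{MD}_4 + \dots + \beta_m \times \text{MD}_{m-1} + \varepsilon$$

where $\text{MD}_1, \dots, \text{MD}_{m-1}$ are indicator (dummy) variables for $m - 1$ of the $m$ doctors, the remaining doctor serving as the reference category.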
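The dummy-variable approach needs one indicator column per doctor apart from a reference doctor. The sketch below builds those columns in plain Python and counts the regression coefficients under each approach; the helper `dummy_code` and the patient-to-doctor list are hypothetical. The residual variance is left out of both counts because both models estimate it.

```python
def dummy_code(labels, reference):
    """Return (column names, rows) of 0/1 indicators, omitting the reference level."""
    levels = sorted(set(labels))
    kept = [lvl for lvl in levels if lvl != reference]
    # MD1 stands for the first non-reference doctor, MD2 the second, and so on.
    columns = [f"MD{i}" for i in range(1, len(kept) + 1)]
    rows = [[1 if lab == lvl else 0 for lvl in kept] for lab in labels]
    return columns, rows


# Hypothetical doctor IDs (1 to 12) for a handful of patients.
doctor_of_patient = [1, 5, 12, 3, 5, 7, 12, 2, 9, 11, 4, 6, 8, 10]
columns, rows = dummy_code(doctor_of_patient, reference=1)

n_doctors = len(set(doctor_of_patient))
# Fixed-effects model: intercept + age slope + one coefficient per dummy.
n_params_fixed = 2 + len(columns)
# Random-intercept model: intercept + age slope + one between-doctor variance.
n_params_random = 2 + 1

print(n_doctors, len(columns), n_params_fixed, n_params_random)
```

With 12 doctors this prints `12 11 13 3`: 11 dummy coefficients on top of the intercept and age slope, versus a single between-doctor variance in the random-intercept model.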

Multilevel analysis

- Instead of estimating all those separate intercepts, we estimate their variance
- In our example, that means estimating 1 additional parameter rather than 11
- We are allowing the intercept to be random (random effects modelling)
- An efficient way of correcting for a variable with many categories
- Trade-off:
  - Assumes that the different intercepts are normally distributed


Example data

Cholesterol Dataset

- 441 patients
- Age 44-86 years
- Cholesterol 3.90-8.86 mmol/l
- 12 doctors


Non-multilevel regression

```text
. regress cholesterol age

      Source |       SS       df       MS              Number of obs =     441
-------------+------------------------------           F(  1,   439) =  142.06
       Model |  99.3395851     1  99.3395851           Prob > F      =  0.0000
    Residual |  306.984057   439  .699280312           R-squared     =  0.2445
-------------+------------------------------           Adj R-squared =  0.2428
       Total |  406.323642   440  .923462822           Root MSE      =  .83623

------------------------------------------------------------------------------
 cholesterol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0512619   .0043009    11.92   0.000      .042809    .0597148
       _cons |   2.798691    .268571    10.42   0.000     2.270847    3.326536
------------------------------------------------------------------------------
```
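Every summary statistic in the header of the `regress` output follows from the sums of squares. This sketch recomputes them in Python from the printed values (variable names are illustrative), as a check on how the table is read.

```python
import math

# Values copied from the Stata `regress cholesterol age` output.
ss_model, df_model = 99.3395851, 1
ss_resid, df_resid = 306.984057, 439
coef_age, se_age = 0.0512619, 0.0043009

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
ss_total = ss_model + ss_resid
df_total = df_model + df_resid

f_stat = ms_model / ms_resid                          # F(1, 439)
r_squared = ss_model / ss_total                       # share of variance explained
adj_r_squared = 1 - ms_resid / (ss_total / df_total)  # penalised for model size
root_mse = math.sqrt(ms_resid)                        # residual standard deviation
t_age = coef_age / se_age                             # t statistic for age

print(f"F = {f_stat:.2f}, R2 = {r_squared:.4f}, adj R2 = {adj_r_squared:.4f}")
print(f"Root MSE = {root_mse:.5f}, t(age) = {t_age:.2f}")
```

It reproduces F = 142.06, R² = 0.2445, adjusted R² = 0.2428, Root MSE = 0.83623 and t = 11.92 for age.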

Example using Stata


```text
. xtmixed cholesterol age ||doctor:, ml var

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -404.68939
Iteration 1:   log likelihood = -404.68939

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =       441
Group variable: doctor                          Number of groups   =        12

                                                Obs per group: min =        36
                                                               avg =      36.8
                                                               max =        39

                                                Wald chi2(1)       =    262.76
Log likelihood = -404.68939                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
 cholesterol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0495866    .003059    16.21   0.000     .0435911    .0555822
       _cons |   2.905812    .259134    11.21   0.000     2.397919    3.413705
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
doctor: Identity             |
                  var(_cons) |   .3685781   .1541985      .1623381    .8368327
-----------------------------+------------------------------------------------
               var(Residual) |   .3314923   .0226341      .2899706    .3789597
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   282.37 Prob >= chibar2 = 0.0000
```

Multilevel model in Stata


Do we need the multilevel model?

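- Likelihood ratio test:
  - Compare the -2 log likelihood of the model with a random intercept to the -2 log likelihood of the ordinary linear model
  - The difference has a Chi-square distribution with df = difference in number of parameters estimated (here 1, the between-doctor variance); because a variance cannot be negative, Stata uses a 50:50 mixture of Chi-square(0) and Chi-square(1), reported as chibar2(01)
  - Difference = 282.37, highly significant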
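The test can be reproduced from numbers printed in the two Stata outputs. The sketch below assumes normal errors and uses the standard maximum-likelihood log likelihood of an ordinary regression, -n/2 × (ln 2π + ln(RSS/n) + 1); the variable names are illustrative. It recovers Stata's chibar2(01) = 282.37.

```python
import math

n = 441
ss_resid_ols = 306.984057   # residual SS from the ordinary regression
loglik_mixed = -404.68939   # log likelihood from the xtmixed (ML) fit

# ML log likelihood of an ordinary linear regression with normal errors,
# evaluated at the ML variance estimate sigma^2 = RSS / n.
loglik_ols = -n / 2 * (math.log(2 * math.pi) + math.log(ss_resid_ols / n) + 1)

# LR statistic = (-2 LL of ordinary model) - (-2 LL of random-intercept model)
lr = -2 * loglik_ols - (-2 * loglik_mixed)

# One extra parameter (the between-doctor variance). Its null value (0) lies on
# the boundary of the parameter space, so the reference distribution is a 50:50
# mix of chi2(0) and chi2(1): halve the chi2(1) upper-tail probability.
p_chi2_1 = math.erfc(math.sqrt(lr / 2))  # P(chi2 with 1 df > lr)
p_boundary = 0.5 * p_chi2_1

print(f"OLS log likelihood = {loglik_ols:.3f}")
print(f"LR statistic = {lr:.2f}, p = {p_boundary:.3g}")
```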


Model parameters

- Effects of age in each model:
  - Coefficient in ordinary model = 0.0513
  - Coefficient in multilevel model = 0.0496
  - 95% CI in ordinary model (0.0428, 0.0597)
  - 95% CI in multilevel model (0.0436, 0.0556)
- Age is significant in both models
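The multilevel interval is the usual Wald interval, coefficient ± 1.96 × standard error, because `xtmixed` reports z statistics. A quick check in Python using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Age coefficient and standard error from the multilevel (xtmixed) output.
coef, se = 0.0495866, 0.003059
lower, upper = coef - z * se, coef + z * se

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

This prints `95% CI: (0.0436, 0.0556)`, matching the Stata bounds (.0435911, .0555822) to four decimal places. The ordinary regression interval instead uses a t distribution with 439 degrees of freedom, which is almost identical to the normal at that sample size.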

Intraclass correlation coefficient

- This measures how dependent the observations are within clusters
  - E.g., how correlated are the observations of patients belonging to the same doctor?
- Defined as:
  - Variance between clusters / total variance (between-cluster + within-cluster variance)
- For a given between-cluster variance, the smaller the variance within clusters, the greater the ICC

ICC (b)

ICC is low because:

- Variance within groups is high (9)
- Variance between groups is low (1)
- Numerator is small, relative to denominator

ICC = 1/10 = 0.1

ICC (c)

The groups are now more spread out and more different from one another, and the ICC is bigger because:

- Variance within groups is lower (5)
- Variance between groups is higher (5)

ICC = 5/10 = 0.5

ICC (d)

The groups are now completely different, and the ICC is the highest of these examples because:

- Variance within groups is small (1)
- Variance between groups is large (9)
- Numerator is large, relative to denominator

ICC = 9/10 = 0.9

MUCH MORE DEPENDENCE WITHIN CLUSTER – each observation provides less unique information
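The three ICC illustrations (b) to (d) are the ratio between/(between + within) applied to the made-up variance splits 1:9, 5:5 and 9:1. A minimal Python sketch; the function name `icc` is an illustrative choice:

```python
def icc(var_between, var_within):
    """Intraclass correlation: share of the total variance lying between clusters."""
    return var_between / (var_between + var_within)


# (between-group variance, within-group variance) for illustrations (b) to (d).
panels = {"b": (1, 9), "c": (5, 5), "d": (9, 1)}
for name, (between, within) in panels.items():
    print(name, icc(between, within))
```

It prints 0.1, 0.5 and 0.9: the total variance stays at 10 while more of it moves between groups.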

Impact on significance tests

Table of alpha values under different conditions of sample size and ICC

ICC in our example

- ICC = between-doctor variance / total variance
- ICC = 0.3686 / (0.3686 + 0.3315) = 0.3686 / 0.7001 = 0.526

After adjusting for age, 52.6% of the total variance in cholesterol lies at the doctor level.
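The ICC comes straight from the two variance components in the `xtmixed` output. The sketch below recomputes it and, as an extra not on the original slides, applies the Kish design effect, 1 + (average cluster size - 1) × ICC, to gauge how much the clustering reduces the independent information in the sample. Treat the design effect as a rough guide only: the formula is derived for estimating a mean with equal-sized clusters, and this ICC is conditional on age.

```python
# Variance components copied from the xtmixed output.
var_doctor = 0.3685781   # var(_cons): between-doctor variance
var_resid = 0.3314923    # var(Residual): within-doctor variance
n_obs, n_groups = 441, 12

icc = var_doctor / (var_doctor + var_resid)

# Kish design effect (an approximation), using the average cluster size.
avg_cluster = n_obs / n_groups
design_effect = 1 + (avg_cluster - 1) * icc
effective_n = n_obs / design_effect

print(f"ICC = {icc:.3f}")
print(f"design effect = {design_effect:.1f}, effective n = {effective_n:.0f}")
```

It prints ICC = 0.526, a design effect of about 19.8 and an effective sample size of roughly 22. That figure concerns doctor-level quantities such as mean cholesterol or the effect of a doctor-level variable; it does not transfer directly to the age coefficient, which varies within doctors.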


ICC

- When ICC is high
  - Evidence of a contextual effect on the outcome
  - Evidence of differences in composition between the clusters
  - Explore by including explanatory variables at each level
- When ICC is low
  - A multilevel analysis may not be needed, although with large clusters even a small ICC can distort significance tests


Data Structure

[Figure: individuals in Population B and Population A; red = unemployed]


An ordinary regression model

$$\text{Health} = b_0 + b_1\,(\text{unemployed}) + b_2\,(\text{\% unemployed}) + e$$

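e represents the combined effect of all omitted variables and measurement error. It is assumed to be purely random and independent across individuals, so it is effectively ignored, even if people in the same population share unmeasured characteristics.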


Data Structure

[Figure: individuals in Population B and Population A]

Aside from unemployment, subjects in A differ from those in B in other ways: composition (shape, size) and context (density).


A multi-level regression model

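With i indexing individuals and j indexing contexts (populations):

$$y_{ij} = b\,x_{ij} + B\,X_j + E_j + e_{ij}$$

$$\text{Health}_{ij} = b\,(\text{unemployed}_{ij}) + B\,(\text{\% unemployed}_j) + E_j + e_{ij}$$

Here $E_j$ is a random effect shared by everyone in context $j$, and $e_{ij}$ is the individual-level residual.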
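To see what the two error terms mean, the sketch below simulates data from this model in Python. All parameter values, the number of contexts and the rule linking individual unemployment to the area rate are hypothetical choices for illustration. Because E_j is drawn once per context and shared by everyone in it, health scores of people in the same population are correlated, which is exactly what a single independent error term e ignores.

```python
import random

random.seed(1)

# Hypothetical parameter values, chosen only for illustration.
b = -2.0     # individual effect of being unemployed on the health score
B = -0.1     # contextual effect per percentage point of area unemployment
sd_E = 1.0   # SD of the context-level random effect E_j
sd_e = 3.0   # SD of the individual-level residual e_ij


def simulate(n_contexts=5, n_per_context=200):
    """Draw (j, x_ij, X_j, y_ij) records from y_ij = b*x_ij + B*X_j + E_j + e_ij."""
    records = []
    for j in range(n_contexts):
        X_j = random.uniform(2, 15)   # % unemployed in context j
        E_j = random.gauss(0, sd_E)   # one draw, shared by everyone in context j
        for _ in range(n_per_context):
            # Hypothetical rule: each person is unemployed with probability X_j / 100.
            x_ij = 1 if random.random() < X_j / 100 else 0
            e_ij = random.gauss(0, sd_e)  # independent for every individual
            y_ij = b * x_ij + B * X_j + E_j + e_ij
            records.append((j, x_ij, X_j, y_ij))
    return records


data = simulate()
print(len(data), "individuals in", len({r[0] for r in data}), "contexts")
```

This prints `1000 individuals in 5 contexts`. An ordinary regression on `data` would treat the 1,000 records as independent; a random-intercept model would instead estimate the variance of E_j alongside b and B.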


What does this mean for critical appraisal of the health literature?

- When data are hierarchical or multi-level by nature, they should be analysed appropriately
- The coefficients or odds ratios from the models can be interpreted as usual

- The ICC shows how much variance in the outcome occurs between the higher-level contexts
- If appropriate methods are not used, standard errors and significance tests may be wrong, and coefficients may be biased


A summary

- Ecological studies
  - Appropriate when the research question concerns only ecological effects
  - Ecological fallacy may be a problem
- Individual-level studies
  - Appropriate when the research question concerns only individual-level effects
  - Atomistic fallacy may be a problem
- Multi-level studies
  - Appropriate when the research question concerns both context and composition of populations

