Department of Public Health and Primary Care, Cardiovascular Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK. Mendelian randomization: The use of genetic variants as an instrumental variable for assessing causal associations in observational data.
Presenting author: Stephen Burgess
Problem: How to assess the causal effect of a factor on an outcome if the data available is observational, not experimental?
Difficulties:
Confounding: association between factor of interest and competing risks means that those with different levels of the factor of interest cannot be directly compared.
Reverse causation: the factor may not only affect the outcome, but the outcome may also affect the risk factor.
Instrumental variables (IVs): An instrumental variable is a variable which is:
1) associated with the factor of interest (so the instrument defines groups differing in the factor),
2) not associated with any other risk factor (so the instrument gives a fair test),
3) not associated with the outcome conditional on the factor of interest and all other risk factors (so the effect of the instrument must be via the factor of interest).
These conditions, as shown in the directed acyclic graph (DAG) above, ensure that instrumental variable estimates are not biased by confounding.
Mendelian randomization: Genetic variants are ideal candidates to be used as instrumental variables as genes are:
1) generally associated specifically with particular biological factors,
2) determined at conception.
These characteristics motivate the use of genetic variants as instrumental variables and support their validity. Because genotype is fixed at conception, estimates are not subject to bias due to reverse causation.
Estimation: If all associations are linear and not subject to interactions, the causal effect of a factor on an outcome can be estimated by the ratio of the regression coefficient of the outcome (Y) on the instrument (G) to the regression coefficient of the factor (X) on the instrument (G):
β_XY = β_GY / β_GX
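The ratio estimate can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration and not taken from the poster: the allele frequency, the effect sizes, the sample size and the helper names `slope` and `simulate`. An unmeasured confounder U affects both X and Y, so the ordinary regression of Y on X is biased, while the ratio of the two instrument regressions should land close to the effect that was simulated.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, true_effect=0.5, seed=1):
    """Simulate instrument G, factor X and outcome Y under a linear model.

    All numerical values here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = sum(rng.random() < 0.3 for _ in range(2))  # allele count: 0, 1 or 2
        ui = rng.gauss(0, 1)                            # unmeasured confounder
        xi = 0.4 * gi + ui + rng.gauss(0, 1)            # G affects X
        yi = true_effect * xi + ui + rng.gauss(0, 1)    # G affects Y only via X
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
beta_gx = slope(g, x)              # regression coefficient of X on G
beta_gy = slope(g, y)              # regression coefficient of Y on G
ratio_estimate = beta_gy / beta_gx # instrumental variable estimate of beta_XY
naive_estimate = slope(x, y)       # confounded observational estimate

print(f"simulated causal effect: 0.5")
print(f"ratio (IV) estimate:     {ratio_estimate:.3f}")
print(f"naive regression of Y on X: {naive_estimate:.3f}")
```

In this setup the naive estimate is pushed upwards because U raises both X and Y, whereas the instrument is generated independently of U by construction.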
What is the causal effect of lipid levels on coronary heart disease (CHD)?
Observationally, low-density lipoprotein cholesterol (LDL-C) has an injurious association with CHD and high-density lipoprotein cholesterol (HDL-C) has a protective association.
Confounding: If richer, healthier people have a lower intake of LDL-C, then the observed association may simply reflect that richer, healthier people have a lower incidence of CHD. LDL-C may be a marker of health status, not a cause of CHD.
Reverse causation: If people with poor coronary health decrease their intake of LDL-C in response to subclinical disease (early warning signs of disease), then an association between LDL-C and CHD will be induced.
Suppose there is a common genetic variant which causes the body to retain more LDL-C from the diet, dividing the population into absorbers and non-absorbers.
We see from the diagram that the groups defined by the instrumental variable are similar to arms in a randomized controlled trial.
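The comparison with trial arms can be sketched in a small simulation. This is an illustrative sketch only: the wealth variable, the continuous risk score standing in for CHD, the 40% absorber frequency and all effect sizes are assumptions, not values from the poster. Because absorber status is generated independently of wealth, the two groups should be balanced on the confounder, and the ratio of the between-group differences in risk and in LDL-C should recover the simulated effect.

```python
import random
from statistics import fmean

rng = random.Random(7)
TRUE_EFFECT = 0.3  # hypothetical effect of LDL-C on the risk score

# Each row: (absorber status, wealth, LDL-C level, risk score).
rows = []
for _ in range(100_000):
    absorber = rng.random() < 0.4   # genotype, assigned independently of wealth
    wealth = rng.gauss(0, 1)        # confounder: lowers both LDL-C and risk
    ldl = 0.5 * absorber - 0.6 * wealth + rng.gauss(0, 1)
    risk = TRUE_EFFECT * ldl - 0.6 * wealth + rng.gauss(0, 1)
    rows.append((absorber, wealth, ldl, risk))


def gap(column):
    """Mean of one column in absorbers minus its mean in non-absorbers."""
    in_absorbers = fmean(r[column] for r in rows if r[0])
    in_others = fmean(r[column] for r in rows if not r[0])
    return in_absorbers - in_others


wealth_gap = gap(1)   # expected to be near zero: groups are comparable
ldl_gap = gap(2)      # the instrument shifts LDL-C between groups
risk_gap = gap(3)     # any difference in risk is attributed to LDL-C
wald_estimate = risk_gap / ldl_gap

print(f"difference in wealth between groups: {wealth_gap:.3f}")
print(f"difference in LDL-C between groups:  {ldl_gap:.3f}")
print(f"ratio of risk gap to LDL-C gap:      {wald_estimate:.3f}")
```

With a two-group instrument the ratio of regression coefficients reduces to this ratio of group differences, which is why the groups can be read like the arms of a trial.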
Assumptions for analysis:
We assume that the instrument is only associated with lipid levels. This analysis would be invalid if, for example:
– the genetic variant was correlated with another variant associated with, say, triglyceride levels.
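A hedged sketch of how such a violation could distort the estimate is given below. The second variant, the `ld` parameter (the probability that the second variant is copied from the first), the triglyceride pathway and all effect sizes are hypothetical choices for illustration. When the two variants are independent the ratio estimate should be close to the simulated LDL-C effect. When they are correlated, part of the triglyceride effect is wrongly attributed to LDL-C.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def ratio_estimate(ld, n=100_000, seed=3):
    """Ratio estimate of the LDL-C effect when the instrument is correlated
    with a second variant that raises triglycerides.

    ld = 0 gives independent variants (valid instrument);
    larger ld gives stronger correlation (invalid instrument).
    """
    rng = random.Random(seed)
    g, ldl, outcome = [], [], []
    for _ in range(n):
        g1 = rng.random() < 0.4                              # instrument for LDL-C
        g2 = g1 if rng.random() < ld else rng.random() < 0.4 # correlated variant
        tg = 0.5 * g2 + rng.gauss(0, 1)                      # triglyceride level
        x = 0.5 * g1 + rng.gauss(0, 1)                       # LDL-C level
        y = 0.3 * x + 0.4 * tg + rng.gauss(0, 1)             # simulated LDL-C effect is 0.3
        g.append(int(g1))
        ldl.append(x)
        outcome.append(y)
    return slope(g, outcome) / slope(g, ldl)


valid = ratio_estimate(ld=0.0)
biased = ratio_estimate(ld=0.8)
print(f"independent variants: {valid:.3f}")
print(f"correlated variants:  {biased:.3f}")
```

The data alone do not reveal which of the two situations holds, which is why the assumption has to be justified on biological grounds.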
[Figure: directed acyclic graph and group comparison diagram. Recoverable labels: Factor of interest; Competing risk factors; 1) association between instrument and factor; 2) no association between instrument and competing risks; 3) no direct association between instrument and outcome; All other factors equal between groups.]
Current work: If cross-sectional data is available on a number of different factors, each of which has an associated instrumental variable, how can the network of associations between the factors be efficiently estimated?
For example, if we are interested in the causal effect of lipid levels on CHD, and have measured instruments which affect LDL-C, HDL-C and triglycerides, how would we estimate a causal association? What if we believe that LDL-C levels may affect triglyceride levels? Could we estimate a direct effect of LDL-C on CHD, or an indirect effect of the increase in LDL-C on CHD via triglycerides? How would you account for structural uncertainty in the model?
Take-home message: Current methods for instrumental variable analysis enable causal effects to be estimated in a limited and often unrealistic context, where an instrumental variable is only associated with a single factor. More sophisticated methodology is required to estimate causal effects in a more realistic situation, where a range of instruments are associated with a range of interacting factors.
Such analysis requires detailed cross-sectional observational and genetic data, and lots of it!
Contact details — E: [email protected], T: 01223 740002