Nature versus Nurture in the Explanations for Racial/Ethnic Health Disparities

Jay S. Kaufman, PhD • Dept of Epidemiology, Biostatistics, and Occupational Health • McGill University • Montreal, Quebec • CANADA • 12:00 PM March 7, 2012 • Leacock 429 • The Social Statistics Speaker Series

ScienceDaily (July 13, 2009) — "It was a level playing field for everyone. So our findings cast doubt on a widely accepted theory that African Americans' lower survival rates for certain cancers are solely due to such factors as poverty and poor access to quality health care." Albain's study found no statistically significant association between race and survival for lung cancer, colon cancer, lymphoma, leukemia, or myeloma. The cancers that did show survival gaps -- breast, prostate and ovarian -- are gender-related and the survival disparity persisted after adjustment for treatment factors, tumor variables, and socioeconomic status. The findings therefore suggest that the survival gap for these cancers is most likely due to an interaction of tumor biologic factors, hormonal environment, and inherited variations in genes that control metabolism of drugs, toxins and hormones, Albain said.

Two flavors of epidemiology: ETIOLOGY versus SURVEILLANCE. Statistically: Pr(Y|SET[X=x]) versus Pr(Y|X=x)

Statistical Adjustments: "The philosophers have only interpreted the world, the point, however, is to change it." --- Karl Marx
If you're doing descriptive epidemiology, you show a picture of the world as it really is. No "adjustments". Why not? Because the real world is unadjusted.
If you're doing an etiologic (causal) analysis, you must identify what would happen if you intervened on the world in some specific way. In order to figure this out from observational data, you must often adjust statistically for covariates.

AN EXAMPLE: House JS, et al. Excess mortality among urban residents: how much, for whom, and why? Am J Public Health. 2000; 90(12):1898-904.

Which flavor of epidemiology are we having?
SURVEILLANCE: If our purpose is descriptive (i.e., what is the contrast of hazard rates for different groups in the real world?), there should be no adjustment.
ETIOLOGY: If our purpose is etiologic (causal), then we want to know: Pr(Y|SET[X=x1]) versus Pr(Y|SET[X=x2]), where X = race, sex, and residence, and Y = all-cause mortality hazard.

Not all covariates are confounders: Consider the causal chain X (smoking) → Z (tar deposition in the lungs) → Y (lung cancer). The causal effect of manipulating smoking is: Pr(Y|SET[X=x1]) versus Pr(Y|SET[X=x2]). If these are the only variables relevant to this problem, the causal effect is estimated without bias by the contrast of the observed probabilities, Pr(Y|X=x1) versus Pr(Y|X=x2), NOT by adjusting for the intermediate variable Z. In fact, the adjusted effect would be null, which is presumably very far from the truth.
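The point about intermediates can be checked with a small exact calculation. The sketch below assumes a binary chain X → Z → Y with made-up probabilities (every number here is hypothetical and chosen only for illustration); it compares the crude risk difference with the risk difference inside each stratum of Z.

```python
# Hypothetical probabilities for the chain X (smoking) -> Z (tar) -> Y (cancer).
# None of these numbers come from the talk; they are illustrative assumptions.
P_X1 = 0.5                           # Pr(X=1)
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}      # Pr(Z=1 | X=x)
P_Y1_GIVEN_Z = {0: 0.05, 1: 0.30}    # Pr(Y=1 | Z=z); Y depends on X only through Z


def joint(x, z, y):
    """Joint probability Pr(X=x, Z=z, Y=y) implied by the chain."""
    px = P_X1 if x else 1 - P_X1
    pz = P_Z1_GIVEN_X[x] if z else 1 - P_Z1_GIVEN_X[x]
    py = P_Y1_GIVEN_Z[z] if y else 1 - P_Y1_GIVEN_Z[z]
    return px * pz * py


def risk(x, z=None):
    """Pr(Y=1 | X=x), or Pr(Y=1 | X=x, Z=z) when a stratum z is given."""
    strata = (0, 1) if z is None else (z,)
    num = sum(joint(x, zz, 1) for zz in strata)
    den = sum(joint(x, zz, yy) for zz in strata for yy in (0, 1))
    return num / den


# Crude contrast: equals the causal effect here because nothing confounds X and Y.
crude_rd = risk(1) - risk(0)

# Contrast within each level of the intermediate Z: the effect disappears.
adjusted_rd = {z: risk(1, z) - risk(0, z) for z in (0, 1)}

print("crude risk difference:", round(crude_rd, 3))
print("Z-stratified risk differences:", {z: round(v, 3) for z, v in adjusted_rd.items()})
```

Under these assumed numbers the crude contrast recovers the full effect of X, while conditioning on Z removes it entirely, which is the bias described on the slide.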

In the example: Z = education, income, marital status and "health" where "health" is defined by several variables including self-reported health status, health behaviors, and chronic or debilitating conditions. But the covariates listed here are more plausibly affected by exposure (race, sex, residence) than they are confounders of the exposure and disease. Wouldn't you agree that we know a priori that educational and income opportunities are affected by race and gender?

One reasonable causal model: [Diagram: exposures RESIDENCE, RACE, SEX; covariates EDUCATION, INCOME, MARITAL STATUS, "HEALTH"; outcome ALL-CAUSE MORTALITY]

This may not be exactly right, but clearly the covariates that were adjusted for in the published analysis are primarily causal intermediates, not confounders. Therefore, the effect of adjusting is generally to bias the estimated effects away from their true values. The causal estimate is also hindered by ambiguity about the meaning of Pr(Y|SET[X=x1]) versus Pr(Y|SET[X=x2]) when X is race or sex, or any other quantity for which the "SET" intervention is vaguely defined. (Hernán, Amer J Epidemiol 2005)

But even if you restrict yourself to adjusting for variables that really are confounders, not intermediates, the method applied in this setting is still hopeless. Let's say that there may be any number of measured or unmeasured covariates (Z) that are associated with an exposure of interest (X) and causally precede the outcome (Y). [Diagrams with nodes U, Z, X, Y]

Such variables may confound the observed relation, in the sense that the observed association in the data would not converge to the true causal effect as n → ∞. The true causal effect is defined as the one that would be achieved from an experimental manipulation of X: Pr(Y|SET[X=x1]) versus Pr(Y|SET[X=x2]). As stated previously, if confounding can be attributed entirely to covariate(s) Z, then adjustment, for example via standardization or regression, allows for the unbiased estimation of the true causal effect. Which conceptual models are defensible for this methodology?
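As a minimal illustration of adjustment by standardization, the sketch below assumes a single binary confounder Z that affects both X and Y, with made-up probabilities (all values are hypothetical). It contrasts the crude association, Pr(Y|X=x1) versus Pr(Y|X=x2), with the Z-standardized contrast, which in this constructed example equals the effect built into the model.

```python
# Hypothetical model: Z -> X, Z -> Y, and X -> Y (all numbers are assumptions).
P_Z1 = 0.5                          # Pr(Z=1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}     # Pr(X=1 | Z=z): Z makes exposure more likely
TRUE_EFFECT = 0.1                   # risk difference built into the outcome model


def p_y1(x, z):
    """Pr(Y=1 | X=x, Z=z): additive effects of X and Z on the risk scale."""
    return 0.1 + TRUE_EFFECT * x + 0.3 * z


def p_z(z):
    return P_Z1 if z else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x else 1 - P_X1_GIVEN_Z[z]


def crude_risk(x):
    """Observed Pr(Y=1 | X=x), which mixes the effect of X with that of Z."""
    num = sum(p_z(z) * p_x_given_z(x, z) * p_y1(x, z) for z in (0, 1))
    den = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return num / den


def standardized_risk(x):
    """Sum over z of Pr(Y=1 | X=x, Z=z) * Pr(Z=z): risk if everyone had X=x."""
    return sum(p_z(z) * p_y1(x, z) for z in (0, 1))


crude_rd = crude_risk(1) - crude_risk(0)
standardized_rd = standardized_risk(1) - standardized_risk(0)

print("crude risk difference:", round(crude_rd, 3))
print("standardized risk difference:", round(standardized_rd, 3))
```

The standardized contrast matches the built-in effect only because Z is, by construction, the sole confounder and is fully measured; that is exactly the condition the slide asks the reader to question.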

An Experimental Model Applied to Disparities Arising from Discrimination: Straightforward, because the experimental intervention is well-defined: Pr(Y|SET[Race = r])
Examples:
Loring M, Powell B. (1988) Gender, race, and DSM-III: a study of the objectivity of psychiatric diagnostic behavior. J Health Soc Behav; 29: 1-22.
Bertrand M, Mullainathan S. (2004) Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review; 94(4): 991-1013.
Schulman KA, et al. (1999) The effect of race and sex on physicians' recommendations for cardiac catheterization. N Engl J Med; 340(8): 618-26.

Schulman KA, et al. The effect of race and sex on physicians' recommendations for cardiac catheterization. N Engl J Med. 1999 Feb 25; 340(8):618-26.

An Analytic Model Applied to Disparities Arising from Discrimination: Easy to extend the experimental logic to observational studies, for which statistical manipulation of the observed data is relied upon as a method for estimating what would happen in an experimental scenario. Causal interpretation is the outcome distribution contrast that would be observed under a randomization of race to the case presentations rather than the observed race. Example: Todd KH, Deaton C, D'Adamo AP, Goe L. (2000) Ethnicity and analgesic practice. Annals of Emergency Medicine; 35(1): 11-16.

The Analytic Model Applied to Disparities Arising from Innate Factors (e.g. genes): The obvious problem is that the (hypothetical) intervention is no longer readily definable for intrinsic factors: Pr(Y|SET[X=x]) versus Pr(Y|X=x) ???????
Kaufman JS, Cooper RS. (1999) Seeking causal explanations in social epidemiology. Am J Epidemiol; 150(2):113-20.
See also: Kaufman JS. Epidemiologic analysis of racial/ethnic disparities: some fundamental issues and a cautionary example. Social Science and Medicine 2008 Apr;66(8):1659-69.

Return to first example: Albain KS, et al. Racial disparities in cancer survival among randomized clinical trials patients of the Southwest Oncology Group. J Natl Cancer Inst 2009;101(14):984-92.
Results for post-menopausal breast cancer (n white = 3903, n black = 413):
Adjusted RR*: 1.49 (1.28-1.73)
SES-Adjusted RR**: 1.48 (1.27-1.72)
* Adjusted for age, number of positive lymph nodes (≥4 vs <4), and tumor size (>5 vs ≤5 cm)
** Additional adjustment for “income and education”

Definition of SES: Patients in zip code areas in which the median household income was higher than the overall US median were coded as “high income”; otherwise, “low income”.
Examples:
Cranston, RI (02920): Median 2010 HH Income: $50,165; % Lower than US Median: 49.8%; http://www.zipdatamaps.com/02920
West Warwick, RI (02893): Median 2010 HH Income: $49,663; % Higher than US Median: 49.6%; http://www.zipdatamaps.com/02893

Definition of SES: Education category was similarly based on the proportion of residents in a zip code area who completed high school. For post-menopausal breast cancer, 32% of observations had no zip code data at all (missing SES varied 27-79% depending on outcome).
CONCLUSION: “The findings therefore suggest that the survival gap for these cancers is most likely due to an interaction of tumor biologic factors, hormonal environment, and inherited variations in genes that control metabolism of drugs, toxins and hormones.”

Albain et al 2009 is actually an example of a GOOD article (for example, JNCI has an impact factor of 15). Here is an example of a bad article:

Argument is made entirely by extrapolation: Z is high-dimensional with nearly all values unmeasured. In the Albain et al example, even the measured Z are categorized very crudely and often missing entirely. Authors assert (without any substantive justification) that an adjusted value off the null implies that one of the many unmeasured factors must be a genetic trait that is correlated with race and highly predictive of outcome.

No justification is expected or provided because the association between racial groups and genetic predisposition is reflexive:
Example: March 29, 2005 Tuesday, PERSONAL HEALTH; Pg. 8. 'Diabesity,' a Crisis in an Expanding Country. By JANE E. BRODY. “Genes play a role as well. Some people are more prone to developing Type 2 diabetes than others. The risk is 1.6 times as great for blacks as for whites of similar age. It is 1.5 times as great for Hispanic-Americans, and 2 times as great for Mexican-Americans and Native Americans.”

To consider all unmeasured variables and decide that the one important unmeasured variable is a genetic trait requires a strong prior probability on that hypothesis. Is this in any way justifiable? How can one approach this question rationally?
1) The proportion of functional variants that have a substantively important prevalence difference between blacks and whites would be, say, smaller than 0.001. Goldstein DB, Hirschhorn JN. In genetic control of disease, does 'race' matter? Nat Genet 2004; 36(12):1243-4.
2) What is a reasonable prior probability that racial/ethnic groups might differ on some unmeasured social factor that is consequential for disease? ????????????

Where does the biomedical literature stand now? • Observational research on racial discrimination (nurture) rests on a potentially sound inferential foundation, whereas observational research on racial predisposition (nature) relies on a much less secure inferential model. • The preponderance of essentialist interpretations in the biomedical literature indicates that analysts must be thinking (erroneously) that either: “race-specific” alleles are much more common than they actually are, or that: social distinctions between racial/ethnic groups are much more modest than they actually are.

American Heart Journal Volume 108, Issue 3(2) September 1984 Pages 715–723 A Note on the Biologic Concept of Race and its Application in Epidemiologic Research Richard Cooper, MD

p. 718 (1984): How important is the potential racial variation?... Lewontin has estimated that diversity between individuals in a population accounts for 85% of the total species variation, diversity within race accounts for 8.3%, and between-race diversity contributes only 6.3%.
[Figure: populations labeled Middle East, Oceania, America, East Asia, Central/South Asia, Africa, Europe]

p. 719 (1984): “... Blacks in the United States have age- and sex-specific mortality ratios that are 25% to 300% higher than those of whites. These ratios are likewise highly mobile over time. How can they be genetic? Age-adjusted death rates for blacks were 37% higher than for whites in the United States in 1977. The most common fatal illness for which we have a clear-cut racial-genetic explanation is sickle cell disease. In 1977 there were 80,000 excess deaths among blacks compared with whites; 277 deaths among blacks were coded to hemoglobinopathies, or 0.3% of the total excess. We must look for the explanation of the remaining excess mortality primarily in social causes.”
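The share quoted above follows from simple division. The sketch below just reproduces that arithmetic from the two counts given in the quotation (80,000 excess deaths and 277 deaths coded to hemoglobinopathies); the variable names are illustrative only.

```python
# Counts as quoted from Cooper (1984) for the United States in 1977.
excess_deaths_blacks = 80_000        # excess deaths among blacks compared with whites
hemoglobinopathy_deaths = 277        # deaths among blacks coded to hemoglobinopathies

# Share of the excess with a clear-cut racial-genetic explanation, in percent.
genetic_share_pct = 100 * hemoglobinopathy_deaths / excess_deaths_blacks

# Everything else, which the quotation assigns primarily to social causes.
remaining_share_pct = 100 - genetic_share_pct

print(f"share coded to hemoglobinopathies: {genetic_share_pct:.2f}%")
print(f"remaining share of the excess: {remaining_share_pct:.2f}%")
```

The unrounded share is slightly above the 0.3% stated in the quotation, which appears to be a rounded-down figure.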

Empirical question is: After 7 years of GWAS studies, can we improve on this estimate? The number of diseases in which we now have strong evidence of a genetic contribution has grown exponentially over time. In fact, some might argue that virtually all large SNP effects are now probably known. If any of these known genetic factors have differential distribution over continental populations (because of selection, drift/founder effects, etc), then they would contribute to observed disparities. 28 years ago, Cooper attributed 0.3% of the US racial disparity to nature and 99.7% to nurture. How has that estimate held up over time?

An obvious caveat: Nature and Nurture are clearly not additive

Decomposition of black-white life expectancy gap by cause of death in 2003 and 2008: Data on population at risk, total number of deaths, cause of death (ICD-10 codes), age, gender, race and Hispanic origin were all obtained from the CDC WONDER website, using the Underlying Cause of Death, 1999-2008 Request Form. [http://wonder.cdc.gov] For the age group <1 year, the population at risk was replaced by the number of live births from the National Vital Statistics Reports for births in year 2003: [National Vital Statistics Reports, Vol. 54, No. 2, September 8, 2005] and for year 2008: [National Vital Statistics Reports, Vol. 59, No. 1, December 8, 2010]

Two datasets were merged: 1) the number of deaths broken down by each cause of death (grouped into 24 categories) and for 5-year age groups (19 categories), and 2) the total number of deaths by single-year age groups. The second dataset was used to calculate the life expectancy gap and it was then merged with the first one to decompose the gap by cause of death. Specific ICD-10 causes of death were grouped into 24 broader categories.
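To make the decomposition idea concrete, here is a rough sketch of an Arriaga-style decomposition of a life expectancy gap by age group. It is not the analysis used in the talk: the age groups, death rates, and helper names (`life_table`, `arriaga_by_age`) are all hypothetical, the life table uses a simple mid-interval assumption, and the further split of each age contribution by cause of death is omitted. The formulas follow the commonly cited form of Arriaga's method as I understand it.

```python
def life_table(rates, widths):
    """Return (l, L, T) with radix 1 from age-specific death rates.

    rates  : death rate per person-year for each age group (last group open-ended)
    widths : width in years of each closed age group (len(rates) - 1 entries)
    """
    k = len(rates)
    l = [1.0]                                   # survivors at start of each group
    L = []                                      # person-years lived in each group
    for i in range(k - 1):
        n, m = widths[i], rates[i]
        q = n * m / (1 + 0.5 * n * m)           # deaths assumed mid-interval on average
        l.append(l[i] * (1 - q))
        L.append(0.5 * n * (l[i] + l[i + 1]))
    L.append(l[-1] / rates[-1])                 # open-ended last group
    T = [0.0] * k                               # person-years lived at and above each age
    running = 0.0
    for i in reversed(range(k)):
        running += L[i]
        T[i] = running
    return l, L, T


def arriaga_by_age(table1, table2):
    """Contribution of each age group to e0(group 2) - e0(group 1)."""
    l1, L1, T1 = table1
    l2, L2, T2 = table2
    k = len(l1)
    contributions = []
    for i in range(k):
        if i < k - 1:
            direct = l1[i] * (L2[i] / l2[i] - L1[i] / l1[i])
            indirect = T2[i + 1] * (l1[i] / l2[i] - l1[i + 1] / l2[i + 1])
            contributions.append(direct + indirect)
        else:                                   # open-ended group has a direct effect only
            contributions.append(l1[i] * (T2[i] / l2[i] - T1[i] / l1[i]))
    return contributions


# Hypothetical age groups: <1, 1-14, 15-44, 45-64, 65+ (rates are made up).
widths = [1, 14, 30, 20]
rates_group1 = [0.012, 0.0005, 0.0030, 0.012, 0.07]
rates_group2 = [0.006, 0.0003, 0.0015, 0.007, 0.06]

table1 = life_table(rates_group1, widths)
table2 = life_table(rates_group2, widths)
gap = table2[2][0] - table1[2][0]               # difference in life expectancy at birth
contributions = arriaga_by_age(table1, table2)

print("life expectancy gap:", round(gap, 2))
print("contributions by age group:", [round(c, 2) for c in contributions])
```

By construction the age-group contributions sum to the total gap; a cause-of-death decomposition would then divide each age contribution among causes.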

Arriaga EE. Measuring and explaining the change in life expectancies. Demography. 1984;21(1):83-96.
Arriaga EE, Ruzicka LT, Wunsch GJ, Kane P. Changing trends in mortality decline during the last decades. Differential mortality: methodological issues and biosocial factors. Oxford, UK: Clarendon Press; 1989:105-29.
Harper S, Lynch J, Burris S, Davey Smith G. Trends in the Black-White Life Expectancy Gap in the US, 1983-2003. JAMA. 2007; 297(11): 1224-32.

Causes of death contributing to the gap in life expectancy at birth among non-Hispanic blacks and whites, 2003-2008.

December 1, 2011 365(22):2098-109.

Nature. 2011 Sep 11;478(7367):103-9. “We estimate that there are 116 (95% CI 57–174) independent blood pressure variants with effect sizes similar to those reported here, which collectively can explain 2.2% of the phenotypic variance for SBP and DBP, compared with 0.9% explained by the 29 associations discovered thus far.”

…large-scale meta-analyses with ∼2000 candidate genes in 39 multiethnic population-based studies, case-control studies, and clinical trials totaling 17,418 cases and 70,298 controls. …In summary, large-scale meta-analysis involving a dense gene-centric approach has uncovered additional loci and variants that contribute to T2D risk and suggests substantial overlap of T2D association signals across multiple ethnic groups. The American Journal of Human Genetics 90, 1–16, March 9, 2012

American Sociological Review August 2008 vol. 73 no. 4: 543-568

PPROM: roughly a third of PTB. PTB: roughly a third of IM. Wang et al, PNAS 2006: PAR% = 12% (i.e. roughly 1% of IM). Failed to replicate: Obstetrics & Gynecol 2011; 117(5): 1078-84; Am J Obstetrics & Gynecol 2010; 202(5): 431
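The "roughly 1% of IM" figure can be reproduced by chaining the three proportions on the slide. The sketch below assumes that the 12% PAR applies to PPROM and that the shares simply multiply (PPROM as a share of PTB, PTB as a share of IM); both assumptions are mine, not stated in the source.

```python
# Proportions as given on the slide (approximate by the slide's own wording).
pprom_share_of_ptb = 1 / 3      # PPROM is roughly a third of PTB
ptb_share_of_im = 1 / 3         # PTB is roughly a third of IM
par_pprom = 0.12                # PAR% = 12%, assumed here to refer to PPROM

# Assumed multiplicative chain: fraction of IM attributable to the variant.
share_of_im = par_pprom * pprom_share_of_ptb * ptb_share_of_im

print(f"share of IM attributable under these assumptions: {100 * share_of_im:.1f}%")
```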

APOL1 Ser342Gly missense mutation rs73885319 (G1); APOL1 6 bp deletion rs71785313 (G2). Frequency of the three risk genotypes G1G1, G2G2, and G1G2 combined. Rosset, Nat Rev Nephrol. 2011 Jun;7(6):313-26. http://alfred.med.yale.edu/alfred/recordinfod.asp?UNID=SI316254V
