
Evolution as a Confounding Factor in Genetic Association Studies

14 December 2011. Richard H. Scheuermann, Ph.D., Department of Pathology, U.T. Southwestern Medical Center.


Presentation Transcript


  1. Evolution as a Confounding Factor in Genetic Association Studies • 14 December 2011 • Richard H. Scheuermann, Ph.D. • Department of Pathology, U.T. Southwestern Medical Center

  2. Current projects

  3. Outline • Hypothesis that evolution should be considered a confounding factor in genetic association studies • HLA-mediated autoimmune disease predisposition analysis using SFVT • Identifying genetic determinants of influenza species jump events based on convergent evolution • Novel general strategies for formally controlling for evolution as a confounding factor in genetic association studies

  4. Evolution as a confounding factor in genetic association studies

  5. Population-based genetic association • Many diseases exhibit evidence of genetic predispositions • Genotype-phenotype association studies • Diagnostic biomarker • Molecular underpinnings of disease pathology • GWAS and linkage disequilibrium • Co-inheritance of “linked” genetic markers • Advantage of using SNPs to detect causal variants • NGS could obviate the need for using linked SNPs

  6. Statistical assumptions • Independence (confounding) • Random sampling (bias) • Population has reached equilibrium • Test sample represents a random sampling of the equilibrium population

  7. HLA-mediated autoimmune disease predisposition analysis using SFVT

  8. Class I and II Peptide Sources

  9. HLA and autoimmune disease (source: Robbins Pathologic Basis of Disease, 6th Edition, 1999)

  10. HLA and infectious disease • Correlation between HLA genotype and HIV viral burden and progression to AIDS • M Dean, M Carrington and SJ O'Brien Annual Review of Genomics and Human Genetics Vol. 3: 263-292 (2002)

  11. HLA and drug sensitivity (columns: HLA allele; drug sensitivity; association; prevalence) • B*1502; carbamazepine (epilepsy); p = 3 x 10^-27; high in Chinese, absent in Caucasians • B*5701; abacavir (HIV); p = 5 x 10^-20; high in Caucasians, absent in Africans, Hispanics • B*5801; allopurinol (gout); p = 5 x 10^-24; high in Chinese • Source: P. Parham

  12. Number of HLA Alleles • HLA-A: 697 (24) • HLA-B: 1109 (49) • HLA-C: 381 (9) • HLA-DRB: 690 (20) • HLA-DQA1: 34 • HLA-DQB1: 95 (7) • HLA-DPA1: 27 • HLA-DPB1: 131 • MICA: 65 • MICB: 30 • TAP: 11 • Figures in parentheses indicate the number of serologically defined antigens at each locus. • 500 new submissions each year. • Source: IMGT HLA, October 2008

  13. HLA Allele Nomenclature • Examples: HLA-A*24020101 and HLA-A*24020102L • Fields, left to right: locus (HLA-A) • asterisk • allele family, serological where possible (24) • amino acid difference (02) • non-coding (silent) polymorphism (01) • intron, 3’ or 5’ polymorphism (01, 02) • optional suffix (L) • Suffixes: N = null, L = low, S = secreted, A = aberrant, Q = questionable

  14. DRB1 phylogeny DRB1*15 DRB1*16 DRB1*04 DRB1*10 DRB1*09 DRB1*07

  15. DRB1 phylogeny (figure; all highlighted branches labelled DRB1*13)

  16. DRB1 phylogeny DRB1*15 DRB1*16 DRB1*04 DRB1*10 DRB1*09 DRB1*07

  17. DRB1 alignment (figure; labels: 07/15, 07/09, 09/15)

  18. Limitations with traditional HLA allele-based association studies • Treats entire allele as a single unit and therefore includes both causative and passenger variations • Doesn’t take into account structural relationships between alleles • Syntax of the HLA nomenclature was designed to capture some of the structural relationships between alleles, but there are several exceptions

  19. HLA-mediated disease predisposition • Hypothesis: while the allelic/haplotypic structures reflect the evolutionary history of the locus, it is focused regions in the HLA genes/proteins (those that affect gene expression, protein structure and/or protein function) that are responsible for enhanced disease risk

  20. An alternative approach • DAIT Data Interoperability Steering Committee/HLA Working Group members • HLA Nomenclature: WHO / IMGT-HLA / Anthony Nolan Research Institute • NCBI dbMHC • Biomedical ontology people

  21. Summary of SFVT approach • Define individual sequence features (SF) in HLA proteins (genes) • Determine the extent of polymorphism for each sequence feature by defining the observed variant types (VT) • Re-annotate HLA typing information with complete list of VT for each SF • Examine the association between every sequence feature variant type and disease or other phenotype
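
The re-annotation step summarized above can be sketched as follows. This is a minimal illustration, not the ImmPort implementation: the allele names, the ten-residue sequences and the feature coordinates are all made up, and variant types are simply numbered in order of first appearance, which is an assumption about how VT labels might be assigned.

```python
from typing import Dict, List

# Hypothetical toy data: made-up sequences keyed by made-up allele names.
# Real sequences and feature coordinates would come from curated sources.
ALLELE_SEQS: Dict[str, str] = {
    "ALLELE_1": "MKDFYLEVRA",
    "ALLELE_2": "MKEFYLEVRA",
    "ALLELE_3": "MKDFYLQVRG",
}

# Hypothetical sequence features: name -> 1-based residue positions.
# A feature may be discontinuous in the linear sequence.
SEQUENCE_FEATURES: Dict[str, List[int]] = {
    "SF1": [3, 4, 5],
    "SF2": [7, 10],
}


def feature_residues(seq: str, positions: List[int]) -> str:
    """Return the residues of one sequence at the feature's positions."""
    return "".join(seq[p - 1] for p in positions)


def define_variant_types(
    allele_seqs: Dict[str, str], features: Dict[str, List[int]]
) -> Dict[str, Dict[str, str]]:
    """For each SF, label each distinct residue string as SFx_VT1, SFx_VT2, ..."""
    vt_tables: Dict[str, Dict[str, str]] = {}
    for sf, positions in features.items():
        table: Dict[str, str] = {}
        for allele in sorted(allele_seqs):
            motif = feature_residues(allele_seqs[allele], positions)
            if motif not in table:
                table[motif] = f"{sf}_VT{len(table) + 1}"
        vt_tables[sf] = table
    return vt_tables


def annotate(
    allele: str,
    allele_seqs: Dict[str, str],
    features: Dict[str, List[int]],
    vt_tables: Dict[str, Dict[str, str]],
) -> List[str]:
    """Replace an allele name with its full list of SF variant types."""
    seq = allele_seqs[allele]
    return [vt_tables[sf][feature_residues(seq, pos)] for sf, pos in features.items()]


VT_TABLES = define_variant_types(ALLELE_SEQS, SEQUENCE_FEATURES)
```

Calling `annotate("ALLELE_2", ALLELE_SEQS, SEQUENCE_FEATURES, VT_TABLES)` returns one VT label per feature, so two alleles that differ overall can still share a variant type at a given feature.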

  22. Representative Sequence Features

  23. A*0201 - ‘peptide binding’ SF

  24. A*0201 - ‘peptide binding pocket B’

  25. A*0201 - ‘CD8 binding’ & ‘TCR binding’ SF (figure labels: TCR Binding, CD8 Binding)

  26. Summary of SFs defined: 1775 total

  27. Variant Types for Hsa_HLA-DRB1_beta-strand 2_peptide antigen binding

  28. Representative Sequence Features Variant Types

  29. HLA SFVT Association with Systemic Sclerosis • Summary of data set • Systemic sclerosis (SSc, scleroderma) is a chronic condition characterized by altered immune reactivity, thickened skin, endothelial dysfunction, interstitial fibrosis, gangrene, pulmonary hypertension, gastrointestinal tract dysmotility, and renal arteriolar dysfunction. • A large cohort of ~1300 SSc patients and ~1000 healthy controls has been assembled by Drs. Frank C. Arnett, John Reveille and colleagues at the University of Texas Health Science Center at Houston. • Information on autoantibody reactivity for over 15 nuclear antigens is available. • 4-digit typing has been done for DRB1, DQA1, and DQB1 in all individuals. • Initial re-annotation of 4-digit DRB1 typing data • DRB1*1104 => SF1_VT43; SF2_VT4; SF3_VT12 ……… • Statistical analysis • Split data set into two pseudo-replicates • 2 x n contingency table for every SF (286), where n = number of VT • Chi-squared or Fisher’s exact test analysis • Select SF with adjusted p-value < 0.01 (83/286) • 2 x 2 contingency table (type vs. non-type) for every VT (418 total) • Merge results of pseudo-replicates
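
The 2 x 2 (type vs. non-type) step listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the analysis code used in the study: each subject is represented as the set of VT labels they carry, counting is per subject, the two-sided p-value sums all tables no more probable than the observed one, and the function names are hypothetical. Multiple-testing adjustment and the merging of pseudo-replicates are not shown.

```python
from math import comb
from typing import List, Set, Tuple


def count_table(
    cases: List[Set[str]], controls: List[Set[str]], vt: str
) -> Tuple[int, int, int, int]:
    """Build the 2 x 2 table (a, b, c, d): cases with/without vt, controls with/without vt."""
    a = sum(1 for subject in cases if vt in subject)
    c = sum(1 for subject in controls if vt in subject)
    return a, len(cases) - a, c, len(controls) - c


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, p_value)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio; infinite when the denominator is zero."""
    return (a * d) / (b * c) if b * c else float("inf")
```

An odds ratio above 1 with a small p-value would flag a candidate risk variant type, and one below 1 a candidate protective variant type.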

  30. DRB1*0101 Visualization

  31. Composite SF - Risk and Protective Variants

  32. DRB1*0101 Visualization (figure; column labels: protective, risk; residue labels in order: 67F 67I, 70D 70D, 71R 71R, 28D 28E, 26F 26F, 30Y 30L, 37Y 37F, 86V 86G)

  33. Publication

  34. ImmPort HLA SFVT Workflow • Table of subject vs. HLA 4-digit typing data • Table of subject vs. SFVT feature vector • Table of p-values, adj. p-values, odds ratios, confidence intervals (figure labels: TCR Binding, CD8 Binding)

  35. Summary • SFVT Approach • Proposed a novel approach for HLA disease associations based on sequence feature variant type analysis (SFVT) • Defined structural and functional protein sequence features (SF) for all classical human MHC class I and II proteins • Determined variant types (VT) for all SF in known alleles • Available in ImmPort www.immport.org, IMGT-HLA and dbMHC • Systemic Sclerosis Analysis • Based on the SFVT approach, identified a region of the HLA-DRB1 protein centered around peptide-binding pocket 7 that appears to be associated with disease risk • Sequences found in HLA-DRB1*1104 at positions 28, 30, 37, 67 and 86, especially those with aromatic amino acids, were associated with increased disease risk • Sequences found in this region of HLA-DRB1*0302 appear to be protective • Different alleles are associated with altered risk in different racial/ethnic populations, but they share common SFVTs • SFVTs associated with risk of developing SSc are different in patients with anti-topoisomerase (anti-topo) versus anti-centromere (anti-cent) antibodies, supporting the idea that these are distinct diseases • However, the risk-associated SFVTs are from the same SFs, suggesting a common mechanism of disease pathogenesis

  36. IRD Overview www.fludb.org

  37. Influenza A Sequence Features as of 18JUL2011: 4128 SFs total

  38. SF8 (nuclear export signal)

  39. VT for SF8 (nuclear export signal)

  40. VT-1 strains

  41. VT distribution by host

  42. Genetic determinants of influenza species jump events based on convergent evolution

  43. Flu pandemics of the 20th and 21st centuries initiated by species jump events • 1918 flu pandemic (Spanish flu) • subtype H1N1 (avian origin) • estimated to have claimed between 2.5% and 5.0% of the world’s population (20 to 100 million deaths) • Asian flu (1957 – 1958) • subtype H2N2 (avian origin) • 1 - 1.5 million deaths • Hong Kong flu (1968 – 1969) • subtype H3N2 (avian origin) • between 750,000 and 1 million deaths • 2009 H1N1 • subtype H1N1 (swine origin) • ~16,000 deaths as of March 2010

  44. Pandemic stages Adaptive drivers

  45. Basic reproductive number (R0) • Total number of secondary cases per case • Reasonable surrogate of fitness • Notation: R0R is R0 in the reservoir host, R0H is R0 in the human host • Characteristics of pandemic viruses: • R0H > 1, and • in the genetic neighborhood of viruses with R0R > 1 and R0H < 1 • Adaptive drivers (figure labels: A1, A2) • Reservoir virus (R0R > 1 and R0H << 1) • Stuttering viruses (R0R > 1 and R0H < 1) • Pandemic viruses (R0H > 1)
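
The three classes on this slide can be written as a small classifier. This is a sketch only: the slide does not say how small "<< 1" is, so the `negligible` cutoff of 0.1 is an arbitrary assumption, and the function names are hypothetical. The expected chain size uses the standard branching-process result that a single introduction with R0 < 1 causes 1 / (1 - R0) cases on average, which is not stated in the slides.

```python
def classify_virus(r0_reservoir: float, r0_human: float, negligible: float = 0.1) -> str:
    """Assign a virus to one of the classes named on the slide.

    `negligible` is an assumed stand-in for the slide's "R0H << 1".
    """
    if r0_human > 1:
        return "pandemic"
    if r0_reservoir > 1 and r0_human < negligible:
        return "reservoir"
    if r0_reservoir > 1 and r0_human < 1:
        return "stuttering"
    return "unclassified"


def expected_chain_size(r0_human: float) -> float:
    """Mean total cases from one introduction when 0 <= R0 < 1 (subcritical)."""
    if not 0 <= r0_human < 1:
        raise ValueError("expected chain size is finite only for 0 <= R0 < 1")
    return 1.0 / (1.0 - r0_human)
```

For instance, a stuttering virus with R0 of 0.5 in humans would be expected to produce short chains of about two cases per spillover, which is why such chains end without sustained spread.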

  46. Adaptive drivers • Pepin KM et al. (2010) “Identifying genetic markers of adaptation for surveillance of viral host jumps” Nature Reviews Microbiology 8: 802-814.

  47. Stuttering transmission and adaptive drivers • Stuttering transmission can reveal adaptive drivers through evidence of convergent evolution • The odds of finding the same neutral mutation by chance in multiple species jumps are low • Therefore, finding the same mutation in multiple independent species jump events is strong evidence for an adaptive driver
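
The "low odds" argument can be made concrete with a binomial tail. This is a rough sketch under strong assumptions that are not in the slides: jumps are treated as independent, one specific mutation is fixed in advance, and every jump has the same hypothetical probability `p` of carrying it by chance. It does not correct for scanning many positions.

```python
from math import comb


def prob_at_least_k(n_jumps: int, k: int, p: float) -> float:
    """Chance that at least k of n independent jumps carry one given neutral mutation."""
    return sum(
        comb(n_jumps, i) * p**i * (1 - p) ** (n_jumps - i)
        for i in range(k, n_jumps + 1)
    )
```

With a small `p`, the probability of seeing the same mutation in two or more jumps falls roughly with the square of `p`, which is the intuition behind treating recurrence as evidence of selection.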

  48. Genetic convergence during species jump • Virus isolate groups from IRD • Avian H5N1 (PB2) from Southeast Asia* up to 2003 (260 records): reservoirs of source viruses • Human H5N1 (PB2) from Southeast Asia, 2003-present (165 records): many examples of independent species jumps • Align amino acid sequences and calculate conservation scores • Identify highly conserved positions in avian records (≤1/260 variants; 557 of 759 positions), which are functionally restricted in the reservoir • Select the subset in which two or more human isolates contained the same sequence variant, due either to human-human transmission or to convergent evolution • *China, Hong Kong, Indonesia, Thailand, Viet Nam
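
The screening steps above can be sketched as two passes over pre-aligned sequences. This is an illustration, not the IRD workflow: the function names are hypothetical, the inputs are assumed to be equal-length aligned amino acid strings, and "conserved" is taken to mean that at most `max_variants` avian records differ from the most common residue. Distinguishing human-human transmission from convergent evolution is not attempted here.

```python
from collections import Counter
from typing import Dict, List, Tuple


def conserved_positions(avian_seqs: List[str], max_variants: int = 1) -> Dict[int, str]:
    """Map 1-based position -> consensus residue where few avian records deviate."""
    conserved: Dict[int, str] = {}
    for i in range(len(avian_seqs[0])):
        column = Counter(seq[i] for seq in avian_seqs)
        consensus, count = column.most_common(1)[0]
        if len(avian_seqs) - count <= max_variants:
            conserved[i + 1] = consensus
    return conserved


def recurrent_human_variants(
    human_seqs: List[str], conserved: Dict[int, str], min_isolates: int = 2
) -> Dict[Tuple[int, str], int]:
    """Find non-consensus residues at conserved positions seen in several human isolates."""
    hits: Dict[Tuple[int, str], int] = {}
    for pos, consensus in conserved.items():
        variants = Counter(
            seq[pos - 1] for seq in human_seqs if seq[pos - 1] != consensus
        )
        for residue, n in variants.items():
            if n >= min_isolates:
                hits[(pos, residue)] = n
    return hits
```

Each returned key is a (position, residue) pair and each value is the number of human isolates carrying it; these are candidates only, and would still need to be checked for independent origin.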

  49. Strain Search – PB2 avian H5N1 Southeast Asia up to 2003

  50. 260 PB2 records
