
Diabetes Genome Wide Association

Alessandra C Goulart, Ida Hatoum, Stalo Karageorgi, Mara Meyer. EPI293, January 2008, Harvard School of Public Health.



Presentation Transcript

  1. Diabetes Genome Wide Association • Alessandra C Goulart, Ida Hatoum, Stalo Karageorgi, Mara Meyer • EPI293, January 2008 • Harvard School of Public Health

  2. Background - Type 2 Diabetes Mellitus • Disorder characterized by impaired glucose/insulin function • Affects >170 million people worldwide

  3. Background - Genetic Justification • The explosion of diabetes and the rapidly decreasing age of onset argue for an environmental rather than genetic etiology • Genetic justification: clustering in families; leveling off of risk by BMI; mouse data • Pattern argues for a polygenic trait - GWAS!

  4. Methods • 3 separate studies, all working collaboratively • Different populations, different analyses

  5. FUSION/Finrisk Population • Genome-wide scan study (N=2,335, Finland) • Population-based: 1161 T2D cases and 1174 controls • Matched on age, sex, birth province

  6. DGI Population • Genome-wide scan study (N=2,931, Finland/Sweden) • Population-based: 1022 T2D cases and 1075 controls, matched on gender, age, BMI, place of origin • Family-based: 422 T2D cases and 392 controls, discordant siblings matched on age

  7. WTCCC Population • Genome-wide scan study (N=4,862, British/Irish) • 1924 T2D cases, from a diabetes “repository” • 2938 controls, from a 1958 birth cohort • No matching

  8. Methods - General Outline • All three studies start with study populations of between 2,335 and 4,862 subjects • All three run genome-wide association scans, initially analyzing 300,000-400,000 SNPs, then reduce that number by applying selection criteria • All three studies then run second waves of replication or conduct replication studies in independent populations • Findings are compared with previously published reports and across the three studies • Weighted meta-analysis • Findings are fairly consistent across the three study populations, with many replicated associations

  9. Population Stratification • All three studies investigated potential population stratification by • Cochran-Armitage tests • Genome control inflation factor (λ) • Principal components analysis using EIGENSTRAT • Adjustment for region/birthplace • Matching, choice of study population • Replication in independent datasets
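As a rough illustration of the genomic control inflation factor (λ) listed on this slide, the sketch below computes λ as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function names and the evenly spaced p-values are hypothetical and are not taken from any of the three studies.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def p_to_chi2(p):
    """Convert a two-sided p-value to the equivalent 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z

def genomic_inflation(p_values):
    """Genomic control lambda: observed median statistic / expected median."""
    stats = [p_to_chi2(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN

# Hypothetical p-values, evenly spread over (0, 1) as expected under the null.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
print(f"lambda = {lam:.3f}")
```

A value of λ close to 1 suggests little inflation of test statistics from stratification or genotyping artifacts; values well above 1 suggest the opposite.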

  10. Methods - Platform • Genotyping platforms for GWAS • Affymetrix GeneChip Human Mapping 500K Array Set: Wellcome Trust Case Control Consortium (WTCCC), UK; Diabetes Genetics Initiative (DGI), with both population-based (matched on gender, age, BMI and region of origin) and family-based samples • Illumina HumanHap300 BeadChip: Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics (FUSION), with 1161 Finnish T2D cases and 1174 normal glucose-tolerant controls from the FUSION and Finrisk 2002 studies (matched by province, sex and age)

  11. Methods - FUSION • FUSION analyzed 315,635 SNPs with MAF > 0.002 using a model that is additive on the log-odds scale • They observed an excess of low p-values (P < 10⁻⁴), suggesting many common variants with modest effects (λ = 1.026) • Imputed >2 million SNPs using data from HapMap CEU to cover 89.1% of SNPs with MAF > 1% • Compared stage 1 results with DGI and WTCCC to increase statistical power and select SNPs for stage 2 • An association was “genome-wide significant” if p < 5×10⁻⁸ • Stage 2 replication sample of 1215 Finnish T2D cases and 1258 Finnish normal glucose-tolerant (NGT) controls • 80 of 82 selected SNPs genotyped
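To make "additive on the log-odds scale" more concrete, here is a minimal sketch of the Cochran-Armitage trend test, which is closely related to the score test of an additive logistic model without covariates. It is not the FUSION analysis itself (which may have included covariates); the function name and the genotype counts are hypothetical.

```python
import math

def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts (0, 1, 2 risk alleles).

    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    # Equivalent to N times the squared correlation between
    # allele count and case status.
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, not data from FUSION.
chi2, p_value = trend_test(cases=[300, 500, 200], controls=[400, 450, 150])
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

The weights (0, 1, 2) encode the additive assumption: each extra copy of the risk allele shifts the log-odds by the same amount.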

  12. Methods - FUSION • Stage 2 analysis selected SNPs based on • FUSION genotyped and imputed SNPs from stage 1, using a prioritization algorithm that gave preference to genotyped SNPs • Combined analysis of GWA results from FUSION, DGI, and WTCCC • Previous T2D association results • Joint analysis of Stage 1 + Stage 2 • All-data meta-analysis of FUSION, DGI, WTCCC and follow-up samples

  13. Methods - DGI • DGI analyzed 386,731 SNPs after applying strict quality control filters and developed 284,968 additional two-marker (haplotype) tests, for a total of 671,699 tests • Each SNP and haplotype was tested for association with T2D and each of 18 clinical traits • Population and family-based samples combined with a weighted meta-analysis • Quantitative traits assessed by linear or logistic regression • “Genome-wide significant” associations at p < 5×10⁻⁸ • Three strategies to search for systematic bias: P-value distribution in the population sample (λ = 1.05), principal components analysis, and independent genotyping of 114 SNPs with extreme p-values

  14. Methods - DGI • Observed an excess of low p-values • Ran 1000 permuted whole-genome analyses, with phenotype data randomized within matched case-control groups, to evaluate the significance of the excess of low p-values • Suggests many variants with modest effects, not a few variants with large effects • Replication in an independent sample of 10,850 subjects from case-control samples of European ancestry (Sweden, USA, Poland) under the same model • Replication set of 107 SNPs selected on the basis of this study and comparisons with WTCCC and FUSION
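The permutation idea on this slide (randomizing phenotypes only within matched groups) can be sketched as follows. This is a toy illustration under stated assumptions, not the DGI pipeline: the helper names, the two strata, the allele dosages and the mean-difference statistic are all hypothetical.

```python
import random
from collections import defaultdict

def permute_within_strata(labels, strata, rng):
    """Shuffle case/control labels only among subjects sharing a stratum."""
    by_stratum = defaultdict(list)
    for idx, stratum in enumerate(strata):
        by_stratum[stratum].append(idx)
    permuted = list(labels)
    for indices in by_stratum.values():
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, lab in zip(indices, shuffled):
            permuted[i] = lab
    return permuted

def empirical_p(observed, statistic, labels, strata, n_perm=1000, seed=0):
    """Share of permutations with a statistic at least as large as observed."""
    rng = random.Random(seed)
    hits = sum(
        statistic(permute_within_strata(labels, strata, rng)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 1 = case, 0 = control, two matched groups.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
dosage = [2, 0, 1, 0, 2, 1, 2, 0]  # risk-allele counts per subject

def mean_difference(perm_labels):
    cases = [d for d, lab in zip(dosage, perm_labels) if lab == 1]
    controls = [d for d, lab in zip(dosage, perm_labels) if lab == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

p_emp = empirical_p(mean_difference(labels), mean_difference, labels, strata)
print(f"empirical p = {p_emp:.3f}")
```

Shuffling within strata keeps the number of cases per matched group fixed, so the null distribution respects the matching design.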

  15. Methods - WTCCC • Analyzed 393,453 autosomal SNPs with minor allele frequencies > 1% in both cases and controls and no extreme departure from Hardy-Weinberg equilibrium (HWE; extreme defined as P < 10⁻⁴) • Additional quality controls to find true associations included cluster-plot visualization and validation genotyping on a second platform • P-value distribution indicates no substantial confounding by population substructure or genotyping bias (λ = 1.08) • The WTCCC group used 3 replication sets with an additional 3757 cases and 5346 controls from two other UK studies
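The SNP filters on this slide (minor allele frequency and Hardy-Weinberg equilibrium) can be illustrated with a minimal sketch. It uses a 1-df chi-square HWE test, which is an assumption: the slide does not say which HWE test WTCCC applied. Function names and genotype counts are hypothetical; the thresholds mirror the slide.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0  # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

def passes_filters(counts, min_maf=0.01, min_hwe_p=1e-4):
    """Keep a SNP if MAF > min_maf and the HWE p-value is not extreme."""
    n = sum(counts)
    freq = (2 * counts[0] + counts[1]) / (2 * n)
    maf = min(freq, 1.0 - freq)
    _, hwe_p = hwe_chi2(*counts)
    return maf > min_maf and hwe_p >= min_hwe_p

# Hypothetical genotype counts (AA, AB, BB).
print(passes_filters((250, 500, 250)))  # in HWE with a common allele
print(passes_filters((500, 0, 500)))    # no heterozygotes at all
```

In a real scan the filter would be applied separately to cases and controls, as the slide describes.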

  16. Methods - WTCCC • First wave: 21 representative SNPs selected from the 30 SNPs, in 9 distinct chromosomal regions, with the most extreme p-values in the initial scan (p < 10⁻⁵), to limit false discovery • Second wave: p-value threshold relaxed to detect modest associations (p ~ 10⁻² to 10⁻⁵), yielding 5367 SNPs • These were prioritized by evidence of association in DGI and FUSION; presence of multiple, independent associations within the same locus; and biological candidacy, leaving 56 SNPs for analysis

  17. Results - FUSION GWAS

  18. Results - FUSION GWAS [figure; legend: common in all 3 studies, common in 2 studies]

  19. Results - FUSION GWAS • 10 loci identified • 5 new, near IGF2BP2, CDKAL1, CDKN2A/2B, an intergenic region on chromosome 11, and FTO • 5 previously published, near PPARG, SLC30A8, HHEX, TCF7L2, KCNJ11 • All loci have biological plausibility, except the non-coding region on chromosome 11, for which it is unknown • FUSION study found • Strong evidence for TCF7L2 (stage 1+2), SLC30A8 (stage 1), IGF2BP2 (stage 1), intergenic region on chromosome 11 (stage 1) • Modest evidence for HHEX, CDKAL1 (stage 1), CDKN2A/2B (stage 1+2), FTO (stage 1+2) • Some evidence for PPARG (imputed), KCNJ11 (imputed)

  20. Results - FUSION GWAS • Compared results to DGI and WTCCC scans • HHEX, CDKAL1, and FTO, which had only modest evidence in FUSION, showed stronger evidence in the WTCCC scan • For SLC30A8, subsequent genotyping in other studies gave stronger evidence in the combined sample • All SNPs or genes in this study overlap with a corresponding SNP/gene in at least one of the other studies, except the intergenic region on chromosome 11 • Intergenic region on chromosome 11 • Includes 3 sets of spliced expressed sequence tags (ESTs) • Nearby regions reported in another GWA study (Sladek 2007)

  21. Meta-Analysis • All-data meta-analysis of FUSION, DGI, WTCCC and follow-up samples • Log ORs from each study weighted by the inverse of their variance • Total sample size: 32,544 (a 7-fold increase over FUSION alone) • Larger sample size increases power to detect modest effects • All 10 loci reached genome-wide significance in the meta-analysis (this helped confirm loci that had only some evidence in a single study and underscores the importance of combining data)
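Inverse-variance weighting of log odds ratios, as described on this slide, can be sketched in a few lines. This is a generic fixed-effect calculation under stated assumptions, not the studies' actual code; the function name, the three odds ratios and their standard errors are hypothetical.

```python
import math

def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance weighted (fixed-effect) meta-analysis of log odds ratios.

    Returns (pooled log OR, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value

# Hypothetical per-study odds ratios and standard errors of the log OR.
log_ors = [math.log(1.12), math.log(1.15), math.log(1.10)]
std_errs = [0.06, 0.05, 0.04]

pooled, pooled_se, p_value = fixed_effect_meta(log_ors, std_errs)
print(f"pooled OR = {math.exp(pooled):.3f}, SE = {pooled_se:.4f}, p = {p_value:.2e}")
```

Because the pooled standard error is smaller than that of any single study, a modest effect that is only suggestive in each scan can become clearly significant once the scans are combined.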

  22. DGI GWA

  23. Results - DGI GWAS [figure; legend: common in 2 studies, common in all 3 studies, confirmed T2D susceptibility variants]

  24. DGI GWAS • T2D was associated with both novel and previously published candidate genes • Association with HHEX was confirmed in this GWA, in WTCCC/UKT2D and by other studies (Sladek 2007) • Association with SLC30A8 was consistently confirmed by WTCCC/UKT2D and FUSION • No evidence for association: LOC387761, EXT2-ALX4 • Additional loci: FLJ393370, PKN2

  25. DGI GWAS • In the current whole-genome association (WGA) study and those of its collaborators, association with T2D risk was verified at 3 previously unknown loci (CDKN2B, IGF2BP2 and CDKAL1) • 15 common variants for T2D and lipid levels were identified • New T2D genes suggest a primary role of the pancreatic beta cell

  26. Results - WTCCC GWA

  27. Results - WTCCC GWA [figure; legend: common in 2 studies, common in all three studies, confirmed T2D susceptibility variants]

  28. Results - WTCCC GWA • In the WTCCC, the strongest association signals were found for SNPs in TCF7L2 (P = 6.7×10⁻¹³) • From the first wave of SNPs, replication was found for SNPs in CDKAL1: • ‘Compelling’ evidence across all studies (P ~ 4.1×10⁻¹¹); SNPs map to a 90kb intron and may be involved in regulation of pancreatic beta cell function • An association at FTO on chromosome 16 (rs8050136) was found to be mediated through a primary effect on adiposity • Confirmed a previously reported association at HHEX • The HHEX signal is in an area of LD also containing genes encoding KIF11 and IDE, which have biological plausibility

  29. Results - WTCCC GWA • The second wave found modest associations with SNPs in CDKN2A/CDKN2B, replicated across the studies: • CDKN2A is a known tumor suppressor and produces p16INK4a, which inhibits CDK4, a regulator of pancreatic beta cell replication • SNPs from the promoter and first 2 exons of IGF2BP2 were replicated in WTCCC, DGI, and FUSION • Combined evidence was strong (P ~ 8.6×10⁻¹⁶), with biological plausibility • Independent genotyping of SLC30A8 (rs13266634) replicated previously reported findings (P = 7.0×10⁻⁵ in all UK data) • The Affymetrix chip does not capture this locus

  30. Results - WTCCC GWA • This study identified several T2D susceptibility loci • Confirmed previously reported loci including • TCF7L2: the largest association signal • FTO: the effect disappeared after adjustment for BMI • HHEX/IDE: strong replication, biological plausibility • Three novel loci • CDKAL1, IGF2BP2, and CDKN2A: replicated across the 3 studies in this analysis

  31. Conclusions - Differences Across Studies • Study populations: location; family-based vs. unrelated; matching factors; definition of diabetes • Genotyping platforms: Illumina vs. Affymetrix • Analysis plans: individual tests; haplotype analysis; imputation methods; P-value criteria

  32. Conclusions - Theoretical Considerations • Agnostic/statistical vs. prior information/biological plausibility • Relaxed vs. strict criteria • Ability to replicate

  33. Conclusions - Future Directions • Non-coding regions may be important • Many more variants yet to be determined - larger studies needed • Resequencing and functional studies are necessary to determine causal variants • Generalizability concerns • Collaborative model will benefit science!
