Diabetes Genome Wide Association

Alessandra C Goulart, Ida Hatoum, Stalo Karageorgi, Mara Meyer • EPI293, January 2008 • Harvard School of Public Health

Background - Type 2 Diabetes Mellitus • Disorder characterized by impaired glucose/insulin function • >170 million worldwide

Background - Genetic Justification • Explosion of diabetes plus rapidly decreasing age of onset argues for environmental rather than genetic etiology • Genetic justification • Clustering in families • Leveling off of risk by BMI • Mouse data • Pattern argues for a polygenic trait - GWAS!

Methods • 3 separate studies, all working collaboratively • Different populations, different analyses

FUSION/Finrisk Population • Genome-wide scan study (N = 2,335, Finland) • Population-based: 1,161 T2D cases, 1,174 controls • Matched on age, sex, birth province

DGI Population • Genome-wide scan study (N = 2,931, Finland/Sweden) • Population-based: 1,022 T2D cases, 1,075 controls, matched on gender, age, BMI, place of origin • Family-based: 422 T2D cases, 392 controls, discordant siblings matched on age

WTCCC Population • Genome-wide scan study (N = 4,862, British/Irish) • 1,924 T2D cases, from a diabetes “repository” • 2,938 controls, from a 1958 birth cohort • No matching

Methods – General Outline • All three studies start with study populations between 2335 and 4862 • All three run genome-wide association scans initially analyzing 300-400,000 SNPs, and reduce that number with certain criteria • All three studies then run second waves of replication or conduct replication studies in independent populations • Findings are compared with previously published reports and across the three studies • Weighted meta-analysis • Findings are fairly consistent between the three study populations, with many replicated associations

Population Stratification • All three studies investigated potential population stratification by • Cochran-Armitage tests • Genomic control inflation factor (λ) • Principal components analysis using EIGENSTRAT • Adjustment for region/birthplace • Matching, choice of study population • Replication in independent datasets
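A minimal sketch of how the inflation factor λ listed above is usually computed: the median of the observed 1-df chi-square test statistics divided by the median expected under the null (about 0.455). The function names and the example statistics are hypothetical and are not taken from any of the three studies.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom (standard constant).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Return lambda: median observed 1-df chi-square statistic / expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_corrected_p(chi2_stat, lam):
    """P-value of a 1-df chi-square statistic after genomic-control correction.

    The statistic is divided by lambda only when lambda exceeds 1.
    """
    adjusted = chi2_stat / max(lam, 1.0)
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(adjusted / 2.0))


# Hypothetical test statistics, for illustration only.
example_stats = [0.10, 0.30, 0.50, 0.90, 2.70]
lam = genomic_inflation_factor(example_stats)
print(f"lambda = {lam:.3f}")
```

A λ close to 1, as in the values reported by the three studies (1.026 to 1.08), suggests little inflation of test statistics from stratification or genotyping artifacts.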

Methods - Platform • Genotyping platforms for GWAS • Affymetrix GeneChip Human Mapping 500K Array Set: Wellcome Trust Case Control Consortium (WTCCC), UK, and Diabetes Genetics Initiative (DGI), the latter with both population-based (matched on gender, age, BMI and region of origin) and family-based samples • Illumina HumanHap300 BeadChip: Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics (FUSION), with 1,161 Finnish T2D cases and 1,174 normal glucose-tolerant controls from the FUSION and Finrisk 2002 studies (matched by province, sex and age)

Methods - FUSION • FUSION analyzed 315,635 SNPs with MAF > 0.002 with a model that is additive on the log-odds scale • They observed an excess of low p-values (P < 10^-4), suggesting many common variants with modest effects (λ = 1.026) • Imputed >2 million SNPs using data from HapMap CEU to cover 89.1% of SNPs with MAF > 1% • Compared stage 1 results with DGI and WTCCC to increase statistical power and select SNPs for stage 2 • An association was “genome-wide significant” if p < 5 x 10^-8 • Stage 2 replication sample of 1,215 Finnish T2D cases and 1,258 Finnish NGT controls • 80 of 82 selected SNPs genotyped
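An additive model on the log-odds scale has a simple score-test counterpart, the Cochran-Armitage trend test with genotype weights 0, 1, 2. The sketch below illustrates that test on a 2 x 3 genotype table. It is not FUSION's analysis code, and the function name and example counts are hypothetical.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Trend test on a 2 x 3 genotype table; returns (chi-square, 1-df p-value).

    case_counts and control_counts hold the number of subjects carrying
    0, 1 and 2 copies of the tested allele.
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n = n_cases + n_controls

    sum_wr = sum(w * r for w, r in zip(weights, case_counts))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))

    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    if denominator == 0:
        raise ValueError("trend test undefined (monomorphic SNP or empty group)")

    chi2 = n * (n * sum_wr - n_cases * sum_wn) ** 2 / denominator
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (0, 1, 2 copies of the allele), illustration only.
chi2, p_value = cochran_armitage_trend((20, 50, 30), (40, 45, 15))
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

A result would count as genome-wide significant under the threshold quoted above only if its p-value fell below 5 x 10^-8.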

Methods - FUSION • Stage 2 analysis selected SNPs based on • FUSION genotyped and imputed SNPs from stage 1, using a prioritization algorithm that gave preference to genotyped SNPs • Combined analysis of GWA results from FUSION, DGI, and WTCCC • Previous T2D association results • Joint analysis of Stage 1 + Stage 2 • All-data meta-analysis of FUSION, DGI, WTCCC and follow-up samples

Methods - DGI • DGI analyzed 386,731 SNPs after applying strict quality control filters and developed 284,968 additional two-marker (haplotype) tests, for a total of 671,699 tests • Each SNP and haplotype was tested for association with T2D and each of 18 clinical traits • Population and family-based samples combined with a weighted meta-analysis • Quantitative traits assessed by linear or logistic regression • “Genome-wide significant” associations at p < 5 x 10^-8 • Three strategies to search for systematic bias • P-value distribution in population sample (λ = 1.05), principal components analysis, and independent genotyping of 114 SNPs with extreme p-values

Methods - DGI • Observed an excess of low p-values • 1000 permuted whole-genome analyses with phenotype data randomized within matched case-control groups to evaluate the significance of the excess of low p-values • Suggests many variants with modest effects, not few variants with large effects • Replication in independent sample of 10,850 subjects from case-control samples of European ancestry (Sweden, USA, Poland) under the same model • Replication set of 107 SNPs selected on the basis of this study and comparisons with WTCCC and FUSION
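The permutation step above can be sketched as follows: case/control labels are shuffled only within each matched group, so the matching structure is preserved in every permuted data set. This is a minimal illustration under assumed data structures (parallel lists of phenotypes and group labels), not the DGI pipeline; all names are hypothetical.

```python
import random
from collections import defaultdict


def permute_within_strata(phenotypes, strata, rng):
    """Return a copy of phenotypes with labels shuffled within each stratum."""
    groups = defaultdict(list)
    for index, stratum in enumerate(strata):
        groups[stratum].append(index)

    permuted = list(phenotypes)
    for indices in groups.values():
        labels = [phenotypes[i] for i in indices]
        rng.shuffle(labels)
        for i, label in zip(indices, labels):
            permuted[i] = label
    return permuted


def empirical_p(observed, null_values):
    """Share of permuted statistics at least as extreme as the observed one.

    One is added to numerator and denominator so the estimate is never zero.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical data: 1 = T2D case, 0 = control, two matched groups.
rng = random.Random(0)
phenotypes = [1, 0, 1, 0, 1, 1, 0, 0]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
permuted = permute_within_strata(phenotypes, strata, rng)
print(permuted)
```

In a full analysis, the statistic of interest (for example the count of SNPs with p below some cutoff) would be recomputed on each of the 1000 permuted data sets and passed to `empirical_p` as `null_values`.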

Methods - WTCCC • Analyzed 393,453 autosomal SNPs with minor allele frequencies >1% in both cases and controls and no extreme departure from HWE (P < 10^-4) • Additional quality controls to find true associations included cluster-plot visualization and validation genotyping on a second platform • P-value distribution indicates no substantial confounding by population substructure or genotyping bias (λ = 1.08) • The WTCCC group used 3 replication sets with an additional 3,757 cases and 5,346 controls from two other UK studies
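The SNP filter described above (MAF > 1% in both cases and controls, no extreme HWE departure) can be sketched as below. Two assumptions are made for illustration: HWE is tested with a 1-df chi-square goodness-of-fit test rather than whatever test WTCCC actually used, and it is checked in controls only. Function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(case_counts, control_counts, maf_min=0.01, hwe_p_min=1e-4):
    """Keep a SNP if MAF exceeds maf_min in both groups and HWE holds in controls.

    Testing HWE in controls only is an assumption of this sketch.
    """
    maf_ok = (minor_allele_frequency(*case_counts) > maf_min
              and minor_allele_frequency(*control_counts) > maf_min)
    hwe_ok = hwe_chi2_p(*control_counts) >= hwe_p_min
    return maf_ok and hwe_ok


# Hypothetical genotype counts (AA, AB, BB), illustration only.
print(passes_qc((25, 50, 25), (25, 50, 25)))
print(passes_qc((25, 50, 25), (50, 0, 50)))
```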

Methods - WTCCC • First wave of SNPs selected 21 representative SNPs from the 30 SNPs in 9 distinct chromosomal regions with the most extreme p-values from the initial scan (p < 10^-5) to limit false discovery • Second wave relaxed the p-value threshold to detect modest associations (p ~ 10^-2 to 10^-5) and found 5,367 SNPs • Prioritized SNPs by evidence of association in DGI and FUSION; presence of multiple, independent associations within the same locus; and biological candidacy to analyze 56 SNPs

Results- FUSION GWAS

Results- FUSION GWAS • [Results graphic not reproduced. Legend: common in all 3 studies; common in 2 studies]

Results-FUSION GWAS • 10 loci identified • 5 new: near genes IGF2BP2, CDKAL1, CDKN2A/2B, FTO, and an intergenic region on ch. 11 • 5 previously published: near PPARG, SLC30A8, HHEX, TCF7L2, KCNJ11 • All loci have biological plausibility; it is unknown for the non-coding region on ch. 11 • FUSION study found: • Strong evidence for TCF7L2 (stage 1+2), SLC30A8 (stage 1), IGF2BP2 (stage 1), intergenic region on ch. 11 (stage 1) • Modest evidence for HHEX, CDKAL1 (stage 1), CDKN2A/2B (stage 1+2), FTO (stage 1+2) • Some evidence for PPARG (imputed), KCNJ11 (imputed)

Results-FUSION GWAS • Compared results to DGI and WTCCC scans • HHEX, CDKAL1 and FTO, which had modest evidence in FUSION, showed stronger evidence in the WTCCC scan • For SLC30A8, subsequent genotyping in other studies gave stronger evidence in the combined sample • All SNPs or genes in this study overlap with a corresponding SNP/gene in at least one of the other studies, except the intergenic region on ch. 11 • Intergenic region on ch. 11 • Includes 3 sets of spliced expressed sequence tags (ESTs) • Nearby regions reported in another GWA study (Sladek 2007)

Meta-Analysis • All-data meta-analysis of FUSION, DGI, WTCCC and follow-up samples • Weighted log ORs from each study by the inverse of the variance • Total sample size: 32,544 (increased 7-fold from FUSION alone) • Increased sample size, power to detect modest effects • All 10 loci reached genome-wide significance in meta-analysis (helping to confirm loci with only some evidence, emphasizing importance of combining data)
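The weighting described above corresponds to a fixed-effect, inverse-variance meta-analysis: each study's log OR is weighted by 1 / SE^2, and the pooled standard error is the square root of the reciprocal of the summed weights. The sketch below illustrates the arithmetic; the function name and the per-study numbers are hypothetical and are not the published estimates.

```python
import math


def inverse_variance_meta(log_ors, std_errs):
    """Fixed-effect meta-analysis of per-study log odds ratios.

    Returns (pooled log OR, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical log ORs and standard errors for three studies, illustration only.
pooled, pooled_se, p_value = inverse_variance_meta(
    [0.10, 0.20, 0.30], [0.10, 0.10, 0.10]
)
print(f"pooled OR = {math.exp(pooled):.3f}, SE = {pooled_se:.4f}, p = {p_value:.2e}")
```

Because the pooled standard error shrinks as studies are added, a locus with only modest evidence in each scan can still reach genome-wide significance in the combined data.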

DGI GWA

Results- DGI GWAS • [Results graphic not reproduced. Legend: common in 2 studies; common in all 3 studies; confirmed T2D susceptibility variants]

DGI GWAS • T2D was the trait associated with novel and previously published candidate genes • Association with HHEX was confirmed in this GWA, in WTCCC/UKT2D, and by other studies (Sladek 2007) • Association with SLC30A8 was consistently confirmed by WTCCC/UKT2D and FUSION • No evidence for association: LOC387761, EXT2-ALX4 • Additional loci: FLJ39370, PKN2

DGI GWAS • Current WGA study and collaborators: evidence for association was verified at 3 loci not previously known to affect T2D risk (CDKN2B, IGF2BP2 and CDKAL1) • 15 common variants for T2D and lipid levels were identified • New T2D genes suggest a primary role for the pancreatic beta cell

Results- WTCCC GWA

Results- WTCCC GWA • [Results graphic not reproduced. Legend: common in 2 studies; common in all three studies; confirmed T2D susceptibility variants]

Results- WTCCC GWA • In the WTCCC, the strongest association signals were found for SNPs in TCF7L2 (P = 6.7 x 10^-13) • From the first wave of SNPs, replication was found for SNPs in CDKAL1: • ‘Compelling’ evidence across all studies (P ~ 4.1 x 10^-11); SNPs map to a 90 kb intron and may be involved in regulation of pancreatic beta cell function • An association at FTO on chromosome 16 (rs8050136) was found to be mediated through a primary effect on adiposity • Confirmed a previously reported association at HHEX • The HHEX signal is in an area of LD also containing genes encoding KIF11 and IDE, which have biological plausibility

Results- WTCCC GWA • The second wave found modest associations with SNPs in CDKN2A/CDKN2B, replicated across the studies: • CDKN2A is a known tumor suppressor and produces p16INK4a, which inhibits CDK4, a regulator of pancreatic beta cell replication • SNPs from the promoter and first 2 exons of IGF2BP2 were replicated in WTCCC, DGI, and FUSION • Combined evidence was strong (P ~ 8.6 x 10^-16), with biological plausibility • Independent genotyping of SLC30A8 (rs13266634) replicated previously reported findings (P = 7.0 x 10^-5 in all UK data) • Affymetrix chip does not capture this locus

Results- WTCCC GWA • This study identified several T2D susceptibility loci • Confirmed previously reported loci including • TCF7L2: the largest association signal • FTO: the effect disappeared after adjustment for BMI • HHEX/IDE: strong replication, biological plausibility • Three novel loci • CDKAL1, IGF2BP2, and CDKN2A: replicated across the 3 studies in this analysis

Conclusions - Differences Across Studies • Study populations • Location • Family-based vs. unrelated • Matching factors • Definition of diabetes • Genotyping platforms • Illumina vs. Affymetrix • Analysis plans • Individual tests • Haplotype analysis • Imputation methods • P-value criteria

Conclusions - Theoretical Considerations • Agnostic/statistical vs. prior information/biological plausibility • Relaxed vs. strict criteria • Ability to replicate

Conclusions - Future Directions • Non-coding regions may be important • Many more variants yet to be determined - larger studies needed • Resequencing and functional studies are necessary to determine causal variants • Generalizability concerns • Collaborative model will benefit science!
