On genome-wide association studies (GWAS)
Keywords: association; linkage disequilibrium; population structure; case/control design; single nucleotide polymorphism (SNP) data.
Figure (SNP illustration): the two chromosomes differ at a single base. Chromosome 1: TTCAGTCAGATCC T AGCCC (complementary strand AAGTCAGTCTAGG A TCGGG). Chromosome 2: TTCAGTCAGATCC C AGCCC (complementary strand AAGTCAGTCTAGG G TCGGG).
Population structure explained part of the significant +11.2% inflation of test statistics we observed in an analysis of 6,322 nonsynonymous SNPs in 816 cases of type 1 diabetes and 877 population-based controls from Great Britain. The remainder of the inflation resulted from differential bias in genotype scoring between case and control DNA samples, which originated from two laboratories, causing false-positive associations.
Nature Genetics 37, 1243–1246 (2005)
Published online: 9 October 2005 | doi:10.1038/ng1653
Population structure, differential bias and genomic control in a large-scale, case-control association study
David G Clayton, Neil M Walker, Deborah J Smyth, Rebecca Pask, Jason D Cooper, Lisa M Maier, Luc J Smink, Alex C Lam, Nigel R Ovington, Helen E Stevens, Sarah Nutland, Joanna M M Howson, Malek Faham, Martin Moorhead, Hywel B Jones, Matthew Falkowski, Paul Hardenbol, Thomas D Willis & John A Todd
Nature 447, 661–678 (7 June 2007) | doi:10.1038/nature05911; Received 26 March 2007; Accepted 11 May 2007
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls
The Wellcome Trust Case Control Consortium
There may be important population structure that is not well captured by current geographical region of residence. Present implementations of strongly model-based approaches such as STRUCTURE (refs 11, 12) are impracticable for data sets of this size, and we reverted to the classical method of principal components (refs 13, 14), using a subset of 197,175 SNPs chosen to reduce inter-locus linkage disequilibrium. Nevertheless, four of the first six principal components clearly picked up effects attributable to local linkage disequilibrium rather than genome-wide structure. The remaining two components show the same predominant geographical trend from NW to SE but, perhaps unsurprisingly, London is set somewhat apart.
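The principal components step described above can be illustrated on simulated data. This is only a minimal sketch, not the consortium's pipeline: the function name `genotype_pcs`, the simulated allele frequencies, and the per-SNP scaling by sqrt(2p(1-p)) are all assumptions made for illustration, and no selection of SNPs to reduce linkage disequilibrium is attempted here.

```python
import numpy as np

def genotype_pcs(genotypes, n_components=2):
    """Return the leading principal components of a genotype matrix.

    genotypes: (n_individuals, n_snps) array of 0/1/2 minor-allele counts.
    The scaling below is one common convention, assumed for this sketch.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # estimated allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]     # individual scores on each PC

# Hypothetical data: two subpopulations whose allele frequencies differ by 0.2.
rng = np.random.default_rng(1)
n_per_pop, n_snps = 100, 1000
base = rng.uniform(0.2, 0.8, size=n_snps)
shift = 0.1 * rng.choice([-1.0, 1.0], size=n_snps)
pop_a = rng.binomial(2, base + shift, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, base - shift, size=(n_per_pop, n_snps))
genotypes = np.vstack([pop_a, pop_b])
labels = np.repeat([0, 1], n_per_pop)

pcs = genotype_pcs(genotypes)
pc1_label_corr = np.corrcoef(pcs[:, 0], labels)[0, 1]
print(f"correlation of PC1 with subpopulation label: {pc1_label_corr:.2f}")
```

In this toy setting the first component tracks subpopulation membership. In real data, as the passage notes, leading components can instead reflect local linkage disequilibrium, which is why the SNP subset matters.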
The overall effect of population structure on our association results seems to be small, once recent migrants from outside Europe are excluded. Estimates of over-dispersion of the association trend test statistics (usually denoted λ; ref. 15) ranged from 1.03 and 1.05 for RA and T1D, respectively, to 1.08–1.11 for the remaining diseases. Some of this over-dispersion could be due to factors other than structure, and this possibility is supported by the fact that inclusion of the two ancestry informative principal components as covariates in the association tests reduced the over-dispersion estimates only slightly (Supplementary Table 6), as did stratification by geographical region. This impression is confirmed on noting that P values with and without correction for structure are similar (Supplementary Fig. 9). We conclude that, for most of the genome, population structure has at most a small confounding effect in our study, and as a consequence the analyses reported below do not correct for structure. In principle, apparent associations in the few genomic regions identified in Table 1 as showing strong geographical differentiation should be interpreted with caution, but none arose in our analyses.
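The over-dispersion factor λ discussed above can be estimated in several ways. The sketch below uses the widely used median-based estimator (observed median of the 1-d.f. chi-square statistics divided by the null median). The excerpt does not say which estimator the authors used, so treat this as an assumption; the function name `genomic_inflation` and the simulated statistics are hypothetical.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Median-based estimate of lambda from 1-d.f. chi-square test statistics."""
    stats = np.asarray(chi2_stats, dtype=float)
    return float(np.median(stats) / CHI2_1DF_MEDIAN)

# Hypothetical data: null statistics, and the same statistics inflated by 8%.
rng = np.random.default_rng(0)
null_stats = rng.chisquare(df=1, size=200_000)
inflated_stats = 1.08 * null_stats

lambda_null = genomic_inflation(null_stats)
lambda_inflated = genomic_inflation(inflated_stats)
print(f"lambda (null): {lambda_null:.3f}")
print(f"lambda (inflated): {lambda_inflated:.3f}")

# Genomic control rescales every statistic by lambda before computing P values.
corrected_stats = inflated_stats / lambda_inflated
```

A value near 1 indicates little over-dispersion. Values such as the 1.03 to 1.11 reported above mean the statistics are, by this measure, 3% to 11% larger than expected under the null.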