admixture mapping
Skip this Video
Download Presentation
Admixture mapping

Loading in 2 Seconds...

play fullscreen
1 / 32

Admixture mapping - PowerPoint PPT Presentation

  • Uploaded on

Admixture mapping. Paul McKeigue Public Health Sciences Section College of Medicine and Veterinary Medicine University of Edinburgh. Applications of statistical modelling of population admixture. Admixture mapping

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Admixture mapping' - haines

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
admixture mapping

Admixture mapping

Paul McKeigue

Public Health Sciences Section

College of Medicine and Veterinary Medicine

University of Edinburgh

applications of statistical modelling of population admixture
Applications of statistical modelling of population admixture
  • Admixture mapping
    • localizes genes in which risk alleles are distributed differentially between ethnic groups
  • Investigating relation of disease risk to individual admixture proportions
    • to distinguish genetic and environmental explanations of ethnic variation in risk
  • Controlling for population stratification in genetic association studies
    • eliminates confounding except by alleles at linked loci
  • Fine mapping of genetic associations in admixed populations
    • to eliminate long-range signals generated by admixture
Distinguishing between genetic and environmental explanations for ethnic differences in disease risk
  • Migrant studies:
    • consistency of high or low risk in varying environments
    • trend of risk ratio with number of generations since migration
    • failure of environmental factors to account for ethnic difference
  • Relation of risk to proportionate admixture
    • may be confounded by environmental factors
Ethnic differences in disease risk that (on the basis of migrant studies) are unlikely to have a genetic basis
  • Japanese-European: breast cancer, colon cancer, coronary heart disease
    • after 1-2 generations risk in Japanese migrants equals risk in US Whites
  • African-European: multiple sclerosis
    • low risk in Europeans who migrated to South Africa before age 12
type 2 diabetes prevalence in south asian migrants and their descendants
Type 2 diabetes: prevalence in South Asian migrants and their descendants

Age Prevalence

First-generation migrants

1991 England 40-64 19%

> 5 generations since migration from India

1977 Trinidad 35-69 21%

1983 Fiji 35-64 25%

1985 South Africa 30- 22%

1990 Singapore 40-69 25%

1990 Mauritius 35-64 20%

Type 2 diabetes: effect of gene flow from European males into a high-risk population (Nauruan islanders)

% with European HLA typesAge Diabetic Non-diabetic

20-44 6% 12%45-59 9% 13%60 + 5% 55%

Odds ratio for diabetes in those with European admixture = 0.31 (95% CI 0.11 - 0.81)

Serjeantson SW. Diabetologia 1983;25:13

relation of risk of systemic lupus erythematosus to individual admixture in trinidad molokhia 2003
Relation of risk of systemic lupus erythematosus to individual admixture in Trinidad (Molokhia 2003)
  • 44 cases and 80 controls resident in northern Trinidad (excluding those with Indian or Chinese ancestry)
  • Admixture proportions of each individual estimated from genotypes at 31 marker loci

Risk ratio (95% CI) for unit change in African admixture

Unadjusted 32.5 2.0 - 518

Adjusted for socioeconomic status 28.4 1.7 - 485

methods for finding genes that influence complex traits
Methods for finding genes that influence complex traits
  • Family linkage studies:
    • localize genes underlying familial aggregation of a trait
    • collections: families with >1 affected member
    • genome search requires typing <1000 markers
    • Low statistical power for genes of modest effect
  • Association studies
    • localize genes underlying trait variation between individuals
    • collections: case-control or cross-sectional
    • genome search with tag SNPs requires > 300 000 markers
    • Tag SNP approach relies on low allelic heterogeneity
exploiting admixture to map genes
Exploiting admixture to map genes
  • Admixture mapping: infer ancestry at marker locus (0, 1 or 2 copies from the high-risk population) then test for association of ancestry with the trait or disease
    • analogous to linkage analysis of an experimental cross
  • Testing for allelic association (Chakraborty & Weiss 1988, Stephens et al. 1994 “MALD”) does not fully exploit the information about linkage that is generated by admixture
    • efficiency of MALD is limited by information content for ancestry of individual markers ( < 40%)
    • cannot use affected-only design
statistical power of admixture mapping
Statistical power of admixture mapping
  • Required sample size is determined by the ancestry risk ratio (r)
    • ~800 cases required to detect a locus with r = 2
    • ~3000 cases required to detect a locus with r = 1.5
    • assuming that:
      • a dense panel of ancestry informative markers is available
      • admixture proportions from the high-risk population are between 20% and 70%
  • Affected-only test of N individuals has same statistical power as case-control test of 2N cases and 2N controls
Advantages of admixture mapping in comparison with other approaches to finding disease susceptibility genes
  • Statistical power
    • admixture mapping relies on direct (fixed-effects) comparison
    • family linkage studies rely on indirect (random-effects) comparison
  • Number of markers required for a genome search
    • ~ 2000 ancestry-informative markers for a genome search, compared with > 300 000 markers for whole-genome association studies
  • Effect of allelic heterogeneity
    • does not matter whether there are many rare risk alleles or only a few common risk alleles at the disease locus
recent admixture between low risk and high risk populations
Recent admixture between low-risk and high-risk populations

Founding Generations populations since admixtureCaribbean, USA W African/European 2 – 15

Australia Native Aus./European 6 - 8

Americas Native Am./European 2 - 15

Pacific islands indigenous/European

Alaska,Canada, Greenland Inuit/European ?10

East Africa Arab/E African ~ 15-20?

an experimental cross between inbred strains

F1 generation

F2 generation

Gene copies from strain







An experimental cross between inbred strains
methodological problems of extending linkage analysis of a cross to admixed human populations
Methodological problems of extending linkage analysis of a cross to admixed human populations
  • History of admixture is not under experimental control or even known
    • population structure generates associations with ancestry at loci unlinked to the trait
  • Ancestral populations are not available for study
    • cannot sample exact mix of west African populations that contributed to the African-American gene pool
  • Human ethnic groups are not inbred strains: FST ~ 0.15
    • markers with 100% frequency differentials are rare
    • cannot unequivocally infer ancestry at locus from marker genotype
model for stochastic variation of ancestry on chromosomes inherited from an admixed parent







Model for stochastic variation of ancestry on chromosomes inherited from an admixed parent

Hidden states: states of ancestry at marker loci on chromosome of mixed descent

Observed data: marker alleles at each locus

Stochastic variation between K states modelled as sum of K independent Poisson arrival processes

Total arrival rate (sum of intensities) can be interpreted as the effective number of generations back to unadmixed ancestors

multipoint inference of ancestry at marker loci from genotypes













Multipoint inference of ancestry at marker loci from genotypes
  • Hidden Markov model (HMM) message-passing algorithm yields posterior marginal distribution of ancestry states at each locus, given genotypes at all loci on the chromosome
  • Information about locus ancestry depends on marker allele frequencies and marker density
null hypothesis as graphical model
Population distribution of admixture in parental generationNull hypothesis as graphical model

i th individual

Maternal gamete admixture [i]

Paternal gamete admixture [i]

covariates [i]

Arrival process intensity parameter

Paternal locus ancestry [i,j]

Maternal locus ancestry [i,j]

trait measurement [i]

haplotype pair[i,j]

genotype [i,j]

j th locus

Subpopulation-specific haplotype frequencies [j]

Regression parameters

statistical approach to model fitting
Statistical approach to model fitting
  • Bayesian model of null hypothesis: all observed and missing data are random variables
    • Observed data: genotypes, trait values, covariates
    • Missing data:-
      • model parameters (admixture proportions, arrival rate)
      • locus ancestry states
  • Posterior distribution of model parameters is generated by Markov chain Monte Carlo (MCMC) simulation
  • For each realization of the model parameters, marginal distribution of locus ancestry is calculated by an HMM algorithm
  • Three programs based on this approach are currently available: ADMIXMAP, ANCESTRYMAP, STRUCTURE
statistical approaches to hypothesis testing
Statistical approaches to hypothesis testing
  • Null hypothesis:  = 0 (where  is the log ancestry risk ratio generated by the locus under study)
  • By averaging over the posterior distribution of missing data under the null, we can evaluate two types of test:-
  • Likelihood ratio test (implemented in ANCESTRYMAP):
    • evaluates L( ) / L(0)
    • averaging over prior on  yields Bayes factor (ratio of integrated likelihoods) for an effect at the locus under study compared with the null
    • averaging over all positions on genome yields Bayes factor for an effect somewhere on the genome compared with the null
  • Score test (implemented in ADMIXMAP):
    • evaluates gradient and second derivative of log L( ) at  = 0 , to obtain a classical p-value
evaluation of score test by averaging over posterior distribution of missing data
Evaluation of score test by averaging over posterior distribution of missing data
  • For each realization of complete data, evaluate:
    • score (gradient of log-likelihood) at  = 0
    • information (curvature of log-likelihood) at  = 0
  • Score U = posterior mean of realized score
    • Complete info = posterior mean of realized info
    • Missing info = posterior variance of realized score
  • Observed info V = complete info – missing info
  • Test statistic = UV-½
advantages of the score test algorithm compared with likelihood ratio
Advantages of the score test algorithm (compared with likelihood ratio)
  • All calculations are at  = 0
    • computationally efficient, no ascertainment problems
  • Meta-analyses are straightforward: just add the score and information across studies
  • Ratio of observed to complete information provides a useful measure of the efficiency of the study design
  • Can be used to calculate model diagnostics:
    • test for departure from Hardy-Weinberg equilibrium
    • test for residual LD between pairs of adjacent marker loci
other model diagnostics bayesian p values
Other model diagnostics: “Bayesian p-values”
  • Can be applied where alternative to fitted model is not simply  < > 0
    • Compare posterior distribution of test statistic Tobs calculated from the realized data with posterior predictive distribution of statistic Trep, calculated by simulating a replicate dataset given model parameters
    • Posterior predictive check probability or “Bayesian p-value” (Rubin) is Prob (Trep > Tobs)
  • Used to test for lack of fit of ancestry-specific allele freqs to prior distributions
information about ancestry conveyed by a diallelic marker
Information about ancestry conveyed by a diallelic marker

Marker allele 1 has ancestry-specific frequencies pX, pY given ancestry from populations X, Y respectively

In an equally-admixed population, the proportion of Fisher information about ancestry of an allele (X or Y by descent) extracted by typing the allele is


40% ancestry information content (f = 0.4) is equivalent to allele frequency differentials of about 0.6

how many markers are required for genome wide admixture mapping
How many markers are required for genome-wide admixture mapping?
  • Simulation studies based on typical populations where admixture dates back genetic structure
    • 80%/20% admixture, sum of intensities 6 per 100 cM, markers with 36% information content for ancestry
    • 64% of information about ancestry is extracted with markers spaced at 3 cM
    • 80% of information about ancestry is extracted with markers spaced at 1 cM
panels of ancestry informative markers
Panels of ancestry-informative markers
  • Assembly of a panel of ~ 3000 ancestry-informative markers (AIMs) requires screening several hundred thousand SNPs for which allele frequency data are available
  • Marker panels are now available for
    • west African / European admixture (Smith 2004, Tian 2006)
    • Native American / European admixture (Mao 2007, Tian 2007, Price 2007)
recent successes with admixture mapping
Recent successes with admixture mapping
  • Detection of regions linked to disease
    • Linkage with multiple sclerosis in African-Americans (Reich 2005)
    • Linkage with prostate cancer on 8q24 in African-Americans (Freedman 2006)
  • Identification of QTLs
    • Detection of a functional SNP in SLC24A5 that accounts for ~25% of European/African difference in skin melanin content (Lamason 2005)
    • Detection of a functional SNP in IL6R that accounts for 33% of variance in interleukin 6 soluble receptor levels (Reich 2007)
do admixture mapping studies require a control group
Do admixture mapping studies require a control group?
  • Affected-only design is the most efficient if model assumptions hold
  • Control group is useful:-
    • as a source of unbiased information on allele frequencies
    • as a sanity check, and specifically to test the assumption of no ancestry state heterogeneity across the genome
    • for subsequent fine mapping
    • Control data from studies of other disease in the same population can be re-used
fine mapping in admixed populations
Fine mapping in admixed populations
  • For fine mapping, we want to be able to condition on locus ancestry so as to eliminate long-range signals generated by admixture
  • Standard model of admixture requires minimal spacing of 0.5 cM to ensure no residual LD between marker loci
    • For inference of locus ancestry (as in admixture mapping), ~3000 ancestry-informative markers are sufficient
  • For fine mapping with ~500 000 tag SNPs, we can model all loci but omit feedback of information about locus ancestry from all but a subset of ~3000 AIMs
other applications of statistical modelling of admixture
Other applications of statistical modelling of admixture
  • Admixture mapping in outbred animal populations
    • livestock, heterogeneous stocks of mice
  • Inferring the genetic background of an individual
    • forensic applications, restricting samples by genetic background, classification of domestic animals and livestock