The utility of the HapMap reference samples for clinical populations. (the informatics of sequence variations and haplotypes). VanBug Seminar Vancouver, BC, Canada September 9, 2004. Gabor T. Marth. Department of Biology, Boston College [email protected] cause inherited diseases.
(the informatics of sequence variations and haplotypes)
Vancouver, BC, Canada
September 9, 2004
Gabor T. Marth
Department of Biology, Boston College
cause inherited diseases populations
allow tracking ancestral human history
Why do we care about variations?
underlie phenotypic differences
Marth et al.
Nature Genetics 1999
genome reference populations
~ 8 million
Sachidanandam et al.
Nature 2001Large SNP mining projects
genome-wide, dense SNP marker map
D=f( ) – f( ) x f( ) populationsLinkage disequilibrium
strong association: most chromosomes carry one of a few common haplotypes – reduced haplotype diversityHaplotype diversity
2n possible haplotypes
random assortment of alleles at different sites
bottleneckThe determinants of allelic association
Daly et al.
Nature Genetics 2001
Gibbs et al.
Nature 2003The promise for medical genetics
A variation structure,
AInformational reagents: haplotypes
Excoffier & Slatkin
Mol Biol Evol 1995
Stephens et al.
AJHG 2001Computational haplotype inference
Mol Biol Evol 1990
Wall & Pritchard
Nature Rev Gen 2003
Zhang et al.
1. meet block definition based on common haplotype requirements
2. within each block, determine the number of SNPs that distinguishes common haplotypes (htSNPs)
3. minimize the total number of htSNPs over complete region including all blocksAnnotations – haplotype blocks
2. allele frequency spectrum (AFS): distribution of SNPs according to allele frequency in a set of samples
“rare”Data: polymorphism distributions
1. marker density (MD): distribution of number of SNPs in pairs of sequences
computable formulations according to allele frequency in a set of samples
3/5Model: processes that generate SNPs
Marth et al.
model consensus: according to allele frequency in a set of samplesbottleneck
presentData fitting: allele frequency
bottleneck according to allele frequency in a set of samples
modest but uninterrupted expansionPopulation specific demographic history
Marth et al.
genealogy + mutations according to allele frequency in a set of samples
arbitrary number of additional replicatesModel-based prediction
computational model encapsulating what we know about the process
average age of polymorphismPrediction – allele frequency and age
“migration”Modeling joint allele structure
SNPs private to European samples classes
SNPs common in both populations
SNPs private to African samplesJoint allele frequencies
observation in UW PGA data
1. reference samples classes
2. common markers
4. list of haplotypes
5. frequent haplotypes
reference haplotypes classes
different populationReference haplotypes
(74.9%, 65.0% at lower minimum marker allele frequency)
Aravinda Chakravarti (Hopkins)
Andy Clark (Cornell)
Pui-Yan Kwok (UCSF)
Henry Harpending (Utah)
Jim Weber (Marshfield)