Population Genomics: Friend or Foe? Tim Shank, 4/2/03, Woods Hole Oceanographic Institution. Genome Projects. Microbial Genomics. Genome-Genome Interactions. Comparative Genomics. Population Genomics. Functional Genomics.
The study of forces that determine patterns of DNA variation in populations (Michel Veuille, European Consortium)
Field of genomics that links complex genotypes and phenotypes by comparing the flow of genotypic and phenotypic information in breeding and natural populations (Andrew Benson, U. Neb)
Genomic variation within species permitting the construction of detailed linkage maps using polymorphic markers and, through crossing experiments between individuals with different phenotypes, the identification of genes responsible for phenotypic variation (e.g., disease susceptibility, drug toxicity) (Andrew Clark, PSU)
Wright (1951) [The genetical structure of populations. Ann. Eugen. 15:323-354.] noted that the following relationship holds when populations reach an equilibrium between genetic drift and migration:
$$F_{ST} \approx \frac{1}{4Nm + 1}$$
where N is the variance effective population size of the average population, and
m is the average proportion of immigrants in each population, per generation
Problem: the useful parameter space is for FST values between 0.1 and 0.4
Nm is a virtual number
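A quick numerical illustration of why Nm behaves like a "virtual" number: inverting Wright's island-model relation F_ST ≈ 1/(4Nm + 1) gives Nm = (1/F_ST - 1)/4. This is a minimal sketch that assumes the equilibrium island model holds exactly; the function name `nm_from_fst` is hypothetical.

```python
def nm_from_fst(fst: float) -> float:
    """Number of migrants per generation (Nm) implied by F_ST.

    Assumption: Wright's equilibrium island model, F_ST = 1 / (4Nm + 1).
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # Near zero, tiny differences in F_ST imply very different Nm;
    # at large F_ST, Nm quickly drops below one migrant per generation.
    for fst in (0.01, 0.02, 0.1, 0.2, 0.4, 0.6):
        print(f"F_ST = {fst:.2f} -> Nm = {nm_from_fst(fst):.3f}")
```

For example, F_ST values of 0.01 and 0.02, which may be hard to distinguish given sampling error, correspond to roughly 25 and 12 migrants per generation.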
The giant tubeworm, Riftia pachyptila
East Pacific Rise
Figure: FST and migration rate
Rejects expectations of the "island model"
Consistent with a stepping-stone model
Inference: a species with more limited dispersal abilities
Black et al. 1994 Gene flow among vestimentiferan tube worm (Riftia pachyptila) populations from hydrothermal vents of the Eastern Pacific. Marine Biology 120: 33-39.
Arbitrary primer (e.g., ggcattactc)
Amplify regions between priming sites by polymerase chain reaction
Analyze PCR products by agarose gel electrophoresis.
Marker is dominant (presence/absence of band).
No prior sequence knowledge required
Many variations on the theme (e.g., RAMP, ISSR)
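The amplification step can be sketched in silico as a RAPD-style (random amplified polymorphic DNA) screen. A product needs a site where the primer sequence occurs on the top strand and, downstream, a site where its reverse complement occurs. This is a minimal sketch under hypothetical assumptions: exact primer matches only and a maximum amplifiable length of 3,000 bp; real arbitrary-primer PCR tolerates mismatches, and the toy alleles below are invented for illustration.

```python
# Assumptions (hypothetical): exact primer matches only, and products longer
# than max_len bp do not amplify. Real RAPD priming tolerates mismatches.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> list[int]:
    """Start positions of all exact (possibly overlapping) matches of motif."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def rapd_products(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Sizes (bp) of products amplified from template by one arbitrary primer.

    Forward sites: the primer sequence itself occurs on the top strand (primer
    anneals to the bottom strand and extends rightward). Reverse sites: its
    reverse complement occurs on the top strand (primer extends leftward).
    """
    template, primer = template.upper(), primer.upper()
    forward = find_sites(template, primer)
    reverse = find_sites(template, revcomp(primer))
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            # Sites must not overlap, and the product must be short enough.
            if r >= f + len(primer) and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


if __name__ == "__main__":
    primer = "ggcattactc"  # example primer from the slide
    site_rev = revcomp(primer.upper())
    allele_1 = "AAAA" + primer.upper() + "T" * 50 + site_rev + "AAAA"
    allele_2 = "AAAA" + "GGCATTACTA" + "T" * 50 + site_rev + "AAAA"  # mutated site
    print(rapd_products(allele_1, primer))  # [70]: band present
    print(rapd_products(allele_2, primer))  # []: band absent
```

Because a heterozygote carrying both toy alleles still shows the 70 bp band, it cannot be told apart from an allele_1 homozygote, which is what makes the marker dominant.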
1. Amplify Target Sequence
2. Denature product with heat and formamide
3. Analyze on native (nondenaturing) polyacrylamide gel
4. Base sequence determines 3-dimensional conformation
1. Amplify Target Sequence
2. Run product on gel with denaturing gradient (parallel or perpendicular to the direction the gel runs)
3. The product begins to denature at a point that depends on its base sequence; partial denaturation greatly retards migration and allows alleles to be discriminated on the basis of small sequence differences
4. Denaturing gradient gels can be difficult to produce: use a perpendicular gradient to identify optimal conditions, then move to CDGE (constant denaturant gel electrophoresis)
1. Amplify Target Sequence
2. Cut with a restriction enzyme that differentiates alleles
3. Alleles can be differentiated by size based on the loss or gain of a restriction site; products may be resolvable on an agarose gel
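The digestion step can be sketched as follows. This minimal example assumes, hypothetically, that the allele-discriminating enzyme recognizes GAATTC and cuts after the first base (G^AATTC, as EcoRI does); the amplicons are toy sequences. Gaining or losing one site changes the number and sizes of fragments, which is what separates the alleles on the gel.

```python
# Assumption (hypothetical example): the enzyme recognizes GAATTC and cuts the
# top strand after the first base (G^AATTC). GAATTC is palindromic, so scanning
# the top strand alone finds every site.


def digest(amplicon: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths (bp) after cutting a linear amplicon at every site."""
    amplicon, site = amplicon.upper(), site.upper()
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    with_site = "A" * 100 + "GAATTC" + "T" * 200     # allele carrying the site
    without_site = "A" * 100 + "GAATTA" + "T" * 200  # SNP destroys the site
    print(digest(with_site))     # [101, 205]
    print(digest(without_site))  # [306]
```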
“…reiterated short sequences [of DNA] tandemly arrayed, with variations in copy number accounting for a profusion of distinguishable alleles” - (Avise 1994)
- Nuclear DNA
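Because microsatellite alleles differ in repeat copy number, an allele can be summarized as a repeat count. A minimal sketch, assuming perfect (uninterrupted) tandem repeats of a known motif; the function name and the (CA)n alleles with their flanking sequence are hypothetical.

```python
import re

# Assumption: perfect (uninterrupted) repeats of a known motif. Real loci often
# carry interruptions and are usually scored by PCR product length instead.


def repeat_count(seq: str, motif: str) -> int:
    """Copies of `motif` in the longest perfect tandem run found in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    runs = re.finditer(f"(?:{re.escape(motif)})+", seq)
    return max((len(run.group(0)) // len(motif) for run in runs), default=0)


if __name__ == "__main__":
    flank5, flank3 = "GATC", "GGTA"  # hypothetical flanking sequence
    allele_a = flank5 + "CA" * 12 + flank3
    allele_b = flank5 + "CA" * 15 + flank3
    # A heterozygote for these alleles would be scored as (12, 15) repeats.
    print(repeat_count(allele_a, "CA"), repeat_count(allele_b, "CA"))  # 12 15
```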
Avise, J.C. 1994. Molecular Markers, Natural History and Evolution. Chapman and Hall, New York. 511 pp.
Balloux, F. and N. Lugon-Moulin. 2002. The estimate of population differentiation with microsatellite markers. Molecular Ecology. 11: 155-165.
Goldstein, D.B. and C. Schlötterer (Editors). 1999. Microsatellites: Evolution and Applications. Oxford University Press, Oxford, 352 pp.
Jarne, P. and P.J.L. Lagoda. 1996. Microsatellites, from molecules to populations and back. Trends in Ecology and Evolution 11(10): 424-429.
Slatkin, M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139: 457-462.
* Depends on cost of restriction enzymes employed
Violations of assumptions: linkage equilibrium, Mendelian inheritance, neutrality.
Linkage disequilibrium: alleles at different loci are found together more or less often than expected based on their frequencies (and location in the genome).
Goldstein and Weale 2001 Population genomics: linkage disequilibrium holds the key. Current Biology 11:576-579
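The definition of linkage disequilibrium can be made concrete with the standard two-locus coefficient D = p_AB - p_A * p_B and its normalized forms D' and r². A minimal sketch, assuming phased haplotypes at two biallelic loci; the allele labels and haplotype counts are hypothetical.

```python
from collections import Counter

# Assumption: haplotypes are already phased and both loci are biallelic.
# Each haplotype is a (locus-1 allele, locus-2 allele) pair.


def ld_stats(haplotypes: list[tuple[str, str]], a: str, b: str) -> dict[str, float]:
    """D, signed D' and r^2 between allele `a` (locus 1) and allele `b` (locus 2)."""
    n = len(haplotypes)
    p_ab = Counter(haplotypes)[(a, b)] / n
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    # D > 0: the a-b combination occurs more often than independence predicts.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    # Hypothetical sample of 100 haplotypes with an excess of AB and ab.
    haps = [("A", "B")] * 40 + [("A", "b")] * 10 + [("a", "B")] * 10 + [("a", "b")] * 40
    print(ld_stats(haps, "A", "B"))  # D ~ 0.15, D' ~ 0.6, r2 ~ 0.36
```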
*New York Times, Nov. 2002
originated in Africa 100 - 200k years ago, and spread
Neighbor-joining phylogram based on complete mtDNA genome sequences (excluding the D-loop).
Bootstrap values from 1000 replicates are shown on nodes.
The asterisk marks the MRCA (most recent common ancestor) of the youngest clade containing both African and non-African individuals.
or longer genetic history in Africa; bottleneck in non-African
Divergence of Africans and non-Africans occurred 52,000 ± 28,000 years ago
mtDNA mismatch distributions for Africans and non-Africans
• Individuals of African origin show a ragged distribution, consistent with constant population size
• Individuals of non-African origin show a bell-shaped distribution, which strongly suggests a recent population expansion
Mismatch distributions of pairwise nucleotide differences between a) African and b) non-African
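A mismatch distribution is the histogram of pairwise nucleotide differences over all pairs of sequences in a sample. A minimal sketch, assuming pre-aligned, gap-free sequences of equal length; the toy haplotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: sequences are already aligned, equal in length, with no gaps or
# ambiguity codes.


def mismatch_distribution(seqs: list[str]) -> dict[int, int]:
    """Map k -> number of sequence pairs that differ at exactly k sites."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    diffs = Counter(
        sum(x != y for x, y in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    return dict(sorted(diffs.items()))


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "TTTT"]  # hypothetical aligned haplotypes
    print(mismatch_distribution(toy))  # {1: 2, 2: 2, 3: 1, 4: 1}
```

Under the interpretation on the slide, a smooth, bell-shaped histogram of these counts points to recent expansion and a ragged one to constant population size.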
as a uniform resource for describing nucleotide diversity and regional variation within and between human populations
Numerous!
(Reviewed by Pui-Yan Kwok, Annu. Rev. Genomics Hum. Genet. 2001. 2:235-258)
SNP discovery can be based on expressed sequence tags (ESTs), genomic restriction fragments, aligned BAC sequences, random shotgun clone sequences, or overlapping genomic clone sequences
One year ago: dbSNP had 2,842,021 SNP submissions total
Today (2003), dbSNP has 6,250,820 submissions for human
1,368,805 submissions for mosquito
197,414 submissions for mouse
2,031 submissions for zebrafish
It is possible to search dbSNP by BLAST comparisons to a target sequence
The SNP Consortium is an alliance of pharmaceutical and computer companies managed by Lincoln Stein at Cold Spring Harbor Lab.
We describe a map of 1.42 million single nucleotide polymorphisms (SNPs) distributed throughout the human genome, providing an average density on available sequence of one SNP every 1.9 kilobases. These SNPs were primarily discovered by two projects: The SNP Consortium and the analysis of clone overlaps by the International Human Genome Sequencing Consortium. The map integrates all publicly available SNPs with described genes and other genomic features. We estimate that 60,000 SNPs fall within exon (coding and untranslated regions), and 85% of exons are within 5 kb of the nearest SNP. Nucleotide diversity varies greatly across the genome, in a manner broadly consistent with a standard population genetic model of human history. This high-density SNP map provides a public resource for defining haplotype variation across the genome, and should help to identify biomedically important genes for diagnosis and therapy.
competing population genetic models
No demographic history
Poisson distribution driven by mutation rate
Distribution of polymorphic sites profoundly impacted by changes in population size
Increased population size yields an abundance of new lineages carrying more mutations
Decreased population size raises the likelihood of relatedness, resulting in over-representation of sequence identity
Collapse followed by a phase of recent population recovery
Evaluated the degree of fit between the observed density distribution and the probability predicted by a given model, using the log likelihood of the data under that model
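Under the simplest model (no demographic history), the number of polymorphic sites in a fixed-length window is Poisson with a mean set by the mutation rate, so the log likelihood of the observed counts is a sum of Poisson log-probabilities. A minimal sketch, assuming independent, equal-length windows and a hypothetical per-window mean `lam`; the counts are invented for illustration.

```python
import math

# Assumption: counts of polymorphic sites in equal-length, independent windows;
# under the "no demographic history" model each count is Poisson with mean lam,
# where lam scales with the mutation rate.


def poisson_loglik(counts: list[int], lam: float) -> float:
    """Log likelihood of the observed per-window counts under Poisson(lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # log P(k) = k*log(lam) - lam - log(k!)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def poisson_mle(counts: list[int]) -> float:
    """Maximum-likelihood estimate of lam (the sample mean)."""
    return sum(counts) / len(counts)


if __name__ == "__main__":
    counts = [0, 1, 2, 1, 3, 0, 2, 1]  # hypothetical SNPs per window
    lam_hat = poisson_mle(counts)
    for lam in (0.5, lam_hat, 3.0):
        print(f"lam = {lam:.2f}: log L = {poisson_loglik(counts, lam):.3f}")
```

Competing models are then ranked by log likelihood (higher is better); the demographic models on the slide would replace the Poisson probabilities with each model's own predicted distribution.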
r indicates the per-nucleotide, per-generation recombination rate
The superior fit of the modeled parameters (with or without recombination) suggests a severe, 2- to 7-fold collapse in population size 40,000 years (1,600 generations) ago, followed by a modest recovery
Percentage of successful trials for each model at each data fraction; assessments are based on the amount of data required for rejection by a χ² test.
Interestingly, the fit between observations and the best-fitting models decays with more data.
organisms that exemplify phenomena not well studied in human/worm/mouse?
5 months ago
2102 SNPs for mutation mapping
Hundreds of SNPs on single array
Stickney et al. 2002 Rapid mapping of zebrafish mutations with SNPs and oligonucleotide microarrays. Genome Res.
Vertical lines = 25 linkage groups
Red dots correspond to SNPs represented on the oligonucleotide microarray
Postlethwait et al. 1994 A genetic linkage map for the zebrafish. Science 264: 699-703.
Woods et al. 2000 A comparative map of the zebrafish genome. Genome Research 10: 1903-1914.
Geisler et al. 1999 A radiation hybrid map of the zebrafish genome. Nature Genetics 23: 86-89.
Shimoda et al. 1999 Zebrafish genetic map with 2000 microsatellite markers. Genomics 58: 219-232.
*New York Times, Nov. 2002
(Glaubitz, Rhodes, and DeWoody 2003 Molecular Ecology 12: 1039-1047)
Need to determine genetic relationships in populations without known pedigrees
To assess known pairwise relationships via single nucleotide polymorphisms, where parallel microsatellite results already exist.
Recent advances in microarray technology permit genotyping of large numbers of individuals at hundreds to thousands of SNP loci (reviewed by Kwok 2001); this could be big!
Do SNPs equal or exceed the power of practical numbers of microsatellite loci in estimating relationships?
Microsatellites are the current method of choice for relationships among close kin within a population, but the number of independently segregating microsatellite markers is limited
SNPs may provide a large number of segregating loci with a large number of alleles at even frequencies
Glaubitz et al. 2003:
Computer simulations designed to evaluate the ability of SNPs to discriminate a variety of pairwise relationships likely to occur in natural populations, with comparisons to microsatellites from Blouin et al. 1996
• SNPs segregate independently: an ideal genome with 20 autosomes, 5 SNPs per chromosome, 10,000 individuals
Constructed an array of pedigrees
Estimated pairwise relatedness at a single locus (r1)
Evaluated the performance of 100 simulated SNPs by estimating the misclassification rate of relationships
illustrates that different pairwise relationships can have different amounts of inherent variance in relatedness
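The point about inherent variance can be checked from identity-by-descent (IBD) probabilities alone. If k0, k1 and k2 are the probabilities that a pair shares 0, 1 or 2 alleles IBD at a locus, single-locus relatedness is 0, 1/2 or 1, with expectation r = k1/2 + k2. This sketch uses textbook k values for non-inbred pairs and assumes unlinked loci; it describes realized IBD sharing, not the behaviour of the particular relatedness estimator used by Glaubitz et al.

```python
# Textbook probabilities (k0, k1, k2) that a non-inbred pair shares 0, 1 or 2
# alleles identical by descent (IBD) at an autosomal locus.
IBD = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibs": (0.25, 0.5, 0.25),
    "half sibs": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def relatedness_moments(k0: float, k1: float, k2: float, n_loci: int = 1) -> tuple[float, float]:
    """Mean and variance of realized relatedness averaged over n_loci loci.

    Assumption: loci are unlinked, so single-locus values are independent and
    the variance of their average shrinks as 1 / n_loci.
    """
    if abs(k0 + k1 + k2 - 1.0) > 1e-9 or n_loci < 1:
        raise ValueError("k values must sum to 1 and n_loci must be >= 1")
    # Single-locus relatedness is 0, 1/2 or 1 with probabilities k0, k1, k2.
    mean = 0.5 * k1 + 1.0 * k2
    second_moment = 0.25 * k1 + 1.0 * k2
    return mean, (second_moment - mean ** 2) / n_loci


if __name__ == "__main__":
    for name, ks in IBD.items():
        mean, var = relatedness_moments(*ks, n_loci=100)
        print(f"{name:>16}: mean r = {mean:.3f}, SD over 100 loci = {var ** 0.5:.3f}")
```

Parent-offspring and full-sib pairs share the same expected r = 0.5, but only full sibs vary around it (single-locus variance 0.125 versus 0), which is one way to see, under these idealized assumptions, why relationship categories overlap when classified by pairwise relatedness.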
100 independently segregating SNPs determined parent-offspring pairs as well as about 16 or fewer microsatellite loci when both parents are unknown
Even under the optimistic scenario of 100 independent loci, results show little promise for discriminating higher order relationships on the basis of pairwise relatedness.
Microsatellite approaches are still better…
Based on the assumptions 1) that the sampled SNP loci are independent and 2) that the microsatellites themselves are independent (not linked)
“SNPs have limited potential for the delineation of genealogical relationships…”
To take full advantage of the “vast” abundance of SNPs in metazoan genomes and their potential for automation, we will need analytical methods that account for tight genetic linkage (McPeek and Sun 2000) and known recombination; until then, SNP population genomics will likely be used only on model organisms.
*New York Times, Nov. 2002
associated with aberrant drug disposition or effects
More discoveries through polymorphisms in candidate genes (metabolism, transport, targets of candidate medication)
Observed phenotype: family studies
Figure: Number of drug targets (cumulative number of targets known today vs. new targets expected from the Human Genome Project). Source: Drews J. Nat Biotechnol 1996;14.
The principle behind this is to look at differences in gene expression when variables are changed, e.g., yeast cells grown in the presence of EtOH: which genes are turned on or off in response to that change in the environment?
Another variable could be normal versus diseased tissue
Pool the cDNAs
The cDNAs are hybridized to microarrays on which every gene that has been cloned is present [the DNA is spotted on the microslides and each spot corresponds to DNA from a different gene]
If a particular gene is expressed, its labelled cDNA will be present in the cDNA pool and can hybridize to the spot on the array corresponding to that gene
The results from such an experiment look like this, where the color of each spot tells you something about that gene's expression and, potentially, about drug therapy optimization.
The data can then be analyzed and sorted into tables that show which genes are expressed in response to the stimulus and which are turned off
This sort of experiment can be done with any collection of RNAs that you want to compare. It is particularly useful for comparing ‘normal’ with mutant or diseased states; e.g., it tells you which genes are turned on in cancerous cells and may give you a clue as to how cancer works
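Sorting spots into genes that are turned on and turned off usually starts from the ratio of the two labelled signals at each spot. A minimal sketch, assuming a two-colour design with background-subtracted intensities for a treated and a reference sample and a hypothetical two-fold cut-off (|log2 ratio| ≥ 1); the gene names and intensities are invented, and real analyses also normalize the channels and test for significance.

```python
import math

# Assumptions: two-colour arrays, background-subtracted intensities for a
# treated and a reference sample at each spot, and a two-fold cut-off.
# Real analyses also normalize the channels and test for significance.


def classify_spots(spots: dict[str, tuple[float, float]], fold: float = 2.0) -> dict[str, str]:
    """Call each gene 'up', 'down' or 'unchanged' from (treated, reference) intensities."""
    threshold = math.log2(fold)
    calls = {}
    for gene, (treated, reference) in spots.items():
        if treated <= 0 or reference <= 0:
            calls[gene] = "not measurable"  # ratio undefined on a log scale
            continue
        log_ratio = math.log2(treated / reference)
        if log_ratio >= threshold:
            calls[gene] = "up"
        elif log_ratio <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    # Hypothetical gene names and intensities.
    spots = {"geneA": (800.0, 100.0), "geneB": (50.0, 400.0), "geneC": (1000.0, 950.0)}
    print(classify_spots(spots))  # {'geneA': 'up', 'geneB': 'down', 'geneC': 'unchanged'}
```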
- Examples of factors showing variability that could be detected on arrays
- Provide information on the status of SNPs and gene expression profiles
The promise is to deliver “personalized” medicine