Loading in 2 Seconds...
Loading in 2 Seconds...
Drosophila Population Genetics Brian Charlesworth Institute of Evolutionary Biology School of Biological Sciences University of Edinburgh. Why is intra-specific variability interesting?.
Institute of Evolutionary Biology
School of Biological Sciences
University of Edinburgh
A high degree of variability is obviously favourable, as freely giving the materials for selection to work on… Charles Darwin, The Origin of Species, Chap. 1.
Darwin was the first person to recognize clearly that evolutionary change over time is the result of processes acting on genetically controlled variability among individuals within a population, which eventually cause differences between ancestral and descendant populations.
Knowledge of the nature and causes of this variability is crucial for an understanding of the mechanisms of evolution, animal and plant breeding, and human genetic diseases.
Classical genetics reveals the existence of discrete polymorphisms in natural populations, but is necessarily limited either to chromosomal rearrangements such as inversions that can be detected cytologically, or to conspicuous phenotypes such as eye colour or body colour (flies carrying certain eye-colour mutations such as cardinal can be found in natural populations).
Within a given species, only a handful of such polymorphisms can easily be detected. Relatively few cases of discrete polymorphisms affecting morphological traits are known.
A human inversion
Most metric traits have a coefficient of variation (the ratio of the standard deviation to the mean) of 5-10%.
Measurements of the resemblances between relatives show that 20%-80% of the variance in such traits is typically due to genetic factors.
This type of variation is of great evolutionary, medical and economic significance, but measuring it does not tell us anything about the details of its genetic control (numbers of loci involved, frequencies of variant alleles, etc.).
The results of close inbreeding (e.g. by brother-sister matings) are:
1. Reduced mean performance of a set of inbred lines, with respect to traits like survival, fertility and growth rate.
2. Increased variability among lines, sometimes involving abnormalities caused by single gene mutations.
The solution to question (a) is to use the fact that genes correspond to stretches of DNA that code for proteins.
If either the protein sequence corresponding to a gene, or its DNA sequence, can be studied directly, then we can look at variation within the population without having to follow visible mutations, i.e. there is no need for prior knowledge of the existence of variation.
We can also look at variation in non-coding sequences.
The first steps were taken in the mid-1960s by Lewontin and Hubby, working in Chicago on the fruitfly Drosophila pseudoobscura, and by Harris in London, working on humans.
They used the technique ofgel electrophoresis of proteins to screen populations for variants in a large number of soluble proteins controlled by independent loci, mostly enzymes with well-established metabolic roles. The proteins were chosen purely because they could be studied easily.
An average D. pseudoobscura individual was estimated to be heterozygous at 13% of the 24 protein loci that had been studied by 1974 i.e. a random individual sampled from the population would be expected to have distinct maternal and paternal alleles at 13% of its protein-coding loci.
Much lower levels of heterozygosity (or gene diversity: the chance that two randomly chosen copies of a gene are different) were found in mammals, and much higher levels in bacteria.
However, it raised more questions than it answered. In particular, there were several biases in the data. Only soluble proteins could easily be studied, and amino-acid changes that do not affect the mobility of proteins on gels are not detected by electrophoresis.
Similarly, any changes in the DNA that do not affect the protein sequence go undetected.
The advent in the late 1970s of methods for cloning and sequencing of DNA meant that studies of natural variation could be carried out at the DNA level. This eliminates virtually all the possible biases in quantifying variability.
With the advent of PCR amplification for isolated specific regions of the DNA, and with relatively cheap automated sequencing, this is now the method most commonly used in surveys of variation.
Efforts are currently under way in D. melanogaster to scale this “resequencing” up to the whole genome level.
Kreitman sequenced 11 independent copies (alleles) of the Adh (alcohol dehydrogenase) gene of D. melanogaster, isolated from collections made around the world. He sequenced 2379 bases from each of these alleles, an heroic effort in those days.
Intron 1 Coding Region 3' Non-Transcr.
% Silent Sites
Segregating 1.7 6.7 0.6
No. Sites 654 765 767
No non-silent substitutions found (other than F/S): 39 are expected if variability were same as for silent sites.
Most variation that is detected in coding sequences (typically over 85% in Drosophila) thus involves synonymous variants. Non-coding region variation shows a similar level to synonymous variation.
These results suggest that most variation and evolution at the DNA level may be due to neutral or nearly neutral mutations, whose fate is controlled by genetic driftrather than selection, especially as much of the genome is non-coding, even in Drosophila.
It can be calculated from data on a sample of homologous DNA sequences, by determining the sum of the numbers of differences between all possible pairs of sequences.
The result is divided by the product of the number of sequences that were compared (this equals n(n-1)/2, if there are n independent alleles), and the number of bases studied.
The total number of pairwise differences between all 3 combinations of sequences is 1 + 3 + 4 = 8.
To get the pairwise diversity per site, we divide this by 3 times the number of sites, so that
By dividing S by the product of the number of bases in the sequence and the sum
a = 1 + 1/2 + 1/3 + ... + 1/(n -1)
we obtain a statistic called Watterson’sw.
In the example, we have S = 4, and a
w= 4/(30 x 1.5) = 0.089
The expected value of the pairwise diversity in the population is then given by:
q = 4Nem
where m is the neutral mutation rate per site, and Ne is the effective population size, which controls the rate of genetic drift.
The expected values of both p and ware equal to .
Rough average values over many genes for silent nucleotide are as follows:
For example, with m = 4 x 10-9, and q = 0.02, we obtain Ne = 1.25 x 106.
Drosophila effective population sizes are therefore very large.
One of the major goals of evolutionary genetics is to understand to what extent selection, as opposed to neutral forces of mutation and genetic drift, controls variation and evolution in DNA and protein sequences.
The methods for doing this often involves combining data on sequence divergence between species with data on polymorphism within species.
Directional and balancing selection are often collectively referred to as positive selection.
The simplest situation is when we have two homologous (aligned) DNA sequences from a pair of related species.
For the purpose of discussion, assume that all evolutionary change occurs by nucleotide substitutions, i.e. the sequence differences are caused entirely by one nucleotide base changing into another by mutation.
This is usually the case for coding sequences, since insertions or deletions cause disruption of functionality.
The total time separating a pair of sequences from the two
species is 2T
Under neutral evolution, K is expected to be equal to the mutation rate (m) times the divergence time between the two species, i.e.
K = 2 m T
The simplest way to understand this is to note that, under neutral evolution, the expected number of mutations that distinguish a pair of sequences is equal to the time separating them (2T) times the rate of mutation per unit time (m).
Nonsynonymous sites are usually used as the candidates for selection, but there is increasing use of defined types of non-coding sequences.
This comes from the fact that both K and q for
nonsynonymous variants are nearly always much
smaller than for synonymous and noncoding
(species 1: 18 loci) and D. pseudoobscura (species 2: 14 loci)
All values are percentages
Divergence (K) is measured between
D. miranda and D. affinis. (KSbetween mirpseudo is 3.5%)
L. Loewe et al. 2006 Genetics 172: 1079-1092.
P. Haddrillet al. 2005 Genome Biol. 6: R67. 1-8.
C. Ting et al. 1998 Science 282:1501-1504
Neutral divergence = 2Tm
Neutral diversity = 4Nem
50-nucleotide (nt) window, in steps of 10 nt, using all sites
region (mostly non-synonymous)
p or K
intraspecific polymorphism within D. simulans (p)
interspecific divergence (K)
P < 0.006
H. Malik & S. Henikoff
This fraction is of the order of 25%, a surprisingly high value.
N. Bierne & A. Eyre-Walker 2004 Mol. Biol. Evol. 21: 1350-1360.
J. Maynard Smith & J. Haigh 1974 Genet. Res. 12: 12-35.
A recent selective sweep is detectable if the time since selective substitution is sufficiently small (around 0.25Ne generations), but there is a lot of noise
variant frequency distributions
The spread of an advantageous mutation affects diversity very much like a bottleneck, but only on the region around the gene
One haplotype present, then new neutral variants occur
< w , negative Tajima’s D
Fixed advantageous mutation
One haplotype selected, then new neutral variants occur
< w , Tajima’s D < 0
on the neo-X chromosome
of D. miranda
D. Bachtrog 2003 Nat. Genet. 34: 215-219.
There is currently a lot of interest in using scans of variability across the genome, to look for patterns that suggest a recent selective sweep.
The hope is that this will lead to identification of the mutations that have been favoured by selection.
They must have adapted to their new environments. It should be possible to see which regions of the genome show evidence of selective sweeps.
The problem is that they have also gone through bottlenecks of small population size, which has similar effects to sweeps, but are distributed over the whole genome.
non-African and African populations of D. melanogaster
B. Harr et al. (2002) Proc. Natl. Acad. Sci. USA 99, 12949-12954
across the X chromosome of mel
(L. Ometto et al. 2005 M.B.E. 22: 2119-2130)
Q is the probability of getting as many as the observed number of
polymorphisms in the European sample on a bottleneck model
Empty and filled circles indicate sig. negative or positive Tajima’s D.
The simplest model that can be made is for a random-mating population with a large number of independently evolving sites
Each site has two alternatives: preferred and unpreferred codons (A versus a)
These differences are assumed to have accumulated in the two focal species since the split between them
We have been using three Drosophila species for this purpose:
D. miranda is used for the polymorphism study
D. pseudoobscura is a very close relative (less than 4% silent site divergence from miranda)
D. affinis is a more distant outgroup species (about 23% silent site divergence from the other two)
Codons were classified as preferred (P) versus unpreferred (U), using Akashi’s codon usage table for D. pseudoobscura.
P U U P
Fixed 19 12
Polymorphic 37 6
rpd 1.95 0.50
Ratio of rpd values = 3.9
C. Bartolomé et al. 2005 Genetics 169: 1495-1507
= upI0/(up I0 +v[1-p] I1)
I0 is the probability that a P U polymorphism is detected in the sample;
I1 is the probability of detecting a U P polymorphism;
p is the proportion of P codons in the sequence;
u and v are the mutation rates for P U and U P changes
= I0 /(I0 +I1e - )
i.e. the proportion of P U polymorphisms depends only on = 4Net.
This allows us to use maximum likelihood to estimate the value of and its approximate 95% confidence limits.
Fixations were assigned to the neo-X and neo-Y branches, subsequent to the neo-X/neo-Y split
P U 15 47 p = 0.014
U P 74
Bartolomé and Charlesworth2006 Genetics 174:2033-2044
On a Mantel-Haenszel test, there is a significant excess (p <0.001) of non-synonymous relative to silent polymorphisms on the neo-Y compared with the neo-X, indicating a relaxation of purifying selection on the neo-Y.