- 142 Views
- Uploaded on

Download Presentation
## Population Genetics Basics

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Terminology review

- Allele
- Locus
- Diploid
- SNP

Single Nucleotide Polymorphisms

Infinite Sites Assumption:

Each site mutates at most once

00000101011

10001101001

01000101010

01000000011

00011110000

00101100110

What causes variation in a population?

- Mutations (may lead to SNPs)
- Recombinations
- Other genetic events (gene conversion)
- Structural Polymorphisms

Gene Conversion

- Gene Conversion versus crossover
- Hard to distinguish in a population

Structural polymorphisms

- Large scale structural changes (deletions/insertions/inversions) may occur in a population.

Topic 1: Basic Principles

- In a ‘stable’ population, the distribution of alleles obeys certain laws
- Not really, and the deviations are interesting
- HW Equilibrium
- (due to mixing in a population)
- Linkage (dis)-equilibrium
- Due to recombination

Hardy Weinberg equilibrium

- Consider a locus with 2 alleles, A, a
- p(respectively, q) is the frequency of A (resp. a) in the population
- 3 Genotypes: AA, Aa, aa
- Q: What is the frequency of each genotype

- If various assumptions are satisfied, (such as random mating, no natural selection), Then
- PAA=p2
- PAa=2pq
- Paa=q2

Hardy Weinberg: why?

- Assumptions:
- Diploid
- Sexual reproduction
- Random mating
- Bi-allelic sites
- Large population size, …
- Why? Each individual randomly picks his two chromosomes. Therefore, Prob. (Aa) = pq+qp = 2pq, and so on.

Hardy Weinberg: Generalizations

- Multiple alleles with frequencies
- By HW,
- Multiple loci?

Hardy Weinberg: Implications

- The allele frequency does not change from generation to generation. Why?
- It is observed that 1 in 10,000 caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the mutation?
- Males are 100 times more likely to have the “red’ type of color blindness than females. Why?
- Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.

What if there were no recombinations?

- Life would be simpler
- Each individual sequence would have a single parent (even for higher ploidy)
- The relationship is expressed as a tree.

The Infinite Sites Assumption

0 0 0 0 0 0 0 0

3

0 0 1 0 0 0 0 0

5

8

0 0 1 0 1 0 0 0

0 0 1 0 0 0 0 1

- The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa.
- Some phenotypes could be linked to the polymorphisms
- Some of the linkage is “destroyed” by recombination

Infinite sites assumption and Perfect Phylogeny

- Each site is mutated at most once in the history.
- All descendants must carry the mutated value, and all others must carry the ancestral value

i

1 in position i

0 in position i

Perfect Phylogeny

- Assume an evolutionary model in which no recombination takes place, only mutation.
- The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.

The 4-gamete condition

- A column i partitions the set of species into two sets i0, and i1
- A column is homogeneous w.r.t a set of species, if it has the same value for all species. Otherwise, it is heterogenous.
- EX: i is heterogenous w.r.t {A,D,E}

i

A 0

B 0

C 0

D 1

E 1

F 1

i0

i1

4 Gamete Condition

- 4 Gamete Condition
- There exists a perfect phylogeny if and only if for all pair of columns (i,j), either j is not heterogenous w.r.t i0, or i1.
- Equivalent to
- There exists a perfect phylogeny if and only if for all pairs of columns (i,j), the following 4 rows do not exist

(0,0), (0,1), (1,0), (1,1)

4-gamete condition: proof

i

i0

i1

- Depending on which edge the mutation j occurs, either i0, or i1 should be homogenous.
- (only if) Every perfect phylogeny satisfies the 4-gamete condition
- (if) If the 4-gamete condition is satisfied, does a prefect phylogeny exist?

An algorithm for constructing a perfect phylogeny

- We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later.
- In any tree, each node (except the root) has a single parent.
- It is sufficient to construct a parent for every node.
- In each step, we add a column and refine some of the nodes containing multiple children.
- Stop if all columns have been considered.

Inclusion Property

- For any pair of columns i,j
- i < j if and only if i1 j1
- Note that if i<j then the edge containing i is an ancestor of the edge containing i

i

j

Example

r

A

B

C

D

E

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

Initially, there is a single clade r, and each node has r as its parent

Sort columns

- Sort columns according to the inclusion property (note that the columns are already sorted here).
- This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

Add first column

- In adding column i
- Check each edge and decide which side you belong.
- Finally add a node if you can resolve a clade

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

r

u

B

D

A

C

E

Adding other columns

- Add other columns on edges using the ordering property

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

r

1

3

E

2

B

5

4

D

A

C

Unrooted case

- Switch the values in each column, so that 0 is the majority element.
- Apply the algorithm for the rooted case

Handling recombination

- A tree is not sufficient as a sequence may have 2 parents
- Recombination leads to loss of correlation between columns

Linkage (Dis)-equilibrium (LD)

- Consider sites A &B
- Case 1: No recombination
- Pr[A,B=0,1] = 0.25
- Linkage disequilibrium
- Case 2:Extensive recombination
- Pr[A,B=(0,1)=0.125
- Linkage equilibrium

A B

0 1

0 1

0 0

0 0

1 0

1 0

1 0

1 0

Handling recombination

- A tree is not sufficient as a sequence may have 2 parents
- Recombination leads to loss of correlation between columns

Recombination, and populations

- Think of a population of N individual chromosomes.
- The population remains stable from generation to generation.
- Without recombination, each individual has exactly one parent chromosome from the previous generation.
- With recombinations, each individual is derived from one or two parents.
- We will formalize this notion later in the context of coalescent theory.

Download Presentation

Connecting to Server..