equilibria in populations n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Equilibria in populations PowerPoint Presentation
Download Presentation
Equilibria in populations

Loading in 2 Seconds...

play fullscreen
1 / 49

Equilibria in populations - PowerPoint PPT Presentation


  • 63 Views
  • Uploaded on

Equilibria in populations. Population data. Recall that we often study a population in the form of a SNP matrix Rows correspond to individuals (or individual chromosomes), columns correspond to SNPs The matrix is binary (why?)

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Equilibria in populations' - gloria


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
population data
Population data
  • Recall that we often study a population in the form of a SNP matrix
    • Rows correspond to individuals (or individual chromosomes), columns correspond to SNPs
    • The matrix is binary (why?)
    • The underlying genealogy is hidden. If the span is large, the genealogy is not a tree any more. Why?

0100

0011

0011

0011

1000

1000

1000

basic principles
Basic Principles
  • In a ‘stable’ population, the distribution of alleles obeys certain laws
    • Not really, and the deviations are interesting
  • HW Equilibrium
    • (due to mixing in a population)
  • Linkage (dis)-equilibrium
    • Due to recombination

Vineet Bafna

hardy weinberg equilibrium
Hardy Weinberg equilibrium
  • Assume a population of diploid individuals;
  • Consider a locus with 2 alleles, A, a
  • Define p(respectively, q): the frequency of A (resp. a) in the population
  • 3 Genotypes: AA, Aa, aa
  • Q: What is the frequency of each genotypes
  • If various assumptions are satisfied, (such as random mating, no natural selection), Then
  • PAA=p2
  • PAa=2pq
  • Paa=q2

Vineet Bafna

hardy weinberg why
Hardy Weinberg: why?
  • Assumptions:
    • Diploid
    • Sexual reproduction
    • Random mating
    • Bi-allelic sites
    • Large population size, …
  • Why? Each individual randomly picks his two parental chromosomes. Therefore, Prob. (Aa) = pq+qp = 2pq, and so on.

0100

0011

0011

0011

1000

1000

1000

0100

1000

Vineet Bafna

hardy weinberg generalizations
Hardy Weinberg: Generalizations
  • Multiple alleles with frequencies
    • By HW,
  • Multiple loci?

Vineet Bafna

hardy weinberg implications
Hardy Weinberg: Implications
  • The allele frequency does not change from generation to generation. True or false?
  • It is observed that 1 in 10,000 caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the mutation?
  • Males are 100 times more likely to have the ‘red’ type of color blindness than females. Why?
  • Individuals homozygous for S have the sickle-cell disease. In an experiment, the ratios A/A:A/S: S/S were 9365:2993:29. Is HWE violated? Is there a reason for this violation?
  • A group of individuals was chosen in NYC. Would you be surprised if HWE was violated?
  • Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting.

Vineet Bafna

a modern example of hw application
A modern example of HW application
  • SNP-chips can give us the allelic value at each polymorphic site based on hybridization.
  • Typically, the sites sampled have common SNPs
    • Minor allele frequency > 10%
  • We simply plot the 3 genotypes at each locus on 3 separate horizontal lines.
  • What is peculiar in the picture?
  • What is your conclusion?

Vineet Bafna

linkage equilibrium
Linkage Equilibrium
  • HW equilibrium is the equilibrium in allele frequencies at a single locus.
  • Linkage equilibrium refers to the equilibrium between allele occurrences at two loci.
    • LE is due to recombination events
    • First, we will consider the case where no recombination occurs.

HWE

what if there were no recombinations
What if there were no recombinations?
  • Life would be simpler
  • Each individual sequence would have a single parent (even for higher ploidy)
  • True for mtDNA (maternal) , and y-chromosome (paternal)
  • The genealogical relationship is expressed as a tree. This principle can be used to track ancestry of an individual

Vineet Bafna

phylogeny
Phylogeny
  • Recall that we often study a population in the form of a SNP matrix
    • Rows correspond to individuals (or individual chromosomes), columns correspond to SNPs
    • The matrix is binary (why?)
    • The underlying genealogy is hidden. If the span is large, the genealogy is not a tree any more. Why?

0100

0011

0011

0011

1000

1000

1000

reconstructing perfect phylogeny
Reconstructing Perfect Phylogeny
  • Input: SNP matrix M (n rows, m columns/sites/mutations)
  • Output: a tree with the following properties
    • Rows correspond to leaf nodes
    • We add mutations to edges
    • each edge labeled with i splits the individuals into two subsets.
      • Individuals with a 1 in column i
      • Individuals with a 0 in column i

i

1 in position i

0 in position i

Vineet Bafna

reconstructing perfect phylogeny1
Reconstructing perfect phylogeny

12345678

01000000

00110110

00110100

00110000

10000000

10001000

10001001

  • Each mutation can be labeled by the column number
  • Goal is to reconstruct the phylogeny
    • genographic atlas

2

7

6

3

4

5

1

8

an algorithm for constructing a perfect phylogeny
An algorithm for constructing a perfect phylogeny
  • We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later.
  • In any tree, each node (except the root) has a single parent.
    • It is sufficient to construct a parent for every node.
  • In each step, we add a column and refine some of the nodes containing multiple children.
  • Stop if all columns have been considered.

Vineet Bafna

columns
Columns
  • Define i0: taxa (individuals) with a 0 at the i’th column
  • Define i1: taxa (individuals) with a 1 at the i’th column

Vineet Bafna

inclusion property
Inclusion Property
  • For any pair of columns i,j, one of the folowing holds
    • i1 j1
    • j1 i1
    • i1 j1 = 
  • For any pair of columns i,j
    • i < j if and only if i1 j1
  • Note that if i<j then the edge containing i is an ancestor of the edge containingj

i

j

Vineet Bafna

perfect phylogeny via example

r

A

B

C

D

E

Perfect Phylogeny via Example

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

Initially, there is a single clade r, and each node has r as its parent

Vineet Bafna

sort columns
Sort columns
  • Sort columns according to the inclusion property (note that the columns are already sorted here).
  • This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

Vineet Bafna

add first column
Add first column

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

  • In adding column i
    • Check each individual and decide which side you belong.
    • Finally add a node if you can resolve a clade

r

u

B

D

A

C

E

Vineet Bafna

adding other columns
Adding other columns

1 2 3 4 5

A 1 1 0 0 0

B 0 0 1 0 0

C 1 1 0 1 0

D 0 0 1 0 1

E 1 0 0 0 0

  • Add other columns on edges using the ordering property

r

1

3

E

2

B

5

4

D

A

C

Vineet Bafna

unrooted case
Unrooted case
  • Important point is that the perfect phylogeny condition does not change when you interchange 1s and 0s at a column.
  • Alg (Unrooted)
    • Switch the values in each column, so that 0 is the majority element.
    • Apply the algorithm for the rooted case.
    • Relabel columns and individuals.
  • Show that this is a correct algorithm.

Vineet Bafna

unrooted perfect phylogeny
Unrooted perfect phylogeny
  • We transform matrix M to a 0-major matrix M0.
  • if M0 has a directed perfect phylogeny, M has a perfect phylogeny.
  • If M has a perfect phylogeny, does M0 have a directed perfect phylogeny?
unrooted case1
Unrooted case
  • Theorem: If M has a perfect phylogeny, there exists a relabeling, and a perfect phylogeny s.t.
    • Root is all 0s
    • For any SNP (column), #1s <= #0s
    • All edges are mutated 01

Vineet Bafna

proof
Proof
  • Consider the perfect phylogeny of M.
  • Find the center: none of the clades has greater than n/2 nodes.
    • Is this always possible?
  • Root at one of the 3 edges of the center, and direct all mutations from 01 away from the root. QED
  • If the theorem is correct, then simply relabeling all columns so that the majority element is 0 is sufficient.

Vineet Bafna

homework problems
‘Homework’ Problems
  • What if there is missing data? (An entry that can be 0 or 1)?
  • What if there are recurrent mutations?

Vineet Bafna

slide29
Quiz
  • Recall that a SNP data-set is a ‘binary’ matrix.
    • Rows are individual (chromosomes)
    • Columns are alleles at a specific locus
  • Suppose you have 2 SNP datasets of a contiguous genomic region but no other information
    • One from an African population, and one from a European Population.
    • Can you tell which is which?
    • How long does the genomic region have to be?

Vineet Bafna

linkage dis equilibrium ld
Linkage (Dis)-equilibrium (LD)
  • Consider sites A &B
  • Case 1: No recombination
  • Each new individual chromosome chooses a parent from the existing ‘haplotype’

A B

0 1

0 1

0 0

0 0

1 0

1 0

1 0

1 0

1 0

Vineet Bafna

linkage dis equilibrium ld1
Linkage (Dis)-equilibrium (LD)
  • Consider sites A &B
  • Case 2: diploidy and recombination
  • Each new individual chooses a parent from the existing alleles

A B

0 1

0 1

0 0

0 0

1 0

1 0

1 0

1 0

1 1

Vineet Bafna

linkage dis equilibrium ld2
Linkage (Dis)-equilibrium (LD)
  • Consider sites A &B
  • Case 1: No recombination
  • Each new individual chooses a parent from the existing ‘haplotype’
    • Pr[A,B=0,1] = 0.25
      • Linkage disequilibrium
  • Case 2: Extensive recombination
  • Each new individual simply chooses and allele from either site
    • Pr[A,B=(0,1)]=0.125
      • Linkage equilibrium

A B

0 1

0 1

0 0

0 0

1 0

1 0

1 0

1 0

Vineet Bafna

slide33
LD
  • In the absence of recombination,
    • Correlation between columns
    • The joint probability Pr[A=a,B=b] is different from P(a)P(b)
  • With extensive recombination
    • Pr(a,b)=P(a)P(b)

Vineet Bafna

measures of ld
Measures of LD
  • Consider two bi-allelic sites with alleles marked with 0 and 1
  • Define
    • P00 = Pr[Allele 0 in locus 1, and 0 in locus 2]
    • P0* = Pr[Allele 0 in locus 1]
  • Linkage equilibrium if P00 = P0* P*0
  • D = (P00 - P0* P*0) = -(P01 - P0* P*1) = …

Vineet Bafna

other measures of ld
Other measures of LD
  • D’ is obtained by dividing D by the largest possible value
    • Suppose D = (P00 - P0* P*0) >0.
    • Then the maximum value of Dmax= min{P0* P*1, P1* P*0}
    • If D<0, then maximum value is max{-P0* P*0, -P1* P*1}
    • D’ = D/ Dmax

0

1

-D

D

0

Site 1

-D

D

1

Site 2

Vineet Bafna

other measures of ld1
Other measures of LD
  • D’ is obtained by dividing D by the largest possible value
    • Ex: D’ = abs(P11- P1* P*1)/ Dmax
  •  = D/(P1* P0* P*1 P*0)1/2
  • Let N be the number of individuals
  • Show that 2N is the 2 statistic between the two sites

0

1

P00N

0

P0*N

Site 1

1

Site 2

Vineet Bafna

digression the 2 test
Digression: The χ2 test

0

1

  • The statistic behaves like a χ2 distribution (sum of squares of normal variables).
  • A p-value can be computed directly

O1

O2

0

O3

O4

1

observed and expected
Observed and expected

0

1

0

1

P0*P*1N

P00N

0

P0*P*0N

P01N

Site 1

1

P10N

P11N

P1*P*0N

P1*P*1N

  •  = D/(P1* P0* P*1 P*0)1/2
  • Verify that 2N is the 2 statistic between the two sites
ld over time and distance
LD over time and distance
  • The number of recombination events between two sites, can be assumed to be Poisson distributed.
  • Let r denote the recombination rate between two adjacent sites
  • r = # crossovers per bp per generation
  • The recombination rate between two sites l apart is rl
ld over time
LD over time
  • Decay in LD
    • Let D(t) = LD at time t between two sites
    • r’=lr
    • P(t)00 = (1-r’) P(t-1)00 + r’ P(t-1)0* P(t-1)*0
    • D(t) =P(t)00 - P(t)0* P(t)*0 = P(t)00 - P(t-1)0* P(t-1)*0 (Why?)
    • D(t) =(1-r’) D(t-1) =(1-r’)t D(0)

Vineet Bafna

ld over distance
LD over distance
  • Assumption
    • Recombination rate increases linearly with distance and time
    • LD decays exponentially.
  • The assumption is reasonable, but recombination rates vary from region to region, adding to complexity
  • This simple fact is the basis of disease association mapping.

Vineet Bafna

ld and disease mapping
LD and disease mapping
  • Consider a mutation that is causal for a disease.
  • The goal of disease gene mapping is to discover which gene (locus) carries the mutation.
  • Consider every polymorphism, and check:
    • There might be too many polymorphisms
    • Multiple mutations (even at a single locus) that lead to the same disease
  • Instead, consider a dense sample of polymorphisms that span the genome

Vineet Bafna

ld can be used to map disease genes
LD can be used to map disease genes

LD

  • LD decays with distance from the disease allele.
  • By plotting LD, one can short list the region containing the disease gene.

D

N

N

D

D

N

0

1

1

0

0

1

Vineet Bafna

slide44
269 individuals
    • 90 Yorubans
    • 90 Europeans (CEPH)
    • 44 Japanese
    • 45 Chinese
  • ~1M SNPs

Vineet Bafna

haplotype blocks
Haplotype blocks
  • It was found that recombination rates vary across the genome
    • How can the recombination rate be measured?
  • In regions with low recombination, you expect to see long haplotypes that are conserved. Why?
  • Typically, haplotype blocks do not span recombination hot-spots

Vineet Bafna

long haplotypes
Long haplotypes
  • Chr 2 region with high r2 value (implies little/no recombination)
  • History/Genealogy can be explained by a tree ( a perfect phylogeny)
  • Large haplotypes with high frequency

Vineet Bafna

19q13
19q13

Vineet Bafna

ld variation across populations
LD variation across populations

Reich et al.

Nature 411, 199-204(10 May 2001)

  • LD is maintained upto 60kb in swedish population, 6kb in Yoruban population

Vineet Bafna

understanding population data
Understanding Population Data
  • We described various population genetic concepts (HW, LD), and their applicability
  • The values of these parameters depend critically upon the population assumptions.
    • What if we do not have infinite populations
    • No random mating (Ex: geographic isolation)
    • Sudden growth
    • Bottlenecks
    • Ad-mixture
  • It would be nice to have a simulation of such a population to test various ideas. How would you do this simulation?

Vineet Bafna