1 / 46

CSE280b: Population Genetics

CSE280b: Population Genetics. Vineet Bafna/Pavel Pevzner. www.cse.ucsd.edu/classes/sp05/cse291. Simulating population data. Generate a coalescent (Topology + Branch lengths) For each branch length, drop mutations with rate  Generate sequence data

dobry
Download Presentation

CSE280b: Population Genetics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner www.cse.ucsd.edu/classes/sp05/cse291 Vineet Bafna

  2. Simulating population data • Generate a coalescent (Topology + Branch lengths) • For each branch length, drop mutations with rate  • Generate sequence data • Note that the resulting sequence is a perfect phylogeny. • Given such sequence data, can you reconstruct the coalescent tree? (Only the topology, not the branch lengths) • Also, note that all pairs of positions are correlated (should have high LD). Vineet Bafna

  3. Coalescent with Recombination • An individual may have one parent, or 2 parents Vineet Bafna

  4. ARG: Coalescent with recombination • Given: mutation rate , recombination rate , population size 2N (diploid), sample size n. • How can you generate the ARG (topology+branch lengths) efficiently? • How will you generate sequences for n individuals? • Given sequence data, can you reconstruct the ARG (topology) Vineet Bafna

  5. Recombination • Define r as the probability of recombining per generation. • Assume k individuals in a generation. The following might happen: • An individual arises because of a recombination event between two individuals (It will have 2 parents). • Two individuals coalesce. • Neither (Each individual has a distinct parent). • Multiple events (low probability). Vineet Bafna

  6. Recombination • We ignore the case of multiple (> 1) events in one generation • Pr (No recombination) = 1-kr • Pr (No coalescence) • Consider scaled time in units of 2N generations. Thus the number of individuals increase with rate kr2N, and decrease with rate • The value 2rN is usually small, and therefore, the process will ultimately coalesce to a single individual (MRCA) Vineet Bafna

  7. ARG What is the flaw in this procedure? • Let k = n, • Define • Iterate until k= 1 • Choose time from an exponential distribution with rate • Pick event as recombination with probability • If event is recombination, choose an individual to recombine, and a position, else choose a pair to coalesce. • Update k, and continue Vineet Bafna

  8. Ancestral Recombination Graph Vineet Bafna

  9. Simulating sequences on the ARG • Generate topology and branch lengths as before • For each recombination, generate a position. • Next generate mutations at random on branch lengths • For a mutation, select a position as well. • Generate Sequence data. • Program called ms (Hudson) is a commonly used coalescent simulator Vineet Bafna

  10. Coalescent theory applications • Coalescent simulations allow us to test various hypothesis. The coalescent/ARG is usually not inferred, unlike in phylogenies. Vineet Bafna

  11. Coalescent theory: example • Ex: ~1400bp at Sod locus in Dros. • 10 taxa • 5 were identical. The other 5 had 55 mutations. • Q: Is this a chance event, or is there selection for this haplotype. Vineet Bafna

  12. Coalescent application • 10000 coalescent simulations were performed on 10 taxa. • 55 mutations on the coalescent branches • Count the number of times 5 lineages are identical • The event happened in 1.1% of the cases. • Conclusion: selection, or some other mechanism explains this data. Vineet Bafna

  13. Coalescent example: Out of Africa hypothesis • Looking at lineage specific mutations might help discard the candelabra model. How? • How do we decide between the multi-regional and Out-of-Africa model? How do we decide if the ancestor was African? Vineet Bafna

  14. Human Samples • We look at data from human samples • Gabriel et al. Science 2002. • 3 populations were sampled at multiple regions spanning the genome • 54 regions (Average size 250Kb) • SNP density 1 over 2Kb • 90 Individuals from Nigeria (Yoruban) • 93 Europeans • 42 Asian • 50 African American Vineet Bafna

  15. Population specific recombination • D’ was used as the measure between SNP pairs. • SNP pairs were classified in one of the following • Strong LD • Strong evidence for recombination • Others (13% of cases) • This roughly favors out-of-africa. A Coalescent simulation can help give confidence values on this. Gabriel et al., Science 2002 Vineet Bafna

  16. Haplotype Blocks • A haplotype block is a region of low recombination. • Define a region as a block if less than 5% of the pairs show strong recombination • Much of the genome is in blocks. • Distribution of block sizes vary across populations. Vineet Bafna

  17. Testing Out-of-Africa • Generate simulations with and without migration. • Check size of haplotype blocks. • Does it vary when migrations are allowed? • When the ‘new’ population has a bottleneck? • If there was a bottleneck that created European and Asian populations, can we say anything about frequency of alleles that are ‘African specific’? • Should they be high frequency, or low frequency in African populations? Vineet Bafna

  18. Haplotype Block: implications • The genome is mostly partitioned into haplotype blocks. • Within a block, there is extensive LD. • Is this good, or bad, for association mapping? Vineet Bafna

  19. Coalescent reconstruction • Reconstructing likely coalescents Vineet Bafna

  20. Re-constructing history in the absence of recombination Vineet Bafna

  21. An algorithm for constructing a perfect phylogeny • We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later. • In any tree, each node (except the root) has a single parent. • It is sufficient to construct a parent for every node. • In each step, we add a column and refine some of the nodes containing multiple children. • Stop if all columns have been considered. Vineet Bafna

  22. Inclusion Property • For any pair of columns i,j • i < j if and only if i1 j1 • Note that if i<j then the edge containing i is an ancestor of the edge containing i i j Vineet Bafna

  23. Example r A B C D E 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 Initially, there is a single clade r, and each node has r as its parent Vineet Bafna

  24. Sort columns • Sort columns according to the inclusion property (note that the columns are already sorted here). • This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 Vineet Bafna

  25. Add first column • In adding column i • Check each edge and decide which side you belong. • Finally add a node if you can resolve a clade 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 r u B D A C E Vineet Bafna

  26. Adding other columns • Add other columns on edges using the ordering property 1 2 3 4 5 A 1 1 0 0 0 B 0 0 1 0 0 C 1 1 0 1 0 D 0 0 1 0 1 E 1 0 0 0 0 r 1 3 E 2 B 5 4 D A C Vineet Bafna

  27. Unrooted case • Important point is that the perfect phylogeny condition does not change when you interchange 1s and 0s at a column. • Switch the values in each column, so that 0 is the majority element. • Apply the algorithm for the rooted case. • Homework: show that this is a correct algorithm Vineet Bafna

  28. Population Sub-structure Vineet Bafna

  29. Population sub-structure can increase LD Pop. A 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 p1=0.1 q1=0.9 P11=0.1 D=0.01 Pop. B 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 p1=0.9 q1=0.1 P11=0.1 D=0.01 • Consider two populations that were isolated and evolving independently. • They might have different allele frequencies in some regions. • Pick two regions that are far apart (LD is very low, close to 0) Vineet Bafna

  30. Recent ad-mixing of population • If the populations came together recently (Ex: African and European population), artificial LD might be created. • D = 0.15 (instead of 0.01), increases 10-fold • This spurious LD might lead to false associations • Other genetic events can cause LD to arise, and one needs to be careful 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 Pop. A+B p1=0.5 q1=0.5 P11=0.1 D=0.1-0.25=0.15 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 Vineet Bafna

  31. Determining population sub-structure • Given a mix of people, can you sub-divide them into ethnic populations. • Turn the ‘problem’ of spurious LD into a clue. • Find markers that are too far apart to show LD • If they do show LD (correlation), that shows the existence of multiple populations. • Sub-divide them into populations so that LD disappears. Vineet Bafna

  32. Determining Population sub-structure 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 • Same example as before: • The two markers are too similar to show any LD, yet they do show LD. • However, if you split them so that all 0..1 are in one population and all 1..0 are in another, LD disappears Vineet Bafna

  33. Iterative algorithm for population sub-structure • Define • N = number of individuals (each has a single chromosome) • k = number of sub-populations. • Z  {1..k}N is a vector giving the sub-population. • Zi=k’ => individual i is assigned to population k’ • Xi,j = allelic value for individual i in position j • Pk,j,l = frequency of allele l at position j in population k Vineet Bafna

  34. Example • Ex: consider the following assignment • P1,1,0 = 0.9 • P2,1,0 = 0.1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 Vineet Bafna

  35. Goal • X is known. • P, Z are unknown. • The goal is to estimate Pr(P,Z|X) • Various learning techniques can be employed. • maxP,Z Pr(X|P,Z) (Max likelihood estimate) • maxP,Z Pr(X|P,Z) Pr(P,Z) (MAP) • Sample P,Z from Pr(P,Z|X) • Here a Bayesian (MCMC) scheme is employed to sample from Pr(P,Z|X). We will only consider a simplified version Vineet Bafna

  36. Algorithm:Structure • Iteratively estimate • (Z(0),P(0)), (Z(1),P(1)),.., (Z(m),P(m)) • After ‘convergence’, Z(m) is the answer. • Iteration • Guess Z(0) • For m = 1,2,.. • Sample P(m) from Pr(P | X, Z(m-1)) • Sample Z(m) from Pr(Z | X, P(m)) • How is this sampling done? Vineet Bafna

  37. Example • Choose Z at random, so each individual is assigned to be in one of 2 populations. See example. • Now, we need to sample P(1) from Pr(P | X, Z(0)) • Simply count • Nk,j,l = number of people in pouplation k which have allele l in position j • pk,j,l = Nk,j,l / N 1 2 2 1 1 2 1 2 1 2 1 2 2 1 1 2 1 2 2 1 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 Vineet Bafna

  38. Example • Nk,j,l = number of people in population k which have allele l in position j • pk,j,l = Nk,j,l / Nk,j,* • N1,1,0 = 4 • N1,1,1 = 6 • p1,1,0 = 4/10 • p1,2,0 = 4/10 • Thus, we can sample P(m) 1 2 2 1 1 2 1 2 1 2 1 2 2 1 1 2 1 2 2 1 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 Vineet Bafna

  39. Sampling Z • Pr[Z1 = 1] = Pr[”01” belongs to population 1]? • We know that each position should be in linkage equilibrium and independent. • Pr[”01” |Population 1] = p1,1,0 * p1,2,1 =(4/10)*(6/10)=(0.24) • Pr[”01” |Population 2] = p2,1,0 * p2,2,1 = (6/10)*(4/10)=0.24 • Pr [Z1 = 1] = 0.24/(0.24+0.24) = 0.5 Assuming, HWE, and LE Vineet Bafna

  40. Sampling • Suppose, during the iteration, there is a bias. • Then, in the next step of sampling Z, we will do the right thing • Pr[“01”| pop. 1] = p1,1,0 * p1,2,1 = 0.7*0.7 = 0.49 • Pr[“01”| pop. 2] = p2,1,0 * p2,2,1 =0.3*0.3 = 0.09 • Pr[Z1 = 1] = 0.49/(0.49+0.09) = 0.85 • Pr[Z6 = 1] = 0.49/(0.49+0.09) = 0.85 • Eventually all “01” will become 1 population, and all “10” will become a second population 1 1 1 2 1 2 1 2 1 1 2 2 2 1 2 2 1 2 2 1 0 .. 1 0 .. 1 0 .. 0 1 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 0 .. 1 1 .. 0 1 .. 0 0 .. 0 1 .. 1 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 1 .. 0 Vineet Bafna

  41. Allowing for admixture • Define qi,k as the fraction of individual i that originated from population k. • Iteration • Guess Z(0) • For m = 1,2,.. • Sample P(m),Q(m) from Pr(P,Q | X, Z(m-1)) • Sample Z(m) from Pr(Z | X, P(m),Q(m)) Vineet Bafna

  42. Estimating Z (admixture case) • Instead of estimating Pr(Z(i)=k|X,P,Q), (origin of individual i is k), we estimate Pr(Z(i,j,l)=k|X,P,Q) i,1 i,2 j Vineet Bafna

  43. Results on admixture prediction Vineet Bafna

  44. Results: Thrush data • For each individual, q(i) is plotted as the distance to the opposite side of the triangle. • The assignment is reliable, and there is evidence of admixture. Vineet Bafna

  45. Population Structure • 377 locations (loci) were sampled in 1000 people from 52 populations. • 6 genetic clusters were obtained, which corresponded to 5 geographic regions (Rosenberg et al. Science 2003) Oceania Eurasia East Asia America Africa Vineet Bafna

  46. Population sub-structure:research problem • Systematically explore the effect of admixture. Can admixture be predicted for a locus, or for an individual • The sampling approach may or may not be appropriate. Formulate as an optimization/learning problem: • (w/out admixture). Assign individuals to sub-populations so as to maximize linkage equilibrium, and hardy weinberg equilibrium in each of the sub-populations • (w/ admixture) Assign (individuals, loci) to sub-populations Vineet Bafna

More Related