Inferring human demographic history from DNA sequence data

Apr. 28, 2009. J. Wall, Institute for Human Genetics, UCSF.


Presentation Transcript


  1. Inferring human demographic history from DNA sequence data. Apr. 28, 2009. J. Wall, Institute for Human Genetics, UCSF

  2. Standard model of human evolution

  3. Standard model of human evolution (Origin and spread of genus Homo): 2 – 2.5 Mya

  4. Standard model of human evolution (Origin and spread of genus Homo): 1.6 – 1.8 Mya [figure annotated "? ?"]

  5. Standard model of human evolution (Origin and spread of genus Homo): 0.8 – 1.0 Mya

  6. Standard model of human evolution. Origin and spread of ‘modern’ humans: 150 – 200 Kya

  7. Standard model of human evolution. Origin and spread of ‘modern’ humans: ~ 100 Kya

  8. Standard model of human evolution. Origin and spread of ‘modern’ humans: 40 – 60 Kya

  9. Standard model of human evolution. Origin and spread of ‘modern’ humans: 15 – 30 Kya

  10. Estimating demographic parameters
      • How can we turn this qualitative scenario into an explicit quantitative model?
      • How can we choose a model that is both biologically feasible and computationally tractable?
      • How do we estimate parameters and quantify the uncertainty in those estimates?

  11. Estimating demographic parameters
      • Calculating full likelihoods (under realistic models that include recombination) is computationally infeasible.
      • So compromises need to be made if one is interested in parameter estimation.

  12. African populations: 10 populations, 229 individuals

  13. African populations: Mandenka (Bantu), Biaka (pygmies), San (bushmen). 61 autosomal loci, ~ 350 Kb of sequence data

  14. A simple model of African population history [figure labels: T, g1, m, g2; populations: Biaka (or San), Mandenka]

  15. Estimation method
      We use a composite-likelihood method (cf. Plagnol and Wall 2006) that uses information from the joint frequency spectrum, such as:
      • Numbers of segregating sites
      • Numbers of shared and fixed differences
      • Tajima’s D
      • F_ST
      • Fu and Li’s D*
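As an illustration of one statistic in this list, here is a minimal sketch of Tajima's D using the standard textbook formula (Tajima 1989). The function name `tajimas_d` and the 0/1 haplotype-matrix input format are assumptions made for this example; the slides do not show how the statistics were actually computed.

```python
import numpy as np

def tajimas_d(haps):
    """Tajima's D for a 0/1 haplotype matrix of shape (n_haplotypes, n_sites)."""
    haps = np.asarray(haps)
    n = haps.shape[0]
    counts = haps.sum(axis=0)              # copies of allele "1" at each site
    seg = (counts > 0) & (counts < n)      # keep segregating sites only
    S = int(seg.sum())
    if S == 0:
        return float("nan")                # D is undefined without variation
    k = counts[seg]
    # Mean number of pairwise differences (pi), from per-site allele counts
    pi = float(np.sum(k * (n - k))) / (n * (n - 1) / 2)
    i = np.arange(1, n)
    a1 = np.sum(1.0 / i)
    a2 = np.sum(1.0 / i**2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its standard deviation
    return float((pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1)))

# Toy inputs (not data from the talk)
singletons = np.eye(4, dtype=int)              # every variant seen on one haplotype
balanced = [[0, 0], [0, 0], [1, 1], [1, 1]]    # every variant at 50% frequency
d_singletons = tajimas_d(singletons)
d_balanced = tajimas_d(balanced)
```

An excess of rare variants pulls D below zero, while an excess of intermediate-frequency variants pushes it above zero, which is why the statistic is informative about population growth and structure.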

  17. Estimating likelihoods [figure labels: Pop1, Pop2]

  18. Estimating likelihoods [figure labels: Pop1, Pop2; Pop 1 private polymorphisms]

  19. Estimating likelihoods [figure labels: Pop1, Pop2; Pop 1 private polymorphisms, Pop 2 private polymorphisms]

  20. Estimating likelihoods [figure labels: Pop1, Pop2; Pop 1 private polymorphisms, Pop 2 private polymorphisms, Shared polymorphisms]
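The site classes named on these slides (private to Pop 1, private to Pop 2, shared) can be counted directly from two haplotype matrices, together with the fixed differences mentioned under the estimation method. The sketch below assumes 0/1-coded alleles and a hypothetical helper name `classify_sites`; it only illustrates the bookkeeping, not the likelihood calculation itself.

```python
import numpy as np

def classify_sites(pop1, pop2):
    """Count private, shared, and fixed-difference sites.

    pop1, pop2: 0/1 allele matrices of shape (haplotypes, sites),
    covering the same sites (columns) in both populations.
    """
    pop1 = np.asarray(pop1)
    pop2 = np.asarray(pop2)
    f1 = pop1.mean(axis=0)                 # allele frequency in population 1
    f2 = pop2.mean(axis=0)                 # allele frequency in population 2
    seg1 = (f1 > 0) & (f1 < 1)             # polymorphic in population 1
    seg2 = (f2 > 0) & (f2 < 1)             # polymorphic in population 2
    fixed = ((f1 == 0) & (f2 == 1)) | ((f1 == 1) & (f2 == 0))
    return {
        "private_1": int(np.sum(seg1 & ~seg2)),
        "private_2": int(np.sum(~seg1 & seg2)),
        "shared": int(np.sum(seg1 & seg2)),
        "fixed": int(np.sum(fixed)),
    }

# Toy data (not from the talk): 3 haplotypes per population, 5 sites
pop1 = [[0, 1, 0, 1, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 1, 0]]
pop2 = [[0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0]]
counts = classify_sites(pop1, pop2)
# counts == {"private_1": 1, "private_2": 1, "shared": 1, "fixed": 1}
```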

  21. Estimation method
      We use a composite-likelihood method (cf. Plagnol and Wall 2006) that uses information from the joint frequency spectrum, such as:
      • Numbers of segregating sites
      • Numbers of shared and fixed differences
      • Tajima’s D
      • F_ST
      • Fu and Li’s D*

  22. Estimating likelihoods
      We assume these other statistics are multivariate normal. Then we run simulations to estimate the means and the covariance matrix. This accounts (in a crude way) for dependencies across different summary statistics.
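A minimal sketch of this step, assuming the simulated summary statistics are available as a matrix with one row per simulation. The function name `mvn_loglik_from_sims` is hypothetical, and the random draws below are placeholders standing in for coalescent simulations, which are not reproduced here.

```python
import numpy as np

def mvn_loglik_from_sims(observed, simulated):
    """Multivariate-normal log-likelihood of the observed statistics.

    observed:  vector of k summary statistics from the data
    simulated: array of shape (n_sims, k), the same statistics computed
               on simulations under one set of parameter values
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    mean = simulated.mean(axis=0)
    cov = np.cov(simulated, rowvar=False)      # captures dependence between statistics
    diff = observed - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    k = observed.size
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

# Placeholder "simulations": random draws, NOT output of a demographic model
rng = np.random.default_rng(0)
simulated = rng.normal(loc=[10.0, -0.5, 0.1], scale=[2.0, 0.3, 0.05], size=(2000, 3))
center = simulated.mean(axis=0)
ll_near = mvn_loglik_from_sims(center, simulated)
ll_far = mvn_loglik_from_sims(center + np.array([6.0, 0.9, 0.15]), simulated)
```

Observed statistics close to the simulated mean score a higher log-likelihood than distant ones, and the covariance term is what accounts for the correlation between statistics.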

  23. Composite likelihood
      We form a composite likelihood by assuming that these two classes of summary statistics are independent of each other. We estimate the composite likelihood over a grid of values of g1, g2, T and M, and tabulate the MLE. We also use standard asymptotic assumptions to estimate confidence intervals.
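The grid search can be sketched as follows. Everything here is illustrative: the two component functions are hypothetical toy surfaces with an arbitrary peak, only two parameters are used instead of four, and the interval uses the usual likelihood-ratio cutoff (log-likelihood within 1.92 of the maximum, i.e. half the 95% chi-square quantile with one degree of freedom), which is one common reading of "standard asymptotic assumptions" and is an assumption of this sketch.

```python
from itertools import product

def loglik_frequency_spectrum(T, M):
    """Hypothetical stand-in for the frequency-spectrum component."""
    return -0.5 * ((T - 400) / 100) ** 2

def loglik_other_statistics(T, M):
    """Hypothetical stand-in for the multivariate-normal component."""
    return -0.5 * ((M - 5) / 2) ** 2

def composite_loglik(T, M):
    # Independence assumption: the two component log-likelihoods simply add
    return loglik_frequency_spectrum(T, M) + loglik_other_statistics(T, M)

T_grid = range(0, 1001, 50)
M_grid = range(0, 21)

# Evaluate the composite log-likelihood at every grid point
surface = {(T, M): composite_loglik(T, M) for T, M in product(T_grid, M_grid)}
mle = max(surface, key=surface.get)
max_ll = surface[mle]

# Profile over M, then keep T values within 1.92 log-likelihood units of the maximum
profile_T = {T: max(surface[(T, M)] for M in M_grid) for T in T_grid}
inside = [T for T, ll in profile_T.items() if max_ll - ll <= 1.92]
ci_T = (min(inside), max(inside))
# For this toy surface: mle == (400, 5) and ci_T == (250, 550)
```

Because the interval is read off the grid, its resolution is limited by the grid spacing.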

  24. Estimates (with 95% CIs)
      Parameter     Mandenka-Biaka      Mandenka-San
      g1 (000s)     0 (0 – 3.8)         0 (0 – 3.8)
      g2 (000s)     4 (0 – 7.9)         2 (0 – 11)
      T (000s)      450 (300 – 640)     100 (77 – 550)
      M (= 4Nm)     10 (8.4 – 12)       3 (2.2 – 4)

  25. Fit of the null model
      How well does the demographic null model fit the patterns of genetic variation found in the actual data?

  26. Fit of the null model
      How well does the demographic null model fit the patterns of genetic variation found in the actual data? Quite well. The model accurately reproduces both the summary statistics used in the original fitting (e.g., Tajima’s D in each population) and other aspects of the data (e.g., estimates of ρ = 4Nr).

  27. Estimates (with 95% CIs)
      Parameter     Mandenka-Biaka      Mandenka-San
      g1 (000s)     0 (0 – 3.8)         0 (0 – 3.8)
      g2 (000s)     4 (0 – 7.9)         2 (0 – 11)
      T (000s)      450 (300 – 640)     100 (77 – 550)
      M (= 4Nm)     10 (8.4 – 12)       3 (2.2 – 4)

  28. Population growth [figure axes: population size, time]

  29. Population growth [figure axes: population size, time]. Spread of agriculture and animal husbandry?

  30. Estimates (with 95% CIs)
      Parameter     Mandenka-Biaka      Mandenka-San
      g1 (000s)     0 (0 – 3.8)         0 (0 – 3.8)
      g2 (000s)     4 (0 – 7.9)         2 (0 – 11)
      T (000s)      450 (300 – 640)     100 (77 – 550)
      M (= 4Nm)     10 (8.4 – 12)       3 (2.2 – 4)

  31. Ancestral structure in Africa
      At face value, these results suggest that population structure within Africa is old and predates the migration of modern humans out of Africa. Is there any evidence for additional (unknown) ancient population structure within Africa?

  32. Model of ancestral structure [figure labels: Archaic human population; T, g1, m, g2; populations: Biaka (or San), Mandenka]

  33. Standard model of human evolution. Origin and spread of ‘modern’ humans: ~ 100 Kya

  34. Admixture mapping [figure labels: Modern human DNA, Neandertal DNA]

  38. Admixture mapping [figure labels: Modern human DNA, Neandertal DNA]. Orange chunks are ~10 – 100 Kb in length.

  39. Genealogy with archaic ancestry [figure labels: time, present, Modern humans, Archaic humans]

  40. Genealogy without archaic ancestry [figure labels: time, present, Modern humans, Archaic humans]

  41. Our main questions
      • What pattern does archaic ancestry produce in DNA sequence polymorphism data (from extant humans)?
      • How can we use data to
          • estimate the contribution of archaic humans to the modern gene pool (c)?
          • test whether c > 0?

  42. Genealogy with archaic ancestry (mutations added) [figure labels: time, present, Modern humans, Archaic humans]

  44. Patterns in DNA sequence data
      Sequence 1   A T C C A C A G C T G
      Sequence 2   A G C C A C G G C T G
      Sequence 3   T G C G G T A A C C T
      Sequence 4   A G C C A C A G C T G
      Sequence 5   T G T G G T A A C C T
      Sequence 6   A G C C A T A G A T G
      Sequence 7   A G C C A T A G A T G

  46. Patterns in DNA sequence data
      Sequence 1   A T C C A C A G C T G
      Sequence 2   A G C C A C G G C T G
      Sequence 3   T G C G G T A A C C T
      Sequence 4   A G C C A C A G C T G
      Sequence 5   T G T G G T A A C C T
      Sequence 6   A G C C A T A G A T G
      Sequence 7   A G C C A T A G A T G
      We call the sites in red congruent sites: these are sites inferred to be on the same branch of an unrooted tree.
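The red highlighting is lost in this transcript. One way to recover candidate congruent sites is to read "on the same branch of an unrooted tree" as "the two alleles split the sample into the same two groups". The sketch below applies that reading to the seven sequences above; the function names are hypothetical and the interpretation is an assumption, not the speaker's stated procedure.

```python
from collections import defaultdict

seqs = [
    "ATCCACAGCTG",  # Sequence 1
    "AGCCACGGCTG",  # Sequence 2
    "TGCGGTAACCT",  # Sequence 3
    "AGCCACAGCTG",  # Sequence 4
    "TGTGGTAACCT",  # Sequence 5
    "AGCCATAGATG",  # Sequence 6
    "AGCCATAGATG",  # Sequence 7
]

def bipartition(column):
    """Split of the sample induced by a biallelic site, or None otherwise."""
    groups = defaultdict(set)
    for idx, allele in enumerate(column):
        groups[allele].add(idx)
    if len(groups) != 2:
        return None
    return frozenset(frozenset(g) for g in groups.values())

def congruent_groups(seqs):
    """Groups of 1-based site positions that induce the same split."""
    by_split = defaultdict(list)
    for site, column in enumerate(zip(*seqs), start=1):
        split = bipartition(column)
        if split is not None:
            by_split[split].append(site)
    return [sites for sites in by_split.values() if len(sites) > 1]

groups = congruent_groups(seqs)
# groups == [[1, 4, 5, 8, 10, 11]]
```

Under this reading, sites 1, 4, 5, 8, 10 and 11 all separate sequences 3 and 5 from the rest, so they would map to a single branch.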

  47. Linkage disequilibrium (LD)
      LD is the nonrandom association of alleles at different sites.
      [figure: example haplotypes contrasting low LD (high recombination) with high LD (low recombination)]
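To make the definition concrete, here is a minimal sketch of r², a standard pairwise measure of LD, computed for two biallelic sites. The function name `r_squared` is hypothetical; the example columns are sites 1, 4 and 9 of the seven-sequence alignment shown earlier in the talk, read top to bottom.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, given as equal-length allele strings."""
    n = len(site_a)
    a0, b0 = site_a[0], site_b[0]          # take the first sample's alleles as reference
    p_a = sum(x == a0 for x in site_a) / n
    p_b = sum(y == b0 for y in site_b) / n
    p_ab = sum(x == a0 and y == b0 for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b                   # D: departure from independent association
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

site1 = "AATATAA"   # alleles of sequences 1..7 at site 1
site4 = "CCGCGCC"   # site 4
site9 = "CCCCCAA"   # site 9

r2_congruent = r_squared(site1, site4)   # same split of the sample, so r^2 is 1
r2_other = r_squared(site1, site9)       # different splits, r^2 is 0.16
```

Both sites must be polymorphic in the sample, otherwise the denominator is zero.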

  48. Measuring ‘congruence’
      To measure the level of ‘congruence’ in SNP data from larger regions, we define a score function S* = [formula not preserved in transcript], where S(i_1, …, i_k) = [formula not preserved in transcript] and S(i_j, i_{j+1}) is a function of both congruence (or near congruence) and the physical distance between i_j and i_{j+1}.
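The formulas themselves did not survive in this transcript. The sketch below assumes the structure used in Plagnol and Wall (2006): S of an ordered subset of SNPs is the sum of pairwise scores between adjacent members, and S* is the maximum of S over all subsets. Under that assumption S* can be found by dynamic programming. The pairwise score `toy_pair_score`, the SNP positions, and all numeric constants are hypothetical placeholders, not the published scoring scheme.

```python
def split_of(column):
    """Unordered split of the sample induced by a biallelic site."""
    first = frozenset(i for i, a in enumerate(column) if a == column[0])
    rest = frozenset(range(len(column))) - first
    return frozenset((first, rest))

def s_star(n_sites, pair_score):
    """Maximum chain score over ordered subsets of sites (assumed form of S*)."""
    # best_end[j]: best score of any subset whose last member is site j;
    # a single site scores 0 because it has no adjacent pair.
    best_end = [0] * n_sites
    for j in range(n_sites):
        for i in range(j):
            best_end[j] = max(best_end[j], best_end[i] + pair_score(i, j))
    return max(best_end, default=0)

# Hypothetical inputs: four sites (columns of the alignment) and made-up positions in bp
columns = ["AATATAA", "TGGGGGG", "CCGCGCC", "AAGAGAA"]
positions = [100, 250, 700, 1200]

def toy_pair_score(i, j):
    """Hypothetical score: reward congruent pairs, more so when far apart."""
    if split_of(columns[i]) == split_of(columns[j]):
        return 1000 + (positions[j] - positions[i])
    return -5000

score = s_star(len(columns), toy_pair_score)
# score == 3100: the chain of sites 0 -> 2 -> 3 (1600 + 1500) beats 0 -> 3 alone (2100)
```

The dynamic program runs in time quadratic in the number of SNPs, which avoids enumerating every subset.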

  49. An example

  50. An example (CHRNA4)
