
Genealogies of time structured data, an application on cave bear ancient DNA

Frantz Depaulis (UMR 7625, Laboratoire d’écologie, Paris 6/ENS); Ludovic Orlando and Catherine Hänni (UMR 5534, Centre de Génétique Moléculaire et Cellulaire, Université Claude Bernard, Lyon I).


Presentation Transcript


  1. Genealogies of time structured data, an application on cave bear ancient DNA. Frantz Depaulis (UMR 7625, Laboratoire d’écologie, Paris 6/ENS); Ludovic Orlando and Catherine Hänni (UMR 5534, Centre de Génétique Moléculaire et Cellulaire, Université Claude Bernard, Lyon I)

  2. Outline of the presentation • Introduction: gene genealogies • Results: 1. Simulation exploratory results; 2. Cave bear application • Conclusions

  3. -Coalescence- Wright-Fisher neutral model: assumptions • Selective neutrality (Ne s << 1) • Demography: isolated panmictic population; constant size N; Poisson distribution of offspring, P(1); same sampling time • Mutational, sequence data: infinite site model (ISM): no recombination; independent mutations; constant mutation rate µ along the sequence and across time; each mutation affects a new nucleotide site

  4. -Coalescence- Genealogy of a gene sample. [Figure: genealogy of a gene sample, with labels: most recent common ancestor (MRCA); coalescence = common ancestor; ancestral lineage; gene sample]

  5. -Coalescence- Coalescent. [Figure: coalescent tree of a sample of “genes” / of individuals (a to f), with neutral nucleotide mutations placed on the branches; labels: most recent common ancestor of the sample (MRCA); common ancestor (CA)]

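  6. -Coalescence- Constructing coalescents. 1°) Ages of the nodes: the waiting times t1, ..., t5 between successive coalescences are drawn from Exp(p), with $p = \frac{n(n-1)/2}{2N}$ while n lineages remain (so p = 1/2N for the last interval, t5). Additional assumption: n << N. [Figure: tree of the sample a to f with intervals t1 to t5]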
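As an illustration of step 1°, the node ages can be drawn interval by interval. This is a minimal sketch, not the authors' program: the function name `coalescent_node_ages` and the values n = 6, N = 10 000 are illustrative assumptions; only the rate p = (k(k-1)/2)/2N comes from the slide.

```python
import random

def coalescent_node_ages(n, N, rng):
    """Ages (in generations) of the n-1 nodes of a standard coalescent tree.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with parameter p = (k(k-1)/2) / (2N), as on the slide.
    """
    ages = []
    t = 0.0
    for k in range(n, 1, -1):
        p = (k * (k - 1) / 2) / (2 * N)  # coalescence probability per generation
        t += rng.expovariate(p)          # waiting time with mean 1/p
        ages.append(t)
    return ages

rng = random.Random(1)                   # fixed seed, for reproducibility only
ages = coalescent_node_ages(n=6, N=10_000, rng=rng)
print(ages)                              # five increasing ages; the last is the MRCA
```

Summing the mean waiting times 1/p over k = n, ..., 2 gives an expected MRCA age of 4N(1 - 1/n) generations, which is a convenient sanity check for such a simulator.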

  7. -Coalescence- Constructing-deconstructing coalescents. 2°) Topology of the tree 3°) Uniform distribution of mutations. Repeated 100 000 times, this gives the neutral distribution of sequence polymorphism. [Figure: tree of the gene sample a to f (intervals t1 to t5) with neutral mutations on the branches; MRCA and common ancestor (CA) labelled]
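The three construction steps (node ages, topology, mutations) can be put together as follows. This is a hedged sketch rather than the program used for the talk: `simulate_sample` is a hypothetical name, time is measured in units of 2N generations, and conditioning on a fixed number S of mutations (instead of drawing it from a mutation rate) is an assumption made here for simplicity.

```python
import random

def simulate_sample(n, S, rng):
    """One neutral sample: n haplotypes (tuples of 0/1) carrying S mutations."""
    # Each active lineage is (set of descendant leaves, age of its lower node).
    active = [(frozenset([i]), 0.0) for i in range(n)]
    branches = []                                  # (descendant leaves, length)
    t = 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)      # 1) age of the next node
        i, j = rng.sample(range(k), 2)             # 2) topology: a random pair
        a, b = active[i], active[j]
        branches.append((a[0], t - a[1]))
        branches.append((b[0], t - b[1]))
        active = [x for m, x in enumerate(active) if m not in (i, j)]
        active.append((a[0] | b[0], t))
    # 3) mutations fall uniformly along the tree, so a branch is hit with
    #    probability proportional to its length; each hit is a new site (ISM).
    weights = [length for _, length in branches]
    hit = rng.choices(branches, weights=weights, k=S)
    return [tuple(int(leaf in leaves) for leaves, _ in hit) for leaf in range(n)]

sample = simulate_sample(n=6, S=8, rng=random.Random(2))
for haplotype in sample:
    print("".join(map(str, haplotype)))
```

Calling the function many times and computing a statistic on each replicate yields the neutral distribution against which an observed value is compared.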

  8. -Coalescence- Haplotype tests*: simulations. Parameters‡: S = 8, n = 6; 10 000 simulations. Haplotype number K; haplotype diversity $H = 1 - \sum_i f_i^2$, where $f_i$ is the frequency of haplotype i. [Figure: example simulated samples with haplotype numbers K = 4, 5, 6 and haplotype diversities H = 0.72, 0.78, 0.83; density of simulated H (axis 0.1 to 0.9), with observed H: P = 0.03] * Depaulis and Veuille MBE 1998 ‡ Hudson 1993
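The two haplotype statistics are straightforward to compute from a list of haplotypes. The sketch below uses three made-up samples of n = 6 (the labels "a" to "f" are placeholders, not data from the talk) chosen so that they give the K and H values shown on the slide.

```python
from collections import Counter

def haplotype_stats(haplotypes):
    """Return (K, H): number of distinct haplotypes and H = 1 - sum(f_i^2)."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    K = len(counts)
    H = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return K, H

examples = {
    "all distinct": ["a", "b", "c", "d", "e", "f"],
    "one pair":     ["a", "a", "b", "c", "d", "e"],
    "two pairs":    ["a", "a", "b", "b", "c", "d"],
}
results = {name: haplotype_stats(h) for name, h in examples.items()}
for name, (K, H) in results.items():
    print(f"{name}: K = {K}, H = {H:.2f}")
```

The test itself compares the observed K or H with the distribution of the same statistic over simulated samples with the same n and S.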

  9. -Coalescence- Alignment of polymorphic sites: frequencies of mutations (S = 15, n = 7)

     GCCCGCGAATCCATT
     GCGTGCGATCCGATT
     GCGTACAATCCCGTC
     GTGTACAATCTCGAC
     GTGTACAATCTCGAC
     GCGTGGAATCCCGTT
     CCGCGCGGTCCCATT
     GCGCGCGAACCCATT   outgroup
     121531416121423   frequencies
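The frequency row of this alignment can be recomputed by counting, at each site, the sequences that differ from the outgroup. The sketch assumes, as the slide does, that the outgroup carries the ancestral state at every site; the function name `derived_counts` is illustrative.

```python
from collections import Counter

sample = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]
outgroup = "GCGCGCGAACCCATT"

def derived_counts(sample, outgroup):
    """Number of sampled sequences carrying a non-outgroup state at each site."""
    return [sum(seq[i] != anc for seq in sample) for i, anc in enumerate(outgroup)]

counts = derived_counts(sample, outgroup)
print("".join(map(str, counts)))        # 121531416121423, as on the slide

# Unfolded frequency spectrum: number of sites at each derived frequency.
spectrum = Counter(counts)
print(sorted(spectrum.items()))
```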

  10. -Coalescence- Frequency spectrum of mutations & neutrality tests. Number of polymorphic sites; $\theta = 4 N_e \mu$; $f_i$: number of occurrences of a mutation in the sample; $H = \theta_\pi - \theta_H$, expected to be 0 under neutrality. (Tajima Genetics 1989) (Fu and Li Genetics 1993) (Fay and Wu Genetics 2000)
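The frequency-spectrum tests compare estimators of θ that weight mutation frequencies differently. The sketch below computes Watterson's estimator, θ_π and θ_H from an unfolded spectrum, using the standard definitions from the cited papers; the function name and the dictionary layout are assumptions. The example spectrum is the one obtained from the seven-sequence alignment of the previous slide.

```python
def theta_estimators(sfs, n):
    """Return (theta_W, theta_pi, theta_H) for a sample of size n.

    sfs[i] is the number of sites whose derived mutation occurs i times
    in the sample (1 <= i <= n-1).
    """
    a1 = sum(1 / i for i in range(1, n))
    pairs = n * (n - 1) / 2
    S = sum(sfs.values())
    theta_w = S / a1                                             # based on S
    theta_pi = sum(i * (n - i) * x for i, x in sfs.items()) / pairs
    theta_h = sum(i * i * x for i, x in sfs.items()) / pairs     # weights high frequencies
    return theta_w, theta_pi, theta_h

# Spectrum of the n = 7, S = 15 alignment (derived counts 121531416121423).
sfs = {1: 6, 2: 3, 3: 2, 4: 2, 5: 1, 6: 1}
theta_w, theta_pi, theta_h = theta_estimators(sfs, n=7)
fay_wu_h = theta_pi - theta_h
print(theta_w, theta_pi, theta_h, fay_wu_h)

# Under the neutral expectation E[xi_i] = theta / i, all three estimators agree.
neutral = {i: 1.0 / i for i in range(1, 7)}
print(theta_estimators(neutral, n=7))
```

Tajima's D is built the same way, as the normalised difference θ_π - θ_W, so an excess of rare variants pulls it below zero.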

  11. Mitochondria: correlation between LD and distance. Recombination or mutational effects? $r^2$ decreases with distance d; Pearson’s statistic tested by permutations of sites. Awadalla et al. (Science 1999)
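A permutation test of this kind can be sketched as follows. The details are assumptions made for illustration: sites are coded 0/1, every site is polymorphic, the test is one-sided (a decline of r² with distance), and the toy data and function names are invented here rather than taken from Awadalla et al.

```python
import random
from itertools import combinations

def r_squared(x, y):
    """Squared allelic correlation between two polymorphic sites coded 0/1."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u)
    sv = sum((b - mv) ** 2 for b in v)
    return cov / (su * sv) ** 0.5

def ld_distance_test(sites, positions, n_perm, rng):
    """Correlation between pairwise r^2 and distance, with a permutation P value."""
    pairs = list(combinations(range(len(sites)), 2))
    r2 = [r_squared(sites[i], sites[j]) for i, j in pairs]

    def stat(pos):
        return pearson(r2, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = stat(positions)
    perm = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)                       # reassign positions to sites
        if stat(perm) <= observed + 1e-12:      # as negative as observed, or more
            hits += 1
    return observed, hits / n_perm

# Toy data: two tightly linked pairs of sites, 50 nt apart.
sites = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
positions = [10, 12, 60, 62]
observed, p_value = ld_distance_test(sites, positions, 1000, random.Random(3))
print(observed, p_value)
```

With only four sites there are few distinct permutations, so the P value of the toy example cannot be small; real alignments with more sites give the test its power.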

  12. -Coalescence- Time structured data & genealogies • Parasites during disease evolution (virus…) • Microbial experimental evolution • Ancient DNA. Issue: • To what extent are the analyses affected by time structure? • How to correct for this?

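  13. -Simulations- Algorithm for time structured coalescent. [Figure: genealogy of a sample a to f made of two subsets of 3 sequences (n1 = 3) separated by a time t1, annotated with the number of lineages n (2 to 5) in each interval] The exponential law is memoryless!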
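One way to exploit the memoryless property is sketched below: if the next coalescence drawn for the recent subset would fall beyond t1, the draw is discarded, the older lineages are added at t1, and a new waiting time is drawn with the updated number of lineages. This is an assumed reading of the slide, not the authors' code; the function name, the two-subset restriction and the time unit (2N generations) are choices made for the illustration.

```python
import random

def serial_coalescent_ages(n0, n1, t1, rng):
    """Node ages for n0 lineages sampled at time 0 and n1 sampled at age t1.

    Time is in units of 2N generations, so k lineages coalesce at rate k(k-1)/2.
    """
    ages = []
    t, k = 0.0, n0
    older_added = False
    while k > 1 or not older_added:
        wait = rng.expovariate(k * (k - 1) / 2) if k > 1 else float("inf")
        if not older_added and t + wait >= t1:
            # No coalescence before t1: jump to t1, add the older subset and
            # redraw. Discarding the unused waiting time is valid because the
            # exponential law is memoryless.
            t, k, older_added = t1, k + n1, True
            continue
        t += wait
        ages.append(t)
        k -= 1
    return ages

rng = random.Random(4)
two_subsets = serial_coalescent_ages(n0=3, n1=3, t1=0.2, rng=rng)
single_recent = serial_coalescent_ages(n0=1, n1=3, t1=0.5, rng=rng)
print(two_subsets)
print(single_recent)   # every node is older than t1 = 0.5
```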

  14. -Simulations- Age structure effect on gene genealogies (n1 = 4, subset age t1). [Figure: three genealogies] • Contemporaneous sample • Limited time structure: excess of rare variants, deficit of LD • Two subsets with large time spacing: deficit of rare variants, excess of LD, differentiation

  15. -Simulations- Effect of subset size on statistical tests: mean (t1 = 0.2 Ne generations; n = 40, S = 20). [Figure: mean of each statistic (Dt, D*fl, Hfw, ZnS, K, H, Pearson, Fst, pi/pi0, S/S0; y-axis from -0.6 to 1.2) against the subset proportion n1/n (0 to 1)] Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1 corresponding to θ); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.

  16. -Simulations- Effect of subset size on statistical tests: significance rate (t1 = 0.2 Ne generations; n = 40, S = 20). [Figure: significance rate (0 to 0.15) of Dt_inf, D*fl_inf, Hfw_inf, ZnS_inf, K_sup, H_sup and Fst against the subset proportion n1/n (0 to 1)] Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.

  17. -Simulations- Effect of a half subset age on statistical tests: mean (n1 = n/2). [Figure: mean of each statistic (Dt, D*fl, Hfw, K, H, ZnS, Pearson, Fst, Pi/Theta0, S/S0; y-axis from -1 to 3) against the subset age t1 in 2Ne generations (log scale, 0.001 to 10)] Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1 corresponding to θ); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.

  18. -Simulations- Effect of a half subset age on statistical tests: significance rates (n1 = n/2). [Figure: significance rate (0 to 0.35) of Dt_inf, Dt_sup, D*fl_inf, D*fl_sup, ZnS_inf, K_sup, H_sup and Fst against the subset age t1 in 2Ne generations (log scale, 0.001 to 10)] Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.

  19. -Application- Cave bear: Ursus spelaeus (12-300 kYA)

  20. - Application- Sampling sites

  21. -Application- Alignment of polymorphic sites: D-loop of cave bear (n = 41, S = 22, Ne = 13 000)

      REF                    TTGTCAACTT TCGAATTGAA GT
      #NOASC3500_40-45       ..A....T.C ..A....... ..
      #NOASC3800_40-45       ..A....T.C ..A....... ..
      #NOASC85F16_40-45      .......... .......... ..
      #NOASC95456_40-45      ..A....T.C ..A....... ..
      #NOASC92386_40-45      ..A....T.C ..A....... ..
      #NOASC92413_40-45      C.A....T.C ..A....... ..
      #NOASC92152_40-45      C.A....T.C ..A....... A.
      #NOASC5300_50-60       ..A....T.C ..A....... ..
      #NOASC11600_80         .......... .......... ..
      #NOASC12500_80         .......... .......... ..
      #NOASC13800_80         .......... .......... ..
      #NOASC100801_80        .......... .......... ..
      #NOASC12400_80         ..A....T.C ..A....... ..
      #NOASC11800_80         .CA....T.C ..A.G..... ..
      #NOASC11700_80         C.A....T.C ..A....... A.
      #NOASC84E16_90-130     C.A....T.C ..A....... ..
      #NOASC84G19_90-130     C.A....T.C ..A....... ..
      #NOASCbrC5-02_90-130   C.A....T.C ..A....... ..
      #NOASC15400_90-130     C.A....T.C ..A......G ..
      #NOASC15700_90-130     ....T.G.C. .TA..C..G. ..
      #NOATAB2_40            .......... .......... ..
      #NOAGrotteMerve_?      .......... .T........ ..
      #NOAAZE_80-130         .......... .......... .C
      #NOAGigny189F3_?       ..A....T.C ..A....... ..
      #NOAJAL104_?           C.A....T.C ..A....... ..
      #NOATAB15_25-35        ..A......C ..A....... ..
      #NOAGailenreuth_?      ..A......C ..A....... ..
      #NOA47910_30           ..A....T.C ..A....A.. ..
      #NOAHohleFels_?        ..A....T.C ..A..C.... ..
      #NOACLA_35             ..A....T.C C.A....... ..
      #NOACLB_35             ..A....T.C C.A....... ..
      #NOAChiemsee_35        ..A..G.... ..A...C... ..
      #NOARamesch1_?         ..A..G.... ..A...C... ..
      #NOARamesch2_?         ..A..G.... ..A...C... ..
      #NOAGeissenklt1_?      ...CT..... .T.G.C.... ..
      #NOAGeissenklt2_?      ...CT..... .T.G.C.... ..
      #NOANixloch_?          ...CT..... .T...C.... ..
      --------------------------------------------- Alp barrier
      #SOAPoto_?             ...CT..... .T...C.... ..
      #SOAVind1_?            ...CT..... .T...C.... ..
      #SOAVind2_?            ...CT..... .T...C.... ..
      #SOAConturi_?          .......T.. .......... ..

      (Loreille et al. 2001) (Orlando et al. 2002) (Hofreiter et al. 2002) (Kühn et al. 2001)

  22. -Application- Neutrality tests, Belgium cave (Scladina, n = 20, S = 15)

      | Model | Row | Dt | D*fl | Hfw | K | H | ZnS | Pearson (a) |
      |---|---|---|---|---|---|---|---|---|
      | | Observed | -0.82 | -1.55 | -1.32 | 7 | 0.79 | 0.24 | -0.39 (2.8*) |
      | No time structure | P value (%) | 21.0 | 5.3 | 18.4 | 16.4 | 37.7 | 43.7 | 2.8* |
      | | Mean | 0.06 | -0.05 | 0.30 | 8.3 | 0.79 | 0.26 | 0.00 |
      | | CI | [-1.42;1.51] | [-1.89;1.18] | [-4.46;2.62] | [5;11] | [0.64;0.88] | [0.10;0.55] | [-0.25;0.20] |
      | | % rejected | (4.9;5.5) | (5.2;2.8) | (5.4;4.8) | (1.7;3.9) | (4.9;4.6) | (5.5;5.1) | (5.0;/) |
      | Average time structure | P value (%) | 30.0 | 8.8 | 17.2 | 8.6 | 31.2 | 31.7 | 2.7* |
      | | Mean | -0.30 | -0.38 | 0.39 | 9.1 | 0.80 | 0.22 | 0.00 |
      | | CI | [-1.56;1.26] | [-1.89;0.84] | [-4.04;2.56] | [6;12] | [0.66;0.89] | [0.08;0.47] | [-0.29;0.23] |
      | | % rejected | (7.8;3.0) | (8.2;1.0) | (4.2;3.7) | (0.8;9.5) | (3.3;7.8) | (11.5;2.9) | (4.9;/) |
      | Uncertainty in time structure | P value (%) | 30.0 | 8.6 | 17.4 | 7.9 | 30.9 | 31.9 | 2.8* |
      | | Mean | -0.33 | -0.42 | 0.37 | 9.1 | 0.80 | 0.22 | 0.00 |
      | | CI | [-1.59;1.18] | [-1.89;0.84] | [-4.20;2.54] | [6;12] | [0.66;0.89] | [0.08;0.48] | [-0.29;0.24] |
      | | % rejected | (9.3;2.8) | (9.3;0.8) | (4.5;3.6) | (0.7;9.8) | (3.7;7.5) | (11.6;2.8) | (4.8;/) |

      (a) permutation test
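The observed Dt, K and H of the Scladina sample can be recomputed from the 20 Scladina rows (names containing "SC") of the D-loop alignment shown on the previous slide. The sketch below collapses those rows into their seven distinct patterns with their counts, treats any non-"." character as the variant state at a site (an assumption that holds for these rows), and uses the standard Tajima (1989) formula; the function names are illustrative.

```python
from math import sqrt

# Distinct Scladina patterns from the D-loop alignment, with their counts.
rows = [
    ("..A....T.C ..A....... ..", 6),
    (".......... .......... ..", 5),
    ("C.A....T.C ..A....... ..", 4),
    ("C.A....T.C ..A....... A.", 2),
    (".CA....T.C ..A.G..... ..", 1),
    ("C.A....T.C ..A......G ..", 1),
    ("....T.G.C. .TA..C..G. ..", 1),
]

def variant_counts(rows):
    """Sample size and, for each polymorphic site, the number of variant copies."""
    patterns = [(p.replace(" ", ""), c) for p, c in rows]
    n = sum(c for _, c in patterns)
    length = len(patterns[0][0])
    counts = [sum(c for p, c in patterns if p[i] != ".") for i in range(length)]
    return n, [k for k in counts if 0 < k < n]

def tajima_d(n, counts):
    S = len(counts)
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

n, counts = variant_counts(rows)
S = len(counts)
D = tajima_d(n, counts)
K = len(rows)
H = 1.0 - sum((c / n) ** 2 for _, c in rows)
print(f"n = {n}, S = {S}, Dt = {D:.2f}, K = {K}, H = {H:.2f}")
```

Run on these rows, the sketch gives n = 20, S = 15, Dt close to -0.82, K = 7 and H = 0.79, in agreement with the observed row of the table.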

  23. -Application- Neutrality tests, dated subsample (all dated, n = 27, S = 20)

      | Model | Row | Dt | D*fl | Hfw | K | H | ZnS | Pearson (a) |
      |---|---|---|---|---|---|---|---|---|
      | | Observed | -1.21 | -2.28 | -0.69 | 12 | 0.86 | 0.14 | -0.27 (11.4) |
      | No time structure | P value (%) | 10.5 | 0.6** | 25.7 | 16.5 | 32.1 | 24.3 | 11.5 |
      | | Mean | -0.09 | -0.08 | 0.29 | 10.3 | 0.82 | 0.23 | 0.00 |
      | | CI | [-1.49;1.50] | [-1.98;1.32] | [-5.66;3.18] | [7;14] | [0.69;0.90] | [0.09;0.48] | [-0.19;0.16] |
      | | % rejected | (5.0;5.2) | (3.6;1.4) | (5.3;4.7) | (4.0;2.8) | (5.3;4.7) | (5.7;5.0) | (4.7;/) |
      | Average time structure | P value (%) | 17.7 | 1.7* | 24.3 | 38.2 | 42.6 | 41.8 | 11.2 |
      | | Mean | -0.42 | -0.59 | 0.35 | 11.8 | 0.84 | 0.18 | 0.00 |
      | | CI | [-1.69;1.11] | [-2.28;0.72] | [-5.34;2.98] | [8;15] | [0.71;0.91] | [0.07;0.39] | [-0.23;0.20] |
      | | % rejected | (9.3;2.1) | (6.9;0.3) | (4.7;2.6) | (1.2;11.1) | (3.4;9.5) | (13.7;2.4) | (4.9;/) |
      | Uncertainty in time structure | P value (%) | 18.5 | 1.9* | 23.4 | 39.9 | 43.2 | 41.1 | 11.9 |
      | | Mean | -0.44 | -0.61 | 0.37 | 11.8 | 0.84 | 0.18 | 0.00 |
      | | CI | [-1.70;1.09] | [-2.28;0.72] | [-5.23;2.99] | [8;16] | [0.71;0.91] | [0.07;0.40] | [-0.24;0.19] |
      | | % rejected | (9.3;2.4) | (7.0;0.2) | (4.6;2.7) | (1.2;11.7) | (3.5;9.7) | (14.1;2.5) | (5.4;/) |

      (a) permutation test

  24. -Application- Neutrality tests, total sample (n = 41, S = 22)

      | Model | Row | Dt | D*fl | Hfw | K | H | ZnS | Pearson (a) | Fst (a) |
      |---|---|---|---|---|---|---|---|---|---|
      | | Observed | -0.45 | -0.88 | 1.35 | 17 | 0.91 | 0.10 | -0.09 (22.0) | 0.32 (0.4**) |
      | No time structure | P value (%) | 37.1 | 14.7 | 47.1 | 1.7* | 3.7* | 18.1 | 21.5 | 0.4** |
      | | Mean | -0.09 | -0.09 | 0.30 | 12.3 | 0.83 | 0.19 | 0.00 | -0.03 |
      | | CI | [-1.44;1.52] | [-1.85;1.38] | [-5.84;3.15] | [8;16] | [0.70;0.90] | [0.07;0.41] | [-0.20;0.17] | [-0.38;0.27] |
      | | % rejected | (4.5;5.3) | (4.1;1.1) | (4.8;4.7) | (3.0;4.3) | (4.8;4.9) | (5.5;4.6) | (4.8;/) | (/;4.6) |
      | Average time structure | P value (%) | 45.5 | 35.6 | 45.6 | 7.8 | 5.5 | 36.6 | 21.8 | 1.3* |
      | | Mean | -0.45 | -0.74 | 0.32 | 13.9 | 0.84 | 0.15 | 0.00 | -0.01 |
      | | CI | [-1.71;1.10] | [-2.49;0.73] | [-5.38;2.93] | [9;18] | [0.71;0.91] | [0.05;0.34] | [-0.23;0.20] | [-0.40;0.38] |
      | | % rejected | (10.2;2.2) | (10.7;0.1) | (4.2;2.4) | (0.8;16.1) | (4.3;7.9) | (15.2;2.2) | (4.9;/) | (/;8.9) |
      | Uncertainty in time structure | P value (%) | 42.1 | 40.7 | 44.9 | 10.3 | 6.2 | 39.2 | 21.8 | 1.7* |
      | | Mean | -0.54 | -0.90 | 0.26 | 14.3 | 0.84 | 0.14 | 0.00 | -0.01 |
      | | CI | [-1.76;0.96] | [-2.81;0.73] | [-5.70;2.90] | [10;18] | [0.71;0.91] | [0.05;0.32] | [-0.24;0.21] | [-0.40;0.41] |
      | | % rejected | (12.2;1.4) | (14.2;0.1) | (4.5;2.3) | (0.5;19.8) | (4.0;7.9) | (16.7;2.1) | (4.7;/) | (/;9.7) |

      (a) permutation test

  25. -Application- LD as a function of distance. [Figure: $r^2$ (log scale, 0.01 to 1) against distance (0 to 70 nt); fitted trend with $R^2 = 0.4174$]

  26. Time structure, Conclusion • Can substantially bias the results • Even if within 10% of the age of the MRCA: this is the bottom of the tree, with more branches, and it carries a non-random subset of mutations (rare ones) • Small time spacing: long external branches, excess of rare variants (negative D, deficit of LD) • Great time spacing: a long internal branch, apparent differentiation, excess of intermediate frequency variants (positive D, excess of LD) if equilibrated

  27. Acknowledgements • CNRS • Nick Barton
