
Outline

Outline. Cancer Progression Models. SNPs, Haplotypes, and Population Genetics: Introduction. Cancer: Mutation and Selection. Clonal theory of cancer: Nowell (Science 1976). Cancer Genomes. Leukemia. Breast. Tumor genome 2. Tumor genome 3. Tumor genome 4. “Comparative Genomics” of Cancer.

kbatchelor

Presentation Transcript


  1. Outline • Cancer Progression Models • SNPs, Haplotypes, and Population Genetics: Introduction

  2. Cancer: Mutation and Selection Clonal theory of cancer: Nowell (Science 1976)

  3. Cancer Genomes Leukemia Breast

  4. “Comparative Genomics” of Cancer
     [Figure: mutation and selection lead from the human genome to the tumor genome; further tumor genomes 2, 3, and 4 are shown for comparison.]
     1) Identify recurrent aberrations • Mitelman Database, >40,000 aberrations
     2) Reconstruct temporal sequence of aberrations • Linear model: colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q • Tree model: (Desper et al. 1999)
     3) Find age of tumor, time of clonal expansion

  5. Observing Cancer Progression • Obtaining longitudinal (time-course) data is difficult. • Latitudinal data (multiple patients) are readily available. [Figure: mutation and selection lead from the human genome to the tumor genome over time points t1, t2, t3, t4; tumor genomes 2, 3, and 4 come from other patients.]

  6. Multiple Mutations • 4-step model for colorectal cancer, Vogelstein et al. (1988), New Engl. J. Med.: -5q → 12p* → -17p → -18q • Inferred from latitudinal data in 172 tumor samples.

  7. Oncogenetic Tree models (Desper et al. JCB 1999, 2000) • Given: measurements of chromosome gain/loss events in multiple tumor samples (CGH) • Compute: rooted tree that best explains the temporal sequence of events. Example samples: {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}

  8. Oncogenetic Tree models (Desper et al. JCB 1999, 2000) • Given: measurements of chromosome gain/loss events in multiple tumor samples, e.g. {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q} • L = set of chromosome alterations observed across all samples • Tumor samples give a probability distribution on 2^L (the subsets of L)

  9. Oncogenetic Tree • T = (V, E, r, p, L) is a rooted tree • V = vertices • E = edges • L = set of events (leaves) • r = root • p: E → (0,1], a probability on each edge • T gives a probability distribution on 2^L [Figure: rooted tree with edges labeled e0, e1, e2, e3, e4.]
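One way to read "T gives a probability distribution on 2^L" is the following minimal sketch. It assumes that each edge fires independently with probability p(e) and that an event is observed only if every edge on its path from the root fired. The event names `a`, `b`, `c`, the tree shape, and the edge probabilities are hypothetical and are not taken from the slides.

```python
from itertools import combinations

# Hypothetical tree: parent of each event (None = attached to the root r)
# and the probability p(e) of the edge leading into that event.
parent = {"a": None, "b": None, "c": "b"}
prob = {"a": 0.5, "b": 0.4, "c": 0.25}

def set_probability(events):
    """Probability that exactly `events` is the set of events reached from the root."""
    events = set(events)
    p = 1.0
    for e, par in parent.items():
        parent_present = par is None or par in events
        if e in events:
            if not parent_present:
                return 0.0          # an event cannot occur without its ancestor
            p *= prob[e]            # the edge into e fired
        elif parent_present:
            p *= 1.0 - prob[e]      # the edge was reachable but did not fire
    return p

all_events = list(parent)
subsets = [set(c) for k in range(len(all_events) + 1)
           for c in combinations(all_events, k)]
total = sum(set_probability(s) for s in subsets)

for s in subsets:
    print(sorted(s), set_probability(s))
print("total:", total)
```

Subsets that contain an event without its ancestor get probability 0, and the probabilities over all of 2^L sum to 1.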

  10. Results • CGH of 117 cases of kidney cancer

  11. Extensions • Oncogenetic trees based on branching (Desper et al., JCB 1999)

  12. Extensions

  13. Extensions • Oncogenetic trees based on branching (Desper et al., JCB 1999) • Maximum Likelihood Estimation (von Heydebreck et al., 2004) • Mutagenetic trees: mixtures of trees (Beerenwinkel et al., JCB 2005)

  14. Heterogeneity within a tumor • Final tumor is clonal expansion of single cell lineage. • Can we date the time of clonal expansion? Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal tumor histories, PNAS 2000.

  15. Estimating time of clonal expansion • Microsatellite loci (MS), CA dinucleotides. • In tumors with loss of mismatch repair (e.g. colorectal), MS change size.

  16. Estimating time of clonal expansion • For each MS locus i, measure the mean m_i and variance s_i^2 of size. • S^2_allele = average of s_1^2, …, s_L^2 • S^2_loci = variance of m_1, …, m_L
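The two summary statistics can be computed as in the sketch below. The measurements are hypothetical, sizes are assumed to be recorded as offsets (in repeat units) from the germline length, and the choice of population variance (`pvariance`) rather than sample variance is an assumption, since the slide does not say which is used.

```python
from statistics import mean, pvariance

# Hypothetical data: sizes[i] holds the allele sizes measured at MS locus i,
# expressed as offsets from the germline length.
sizes = [
    [0, 1, 0, 2],
    [-1, -1, 0, 0],
    [1, 0, 2, 1],
]

locus_means = [mean(s) for s in sizes]      # m_1, ..., m_L
locus_vars = [pvariance(s) for s in sizes]  # s_1^2, ..., s_L^2

s2_allele = mean(locus_vars)       # average within-locus variance
s2_loci = pvariance(locus_means)   # variance of the locus means

print("S^2_allele =", s2_allele)
print("S^2_loci   =", s2_loci)
```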

  17. Time to clonal expansion?

  18. Simulation Estimates of Tumor Age • Y1 = time to clonal expansion • Tumor age = Y1 + Y2 • Branching process simulation: each cell in the population gives birth to 0, 1, or 2 daughter cells, with a ±1 change in MS size (coalescent: forward, backward, forward simulation). • Posterior estimate of Y1, Y2 by running simulations and accepting runs whose simulated values of S^2_allele, S^2_loci are close to the observed values. [Figure: timeline divided into the intervals Y1 and Y2.]

  19. Results • 15 patients, 25 MS loci • Estimate time since clonal expansion from observed S^2_allele, S^2_loci.

  20. Cancer: Mutation and Selection Clonal theory of cancer: Nowell (Science 1976)

  21. Population Genetics • C.C. Maley: selective sweeps of mutations in tumor cell populations • Chin and Gray: solid tumors

  22. Genetics 101 • Humans are diploid: two copies of each chromosome, maternal and paternal • Locus: Region on a chromosome (gene, nucleotide, etc.) • Allele: “Value” at a locus • Genotype: Pair of alleles (maternal and paternal) at loci on a chromosome (homozygous, heterozygous) • Haplotype: Alleles of loci on same chromosome (maternal or paternal)

  23. Allele Measurement • “Old days” (< 1970?): gene variants • More recently (1980’s-90’s): various sequence-based genetic markers: microsatellites, sequence tagged sites (STS), etc. • Today: single nucleotide polymorphisms (SNPs)

  24. Single Nucleotide Polymorphisms • Infinite Sites Assumption: each site mutates at most once. • By convention, SNPs are biallelic: only two of the four possible nucleotides are present in the population. Example haplotypes (one per row, one SNP site per column):
      00000101011
      10001101001
      01000101010
      01000000011
      00011110000
      00101100110

  25. Infinite Sites Assumption
      [Figure: tree rooted at 00000000. A mutation at site 3 gives 00100000; from there, a mutation at site 5 gives 00101000 and a mutation at site 8 gives 00100001. The figure also carries the labels A and B.]
      • The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa. • Each sequence has a single parent. • The history of a population can be expressed as a tree. • The tree can be constructed efficiently.

  26. Infinite Sites Assumption and Perfect Phylogeny • Each site is mutated at most once in the history. • All descendants must carry the mutated value, and all others must carry the ancestral value. [Figure: a mutation at site i on one edge of the tree; sequences below that edge have 1 in position i, all other sequences have 0 in position i.]

  27. Perfect Phylogeny • Assume an evolutionary model in which only mutation takes place. • The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. All the species in one sub-tree contain a 0, and all species in the other contain a 1. Such a tree is called a perfect phylogeny. • How can one reconstruct such a tree?

  28. The 4-gamete condition • A column i partitions the set of species into two sets, i0 and i1. • A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous. • Example: i is heterogeneous w.r.t. {A, D, E}.
      Species:   A B C D E F
      Column i:  0 0 0 1 1 1
      i0 = {A, B, C}, i1 = {D, E, F}

  29. 4 Gamete Condition • There exists a perfect phylogeny if and only if, for every pair of columns (i, j), j is homogeneous w.r.t. i0 or w.r.t. i1. • Equivalently: there exists a perfect phylogeny if and only if no pair of columns (i, j) contains all four rows (0,0), (0,1), (1,0), (1,1).
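The second form of the condition translates directly into a pairwise check. The sketch below is one straightforward way to write it, not necessarily the fastest; the function name `four_gamete_violations` is made up for illustration. The 5 × 5 matrix is the one from the worked example on slides 33 to 36, and the 4 × 2 matrix is a made-up counterexample.

```python
from itertools import combinations

def four_gamete_violations(matrix):
    """Return the column pairs (i, j), 0-based, in which all four gametes occur."""
    n_cols = len(matrix[0])
    bad = []
    for i, j in combinations(range(n_cols), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:      # (0,0), (0,1), (1,0), (1,1) all present
            bad.append((i, j))
    return bad

# Matrix from the worked example (rows A to E, columns 1 to 5).
compatible = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
# Made-up matrix whose two columns show all four gametes.
conflicting = [[0, 0], [0, 1], [1, 0], [1, 1]]

print(four_gamete_violations(compatible))
print(four_gamete_violations(conflicting))
```

An empty result means every pair of columns passes, so a perfect phylogeny exists.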

  30. 4-gamete condition: proof • (only if) Every perfect phylogeny satisfies the 4-gamete condition: depending on which edge the mutation j occurs on, either i0 or i1 must be homogeneous. • (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist? [Figure: tree edge i separating the species sets i0 and i1.]

  31. An algorithm for constructing a perfect phylogeny • We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This restriction is removed later (unrooted case). • In any tree, each node (except the root) has a single parent. • It is sufficient to construct a parent for every node. • In each step, we add a column and refine some of the nodes containing multiple children. • Stop when all columns have been considered.

  32. Inclusion Property • For any pair of columns i, j: i < j if and only if i1 ⊇ j1. • Note that if i < j, then the edge containing i is an ancestor of the edge containing j. [Figure: edge i above edge j in the tree.]

  33. Example
           1 2 3 4 5
        A  1 1 0 0 0
        B  0 0 1 0 0
        C  1 1 0 1 0
        D  0 0 1 0 1
        E  1 0 0 0 0
      Initially, there is a single clade r, and each node has r as its parent. [Figure: root r with children A, B, C, D, E.]

  34. Sort columns • Sort columns according to the inclusion property (note that the columns are already sorted here). • This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order.
           1 2 3 4 5
        A  1 1 0 0 0
        B  0 0 1 0 0
        C  1 1 0 1 0
        D  0 0 1 0 1
        E  1 0 0 0 0
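The sorting step can be sketched as follows: each column is read top to bottom as a binary number (row 1 is the most significant bit) and columns are ordered by decreasing value. The helper names `column_value` and `sort_columns` are made up for illustration.

```python
# Example matrix from the slides (rows A to E, columns 1 to 5).
ROWS = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]

def column_value(rows, j):
    """Read column j top to bottom as a binary number."""
    return int("".join(str(row[j]) for row in rows), 2)

def sort_columns(rows):
    """Return the 0-based column indices in decreasing order of binary value."""
    return sorted(range(len(rows[0])),
                  key=lambda j: column_value(rows, j),
                  reverse=True)

values = [column_value(ROWS, j) for j in range(5)]
print(values)               # column values, left to right
print(sort_columns(ROWS))   # already sorted, so the identity order
```

If the 1-set of column i strictly contains the 1-set of column j, then column i has the larger binary value, so this order places every column after the columns that contain it.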

  35. Add first column
           1 2 3 4 5
        A  1 1 0 0 0
        B  0 0 1 0 0
        C  1 1 0 1 0
        D  0 0 1 0 1
        E  1 0 0 0 0
      • When adding column i: • Check each edge and decide on which side of it the column belongs. • Finally, add a node if the column resolves a clade. [Figure: after adding column 1, r has children B, D, and a new node u; u has children A, C, E.]

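  36. Adding other columns
           1 2 3 4 5
        A  1 1 0 0 0
        B  0 0 1 0 0
        C  1 1 0 1 0
        D  0 0 1 0 1
        E  1 0 0 0 0
      • Add other columns on edges using the ordering property. [Figure: final tree. Edge 1 leaves r; below it hang E and edge 2; below edge 2 hang A and edge 4, which leads to C. Edge 3 also leaves r; below it hang B and edge 5, which leads to D.]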
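The whole construction for the rooted case (0 = ancestral) can be sketched compactly as below. This is a simplified quadratic-time version written for illustration, not the implementation behind the slides: it records, for every column, the column (edge) directly above it, and for every species, the deepest edge it sits below. The function name and return format are assumptions, and all-zero columns are assumed not to occur.

```python
def build_perfect_phylogeny(rows, names):
    """Return (edge_parent, leaf_parent) using 1-based column labels and 'r' for the root."""
    n_cols = len(rows[0])
    ones = [{r for r, row in enumerate(rows) if row[j] == 1} for j in range(n_cols)]
    # Decreasing binary value puts every column after the columns that contain it.
    order = sorted(range(n_cols),
                   key=lambda j: int("".join(str(row[j]) for row in rows), 2),
                   reverse=True)
    edge_parent, placed = {}, []
    for j in order:
        edge_parent[j + 1] = "r"
        for i in reversed(placed):           # most recently placed = smallest candidate
            if ones[j] <= ones[i]:           # inclusion: edge i is directly above edge j
                edge_parent[j + 1] = i + 1
                break
            if ones[j] & ones[i]:            # overlap without inclusion: conflict
                raise ValueError("no perfect phylogeny with 0 as ancestral state")
        placed.append(j)
    depth = {j: k for k, j in enumerate(order)}
    leaf_parent = {}
    for r, name in enumerate(names):
        cols = [j for j in range(n_cols) if rows[r][j] == 1]
        # A species hangs below the deepest edge on which it carries a 1.
        leaf_parent[name] = max(cols, key=depth.get) + 1 if cols else "r"
    return edge_parent, leaf_parent

ROWS = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
edge_parent, leaf_parent = build_perfect_phylogeny(ROWS, "ABCDE")
print(edge_parent)
print(leaf_parent)
```

On the example matrix, edges 1 and 3 attach to the root, edge 2 sits below edge 1, edge 4 below edge 2, and edge 5 below edge 3, with E, A, C, B, D hanging below edges 1, 2, 4, 3, 5 respectively.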

  37. Unrooted case • Switch the values in each column, so that 0 is the majority element. • Apply the algorithm for the rooted case
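The relabeling step for the unrooted case can be sketched as follows. The input matrix is made up, the function name `to_majority_zero` is hypothetical, and leaving tied columns unchanged is an assumption, since the slide does not say how ties are handled.

```python
def to_majority_zero(rows):
    """Swap 0 and 1 in every column where 1 is the strict majority."""
    n_rows = len(rows)
    flipped = [list(row) for row in rows]   # copy so the input is not modified
    for j in range(len(rows[0])):
        n_ones = sum(row[j] for row in rows)
        if n_ones > n_rows - n_ones:
            for row in flipped:
                row[j] = 1 - row[j]
    return flipped

# Made-up example: columns 1 and 2 have a majority of 1s, column 3 does not.
example = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
relabeled = to_majority_zero(example)
for row in relabeled:
    print(row)
```

The relabeled matrix can then be passed to any routine that assumes 0 is the ancestral state.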

  38. Summary: No recombination leads to correlation between sites
      [Figure: tree rooted at 00000000. A mutation at site 3 gives 00100000; from there, a mutation at site 5 gives 00101000 and a mutation at site 8 gives 00100001. The figure also carries the labels A and B.]
      • The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa. • The history of a population can be expressed as a tree. • The tree can be constructed efficiently.

  39. Haplotype Phasing Problem • Given a set of genotypes, infer the haplotypes. • Use parsimony assumption • Haplotypes satisfy perfect phylogeny (Gusfield) • Find minimum number of haplotypes that explain observed genotypes • Most sequencing technologies measure genotypes, not haplotypes
      Pair of haplotypes:  0 1 0 1 1 1 0
                           1 1 0 0 0 1 0
      Genotype:            2 1 0 2 2 1 0   (2 = heterozygous)
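The encoding in the example (0 or 1 where the two haplotypes agree, 2 where they differ) and the ambiguity that phasing has to resolve can be sketched as follows. The function names are made up for illustration; the haplotype pair is the one from the slide.

```python
from itertools import product

def genotype(h1, h2):
    """Combine two haplotypes: shared allele where they agree, 2 where they differ."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]

def consistent_pairs(g):
    """Enumerate all unordered haplotype pairs that explain genotype g."""
    het = [k for k, v in enumerate(g) if v == 2]
    pairs, seen = [], set()
    for bits in product([0, 1], repeat=len(het)):
        h1, h2 = list(g), list(g)
        for k, b in zip(het, bits):
            h1[k], h2[k] = b, 1 - b         # heterozygous site: one 0 and one 1
        key = frozenset((tuple(h1), tuple(h2)))
        if key not in seen:                 # (h1, h2) and (h2, h1) are the same pair
            seen.add(key)
            pairs.append((h1, h2))
    return pairs

g = genotype([0, 1, 0, 1, 1, 1, 0], [1, 1, 0, 0, 0, 1, 0])
print(g)
for h1, h2 in consistent_pairs(g):
    print(h1, h2)
```

A genotype with k ≥ 1 heterozygous sites has 2^(k-1) possible haplotype pairs, which is why additional assumptions such as parsimony or perfect phylogeny are needed to choose among them.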

  40. Recombination [Example: parents 00000000 and 11111111 recombine to give 00011111.]

  41. Recombination • A tree is not sufficient, as a sequence may have 2 parents. • Recombination leads to violation of the 4-gamete property. • Recombination leads to loss of correlation between columns. [Example: parents 00000000 and 11111111 recombine to give 00011111.]
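A single crossover and its effect on the 4-gamete property can be sketched as follows. The two parents and the recombinant 00011111 are from the slide; the reciprocal recombinant 11100000 is an added assumption, included so that all four gametes appear in the small population.

```python
def crossover(left_parent, right_parent, breakpoint):
    """Take sites before `breakpoint` from one parent and the rest from the other."""
    return left_parent[:breakpoint] + right_parent[breakpoint:]

def gametes(haplotypes, i, j):
    """Set of value pairs observed at sites i and j (0-based)."""
    return {(h[i], h[j]) for h in haplotypes}

a, b = "00000000", "11111111"
child = crossover(a, b, 3)        # recombinant shown on the slide
reciprocal = crossover(b, a, 3)   # assumed reciprocal product

before = gametes([a, b], 0, 7)
after = gametes([a, b, child, reciprocal], 0, 7)
print(child, reciprocal)
print(len(before), len(after))
```

Without recombination the first and last sites show only two gametes; with both recombinants present they show all four, so no tree can explain the data.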

  42. Studying recombination • A tree is not sufficient as a sequence may have 2 parents • Recombination leads to loss of correlation between columns • How can we measure recombination?

  43. Linkage (Dis)-equilibrium (LD)
      Extensive recombination       No recombination
        A B                           A B
        0 0                           0 1
        0 1                           0 1
        0 0                           0 0
        0 0                           0 0
        1 1                           1 0
        1 0                           1 0
        1 0                           1 0
        1 0                           1 0
      Pr[A,B = (0,1)] = 0.125       Pr[A,B = (0,1)] = 0.25
      Linkage equilibrium           Linkage disequilibrium
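The two tables can be checked numerically by comparing the joint frequency Pr[A,B = (0,1)] with the product Pr[A = 0] · Pr[B = 1]; under linkage equilibrium the two agree. The sketch below uses the eight (A, B) rows of each table from the slide; the function name `ld_stats` is made up for illustration.

```python
def ld_stats(pairs):
    """Return (joint, product, joint - product) for the haplotype (A, B) = (0, 1)."""
    n = len(pairs)
    p_a0 = sum(1 for a, _ in pairs if a == 0) / n
    p_b1 = sum(1 for _, b in pairs if b == 1) / n
    p_01 = sum(1 for a, b in pairs if (a, b) == (0, 1)) / n
    return p_01, p_a0 * p_b1, p_01 - p_a0 * p_b1

extensive_recombination = [(0, 0), (0, 1), (0, 0), (0, 0),
                           (1, 1), (1, 0), (1, 0), (1, 0)]
no_recombination = [(0, 1), (0, 1), (0, 0), (0, 0),
                    (1, 0), (1, 0), (1, 0), (1, 0)]

equilibrium = ld_stats(extensive_recombination)
disequilibrium = ld_stats(no_recombination)
print(equilibrium)
print(disequilibrium)
```

In the first table the joint frequency equals the product of the marginals (difference 0); in the second it is twice the product, which is the signature of linkage disequilibrium.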
