
Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs

Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs. Tamar Barzuza 1 Jacques S. Beckmann 2,3 Ron Shamir 4 Itsik Pe’er 5 1 Computer Science and Applied Mathematics, Weizmann Institute of Science 2 Molecular Genetics, Weizmann Institute of Science


Presentation Transcript


  1. Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs. Tamar Barzuza (1), Jacques S. Beckmann (2,3), Ron Shamir (4), Itsik Pe’er (5). (1) Computer Science and Applied Mathematics, Weizmann Institute of Science; (2) Molecular Genetics, Weizmann Institute of Science; (3) Génétique Médicale, Universitätsspital Lausanne; (4) School of Computer Science, Tel-Aviv University; (5) Medical and Population Genetics Group, Broad Institute

  2. Overview • Introduction • Xor PPH • Theoretical outlines and results • Experimental results • Informative SNPs • Theoretical results • Summary and Future research

  3. Chromosomes

  4. SNP – Single nucleotide polymorphism. Six chromosomes, each followed by its alleles at the four SNP sites:
     AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTAATGCTAGCACGCGCGCCAGGATTAGCTGCCACA   A G A C
     AATATATCGCTTTCCGTATACCTAATTTGGGGTGTGTGTACGTAATGCTAGCACGCGCGCCAGGATTAGCTGCCACA   T T A C
     AATATATCGCTTTCCGTATACCTAATTTGGGGTGTGTGTACGTACTGCTAGCACGCGCGCCAGGATTAGCTGCCACA   T T C C
     AATATATCGCTATCCGTATACCTAATTTGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA   A T C T
     AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA   A G C T
     AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA   A G C T

  5. SNP – Single nucleotide polymorphism. Alleles at the four SNP sites, one row per chromosome: A G A C; T T A C; T T C C; A T C T; A G C T; A G C T

  6. Haplotypes, Genotypes and XOR-Genotypes. Haplotypes: A G A C and T T A C. Genotype: A/T T/G A C. XOR-Genotype: Het Het Hom Hom. [Figure: the six chromosomes over SNPs 1–4, with each allele labeled 0 or 1]

  7. Haplotypes, Genotypes and XOR-Genotypes. Haplotypes: 1 1 0 1 and 0 0 0 1. Genotype: 2 2 0 1. XOR-Genotype: {1, 2}. [Figure: the six chromosomes over SNPs 1–4, with each allele labeled 0 or 1]
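The encoding on this slide can be sketched in a few lines of Python. The function name `genotype_and_xor` is hypothetical; haplotypes are assumed to be lists of 0/1 alleles, a genotype entry of 2 marks a heterozygous site, and SNPs are numbered from 1 as on the slide.

```python
def genotype_and_xor(hap_a, hap_b):
    """Return (genotype, xor_genotype) for two 0/1 haplotypes of equal length."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    # Homozygous sites keep their allele; heterozygous sites are coded 2.
    genotype = [a if a == b else 2 for a, b in zip(hap_a, hap_b)]
    # The xor-genotype is the set of heterozygous SNPs (1-based numbering).
    xor_genotype = {i + 1 for i, (a, b) in enumerate(zip(hap_a, hap_b)) if a != b}
    return genotype, xor_genotype


# The pair of haplotypes shown on the slide.
genotype, xor_genotype = genotype_and_xor([1, 1, 0, 1], [0, 0, 0, 1])
```

Note that the xor-genotype keeps only which sites are heterozygous, so the homozygous alleles (here 0 and 1 at SNPs 3 and 4) are not recoverable from it alone.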

  8. Perfect Phylogeny. SNPs only. [Figure: a tree over binary haplotypes of five SNPs; each edge is labeled with the SNP that mutates along it: 1: 1→0, 2: 0→1, 3: 0→1, 4: 1→0, 5: 0→1]
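Whether a set of binary haplotypes fits some perfect phylogeny can be checked with the standard four-gamete test: no pair of SNPs may show all four combinations 00, 01, 10 and 11. The sketch below is an illustration only, not code from the presentation; `admits_perfect_phylogeny` is a hypothetical name and the example matrices are made up.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on a list of equal-length 0/1 haplotypes."""
    n_snps = len(haplotypes[0])
    for i, j in combinations(range(n_snps), 2):
        # Collect the allele combinations observed at SNPs i and j.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False  # all of 00, 01, 10, 11 occur: no perfect phylogeny
    return True


compatible = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
conflicting = [[0, 0], [0, 1], [1, 0], [1, 1]]
```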

  9. Previous work. Haplotyping: haplotypes from genotypes. Input: Genotypes G={G1,…,Gn} on SNPs S={s1,…,sm}. Output: The haplotypes H={H1,…,H2n} that gave rise to G. • General heuristics: • Clark ’90 • Excoffier + Slatkin ’95 • PPH: Perfect phylogeny haplotyping (n genotypes, m SNPs): • Gusfield 2002, O(nm·α(n,m)), via graph realization • Bafna et al. 2002, O(nm²) • Eskin et al. 2003, O(nm²)

  10. Previous work. The graph realization problem: Input: A hypergraph H=({1,…,m}, P), where P={P1,P2,…,Pn} and each Pi ⊆ {1,…,m}. Goal: A tree T=(V,E) with edge set E={1,…,m} such that each Pi labels a path in T. Example: Input: { {1,2}, {2,3} }. Output: [Figure: a tree with edges labeled 1, 2, 3]. Algorithms: Tutte 1959, O(n²m); Gavril and Tamari 1983, O(nm²); Bixby and Wagner 1988, O(nm·α(n,m))
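Checking a candidate answer to graph realization is much easier than finding one. The following sketch verifies that every hyperedge labels a path in a given tree; it does not construct the tree. The representation (a dict from edge label to endpoint pair) and the names `labels_a_path` and `is_realization` are assumptions made for this illustration, and the input is assumed to be a genuine tree.

```python
from collections import Counter


def labels_a_path(tree_edges, labels):
    """True if the edges named in `labels` form a simple path in the tree.

    tree_edges maps each edge label to its (u, v) endpoints; it is assumed
    to describe a tree (connected, no cycles).
    """
    if not labels:
        return True
    degree = Counter()
    for label in labels:
        u, v = tree_edges[label]
        degree[u] += 1
        degree[v] += 1
    # In a tree, k edges that touch exactly k + 1 vertices are connected;
    # if in addition no vertex has degree above 2, they form a path.
    return len(degree) == len(labels) + 1 and max(degree.values()) <= 2


def is_realization(tree_edges, hyperedges):
    return all(labels_a_path(tree_edges, p) for p in hyperedges)


# Edges in the order 1, 2, 3 realize { {1,2}, {2,3} }; the order 2, 1, 3 does not.
path_tree = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "d")}
wrong_tree = {2: ("a", "b"), 1: ("b", "c"), 3: ("c", "d")}
```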

  11. Overview • Introduction • Xor PPH • Theoretical outlines and results • Experimental results • Informative SNPs • Theoretical results • Summary and Future research

  12. XPPH - Xor perfect phylogeny haplotyping. Xor-haplotyping: haplotypes from xor-genotypes. Input: 1. Xor-genotype data (can be obtained by DHPLC) 2. Three genotypes. Goal: Resolve the haplotypes and their perfect phylogeny. [Figure: five individuals, each shown with its xor-genotype ({1, 2}, {2, 4}, {2, 3, 4}, {1, 2, 4}, {1}), its genotype, and its haplotype pair]

  13. XPPH - Xor perfect phylogeny haplotyping. Xor-haplotyping: haplotypes from xor-genotypes. Input: 1. Xor-genotype data (can be obtained by DHPLC) 2. Three genotypes. Goal: Resolve the haplotypes and their perfect phylogeny.
      Xor-genotype   Genotype           Haplotypes
      {1, 2}         0/1 0/1 0   1      ?
      {2, 4}         0   0/1 0   0/1    ?
      {2, 3, 4}      0   0/1 0/1 0/1    ?
      {1, 2, 4}      0/1 0/1 0   0/1    ?
      {1}            0/1 1   0   0      ?

  14. XPPH - Xor perfect phylogeny haplotyping. Strategy: 1. Input: Xor-genotype data. Goal: Find the perfect phylogeny. 2. Additional input: 3 genotypes. Goal: Find the haplotypes. Step 1: Xor-genotype = {Het SNPs} = a path in the perfect phylogeny • Build a tree from its paths → graph realization. Input reduction: Merge SNPs that are equivalent in the xor-data. Proof: Unique graph realization solution ⇒ a perfect phylogeny
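The input-reduction step can be sketched as follows, assuming that two SNPs count as equivalent in the xor-data when they occur in exactly the same xor-genotypes. That reading of "equivalent" and the name `merge_equivalent_snps` are assumptions of this sketch.

```python
from collections import defaultdict


def merge_equivalent_snps(xor_genotypes):
    """Group SNPs that occur in exactly the same xor-genotypes."""
    snps = set().union(*xor_genotypes)
    groups = defaultdict(list)
    for snp in sorted(snps):
        # The signature is the set of individuals heterozygous at this SNP.
        signature = frozenset(i for i, x in enumerate(xor_genotypes) if snp in x)
        groups[signature].append(snp)
    return sorted(groups.values())


# SNPs 1 and 2 always occur together, so they collapse into one group.
merged = merge_equivalent_snps([{1, 2, 3}, {3, 4}])
```

After merging, each group can be treated as a single tree edge for the purposes of graph realization, since the xor-data cannot order the SNPs inside a group.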

  15. GREAL. We implemented Gavril & Tamari’s algorithm (1983) for graph realization, O(m²n): • Finds a graph realization or determines that none exists • Counts the number of graph realization solutions for the data • Stable and fast • Available at http://www.cs.tau.ac.il/~rshamir/greal/ Simulations: • Simulate data of n individuals using Hudson 2002 • Remove all SNPs with <5% minor allele frequency • Apply GREAL: is there a single solution? • Repeat 5000 times for each n

  16. Results The percentage of single solutions vs sample size

  17. Results The percentage of single solutions vs sample size (R.H. Chung and D. Gusfield 2003)

  18. XPPH Step 2: • Perfect phylogeny → • Haplotypes? Xor-genotypes: {1, 2}, {1, 3}, {2, 3}. Resolution up to bit flipping: gives the haplotype structure. [Figure: the perfect phylogeny on SNPs 1, 2, 3 with candidate haplotypes]

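  19. XPPH Step 2: • Perfect phylogeny → • Haplotypes. Xor-genotypes: {1, 2}, {1, 3}, {2, 3}. SNP #1 homozygous ⇒ can infer SNP #1 for all haplotypes ⇒ need individuals whose xor-genotypes (= {het SNPs}) have an empty intersection: ∩ xor-genotypes = ∅. [Figure: a genotype that is homozygous at SNP #1 fixes the SNP #1 value of every haplotype (1 x x, 1 x x, 0 x x)]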
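The inference on this slide can be illustrated with a small sketch. It assumes the haplotype structure is already known up to flipping whole SNP columns, and that we know which two haplotypes the genotyped individual carries; `fix_columns` and the example data are hypothetical, with 0-based SNP indices and 2 coding a heterozygous site.

```python
def fix_columns(haplotypes, hap_pair, genotype):
    """Flip SNP columns so the genotyped individual's homozygous sites match.

    Returns the adjusted haplotypes and the set of SNP indices that are now
    resolved. Heterozygous sites (coded 2) carry no information about a flip.
    """
    a, _b = hap_pair  # at a homozygous site both haplotypes carry the same allele
    resolved = [list(h) for h in haplotypes]
    fixed = set()
    for snp, g in enumerate(genotype):
        if g == 2:
            continue
        if resolved[a][snp] != g:
            for h in resolved:  # flip the whole column
                h[snp] ^= 1
        fixed.add(snp)
    return resolved, fixed


structure = [[0, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]]
resolved, fixed = fix_columns(structure, (0, 1), [2, 2, 0, 1])
```

In this example only the last two SNPs are resolved; the first two stay ambiguous because the individual is heterozygous there, which is why further genotypes with other heterozygous sets are needed.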

  20. Theorem:xor-genotypes= there are three xor-genotypes with empty intersection Proof: ! xor-genotypes are tree paths (ow: NP-hard) (1) The intersection of two tree paths is an interval

  21. (Proof) (2) Pick X1 arbitrarily, take X1∩X2, X1∩X3, …, X1∩Xn. [Figure: the path X1]

  22. (Proof) (2) Pick X1 arbitrarily, take X1∩X2, X1∩X3, …, X1∩Xn. [Figure: the path X1]

  23. (Proof) (2) Pick X1 arbitrarily, take X1∩X2, X1∩X3, …, X1∩Xn. (3) XL ends first, XR begins last. [Figure: the paths X1, XL and XR]

  24. (Proof) (2) Pick X1 arbitrarily, take X1∩X2, X1∩X3, …, X1∩Xn. (3) XL ends first, XR begins last. [Figure: the paths X1, XL and XR]

  25. (Proof) (2) Pick X1 arbitrarily, take X1∩X2, X1∩X3, …, X1∩Xn ⇒ X1∩XL∩XR = ∅. [Figure: the paths X1, XL and XR]

  26. • Find 3 individuals to genotype in O(nm) • Resolve the haplotypes. [Figure: the paths X1, XL and XR]
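For illustration, the three individuals can also be found by plain brute force over all triples. This is not the O(nm) procedure claimed on the slide, which follows the XL/XR argument of the proof; it is a simple sketch that is only practical for small n, and `three_with_empty_intersection` is a hypothetical name.

```python
from itertools import combinations


def three_with_empty_intersection(xor_genotypes):
    """Return indices of three xor-genotypes whose intersection is empty.

    Brute force over all triples; returns None if no such triple exists
    (including when fewer than three xor-genotypes are given).
    """
    for triple in combinations(range(len(xor_genotypes)), 3):
        a, b, c = (xor_genotypes[i] for i in triple)
        if not (a & b & c):
            return triple
    return None


chosen = three_with_empty_intersection([{1, 2}, {1, 3}, {2, 3}])
```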

  27. Overview • Introduction • Xor PPH • Theoretical outlines and results • Experimental results • Informative SNPs • Theoretical results • Summary and Future research

  28. Informative SNPs. Input: 1. Haplotypes H={H1,…,Hn} on SNPs S={s1,…,sm} 2. A set of interesting SNPs S" ⊆ S. Output: Minimal set S' ⊆ S \ S" that distinguishes the same haplotypes as S". Informative SNPs (Bafna et al. 2003): • Not perfect phylogeny: NP-hard (MINIMUM TEST SET) • Perfect phylogeny, 1 interesting SNP: O(nm), Bafna et al. 2003
      Example (one row per haplotype, columns are SNPs 1–5):
      1 0 0 0 0
      0 0 1 0 0
      0 0 0 1 1
      0 1 0 1 0

  29. Informative SNPs: generalization of the previous definition. Input: 1. Haplotypes H={H1,…,Hn} on SNPs S={s1,…,sm} 2. A set of interesting SNPs S" ⊆ S 3. A perfect phylogeny for H 4. A cost function C: S → R+. Output: S' ⊆ S \ S" with minimal cost that distinguishes the same haplotypes as S"
      Example (one row per haplotype, columns are SNPs 1–5):
      1 0 0 0 0
      0 0 1 0 0
      0 0 0 1 1
      0 1 0 1 0
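To make the definition concrete, here is a brute-force sketch that searches all subsets of the non-interesting SNPs. It is exponential in m and is not the dynamic program from the presentation. It assumes that "distinguishes the same haplotypes" means: every pair of haplotypes that differs on some interesting SNP also differs on some chosen SNP. The function names, the 0-based SNP indices and the costs are made up for the example.

```python
from itertools import combinations


def distinguished_pairs(haplotypes, snps):
    """Pairs of haplotype indices that differ on at least one SNP in `snps`."""
    return {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if any(haplotypes[i][s] != haplotypes[j][s] for s in snps)
    }


def min_cost_informative_snps(haplotypes, interesting, cost):
    """Return (total_cost, snp_set) of a cheapest informative set, or None."""
    target = distinguished_pairs(haplotypes, interesting)
    candidates = [s for s in range(len(haplotypes[0])) if s not in interesting]
    best = None
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            # The subset must separate every pair the interesting SNPs separate.
            if target <= distinguished_pairs(haplotypes, subset):
                total = sum(cost[s] for s in subset)
                if best is None or total < best[0]:
                    best = (total, set(subset))
    return best


haplotypes = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
cost = [5, 1, 5, 1, 1]  # hypothetical typing cost per SNP
best = min_cost_informative_snps(haplotypes, interesting={3}, cost=cost)
```

With these costs the cheaper of the two valid pairs of SNPs is selected; with unit costs both pairs would tie, which is where an arbitrary cost function becomes useful.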

  30. We find an informative SNP set: • Of minimal cost • For any number of interesting SNPs • In O(m) • By a dynamic programming algorithm that climbs up the perfect phylogeny tree • We prove that the definition of informative SNPs generalizes to a more practical definition • Under the perfect phylogeny model, informative SNPs on genotypes and haplotypes are equivalent

  31. Summary • Xor-haplotyping: • Definition • Resolve haplotypes given xor-data and 3 genotypes in O(nm·α(m,n)) • Implementation • Experimental results • Selection of tag SNPs: • Generalize to arbitrary cost and many interesting SNPs • Find an optimal informative SNP set in O(m) time • Combinatorial observation allows practical uses

  32. Future research • Relax the strong assumption of perfect phylogeny • Deal with data errors and missing data • Obtain empirical results for the theoretical work on informative SNPs • Preliminary results show that blocks of up to 600 SNPs are distinguishable by ~20 informative SNPs

  33. Theorem: All genotypes are distinct within a block. Proof: Assume to the contrary that two are equivalent. [Figure: Haplotype Pair 1 and Haplotype Pair 2, giving rise to Genotype 1 and Genotype 2]
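The statement can be checked empirically on small examples by enumerating every pair of haplotypes and comparing the resulting genotypes. The sketch below is only such a check, not a proof; it assumes the input haplotypes are distinct, codes heterozygous sites as 2, and uses the hypothetical name `genotypes_all_distinct`.

```python
from itertools import combinations_with_replacement


def genotypes_all_distinct(haplotypes):
    """True if every unordered pair of haplotypes yields a different genotype."""
    seen = set()
    for a, b in combinations_with_replacement(haplotypes, 2):
        genotype = tuple(x if x == y else 2 for x, y in zip(a, b))
        if genotype in seen:
            return False
        seen.add(genotype)
    return True


# Haplotypes that fit a perfect phylogeny, and a set that does not.
tree_like = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
]
four_gametes = [[0, 0], [0, 1], [1, 0], [1, 1]]
```

In the second example the pairs (00, 11) and (01, 10) both give the genotype 2 2, which shows why the perfect phylogeny assumption matters.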
