
Authors: Lan Liu & Tao Jiang, Univ. California, Riverside

Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree. Authors: Lan Liu & Tao Jiang, Univ. California, Riverside; Jing Xiao, Lirong Xia, Tsinghua Univ., China.

Presentation Transcript


  1. Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree • Authors: Lan Liu & Tao Jiang, Univ. California, Riverside; Jing Xiao, Lirong Xia, Tsinghua Univ., China

  2. Outline • Introduction and problem definition • A new system of linear equations for ZRHC • An O(mn^3) time algorithm for ZRHC • An improved algorithm for ZRHC • Conclusion

  3. Pedigree • An example: British Royal Family

  4. Biological Background • Example: Mendelian experiment • Mendelian law: one haplotype comes from the father and the other comes from the mother. • Basic concepts: genotypes 11 and 22 are homozygous; genotype 12 is heterozygous and can be phased as 1|2 or 2|1, depending on which allele is paternal and which is maternal.

  5. Notations and Recombinant • [Figure: two father-mother-child trios over four loci, shown as genotypes and as haplotype configurations; one configuration requires 1 recombinant, the other 0 recombinants.]

  6. Haplotype Configuration Reconstruction • Haplotypes: useful, but expensive to obtain. • Genotypes: less informative, but cheaper to obtain. • In biological applications, genotypes instead of haplotypes are collected. • How to reconstruct haplotypes from genotypes? • Recombination-free assumption.

  7. The ZRHC problem • Problem definition: given a pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
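
To make the problem statement concrete, here is a minimal brute-force sketch for a single father-mother-child trio. It is not the algorithm of the talk: it enumerates every phasing and therefore takes exponential time. The representation (a genotype as a list of unordered allele pairs, a haplotype as a tuple of alleles) and the function names `phasings`, `is_recombination_free` and `zrhc_trio` are assumptions made for illustration.

```python
from itertools import product

def phasings(genotype):
    """Yield every (paternal, maternal) haplotype pair consistent with a genotype.

    genotype: list of unordered allele pairs, one per locus, e.g. [(1, 2), (2, 2)].
    """
    per_locus = []
    for a, b in genotype:
        # A homozygous locus has one phasing, a heterozygous locus has two.
        per_locus.append([(a, b)] if a == b else [(a, b), (b, a)])
    for choice in product(*per_locus):
        paternal = tuple(c[0] for c in choice)
        maternal = tuple(c[1] for c in choice)
        yield paternal, maternal

def is_recombination_free(child, father, mother):
    """Child's paternal haplotype must be one of the father's two haplotypes
    copied whole, and likewise the maternal haplotype from the mother."""
    return child[0] in father and child[1] in mother

def zrhc_trio(g_father, g_mother, g_child):
    """Return one zero-recombinant configuration (father, mother, child), or None."""
    for f in phasings(g_father):
        for m in phasings(g_mother):
            for c in phasings(g_child):
                if is_recombination_free(c, f, m):
                    return f, m, c
    return None
```

For example, with a father homozygous 11 and a mother homozygous 22 at two loci, a child with genotype 12 at both loci must carry paternal haplotype (1, 1) and maternal haplotype (2, 2).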

  8. Previous Work • Li and Jiang introduced a system of linear equations over F[2] and presented an O(m^3n^3)-time algorithm for ZRHC [LJ03], where m is #loci and n is #members in the pedigree. • Several attempts at faster algorithms have been made recently, but the authors failed to prove the correctness of their algorithms in all cases, especially when the input pedigree has mating loops [CZ04] [LCL06]. • Recently, Chan et al. proposed a linear-time algorithm [CCC+06], which works only for pedigrees without mating loops.

  9. Related work • Methods based on fast matrix multiplication algorithms could achieve an asymptotic running time of O(k^2.376) on k equations with k unknowns. • The Lanczos and conjugate gradient algorithms are only heuristics [GV96]. • The Wiedemann algorithm has expected quadratic running time [W86].

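  10. Our Result • We present a much faster algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n). • [Diagram: the system Ax=b is transformed into a system on O(n) unknowns, and redundancy elimination then reduces it to O(n log^2 n log log n) equations on O(n) unknowns.]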

  11. Outline • Introduction and problem definition • A new system of linear equations for ZRHC • An O(mn^3) time algorithm for ZRHC • An improved algorithm for ZRHC • Conclusion

  12. The New Linear System • Parameters: m = #loci, n = #members in the pedigree. • Unknowns: • p_j: the paternal haplotype vector of a member j. • h_{j_1,j}: the scalar encoding the inheritance information between a parent j_1 and a child j.

  13. The New Linear System • [Figure: a child j with parents j_1 and j_2 over four loci. Each member is drawn with its paternal haplotype p_j and its maternal haplotype p_j + w_j, and the parent-child edges carry the variables h_{j_1,j} and h_{j_2,j}. In the example, the inheritance constraints give p_{j_1,2} = 1 and p_{j_1,3} = 0.]

  14. The Linear System • O(mn) equations on O(mn) unknowns. • Given a homozygous locus i on a member j (with a child j_1), p_j[i] and p_{j_1}[i] are pre-determined.

  15. Pedigree Graph • [Figure: a pedigree of 9 members with genotypes, and the corresponding pedigree graph G.] • #edges ≤ 2n, since each member has at most two parents.

  16. Locus Graph • Locus graph G_i = (V, E_i), where E_i = {(k,j) | k is a parent of j, w_k[i] = 1}. • Example: locus graph for the 3rd locus. • [Figure: (a) genotype info; (b) locus graph, with edges labelled by h-variables such as h_{1,4}, h_{6,8}, h_{4,9} and h_{8,9}, and a legend for zero-weight edges.]
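
The edge set E_i can be computed directly from the definition. The sketch below assumes that w_k[i] = 1 exactly when member k is heterozygous at locus i, which is how the maternal haplotype p_j + w_j on the earlier slide suggests reading w. The data layout (a `parents` dictionary and per-member genotype lists) and the function names are hypothetical.

```python
def w_vector(genotype):
    """w[i] = 1 if the locus is heterozygous, 0 if homozygous (assumed meaning of w)."""
    return [1 if a != b else 0 for a, b in genotype]

def locus_graph_edges(parents, genotypes, i):
    """Return E_i = {(k, j) | k is a parent of j and w_k[i] = 1}, sorted.

    parents:   dict mapping a child to a tuple of its parents
    genotypes: dict mapping a member to a list of allele pairs, one per locus
    i:         0-based locus index
    """
    edges = []
    for child, ps in parents.items():
        for k in ps:
            if w_vector(genotypes[k])[i] == 1:
                edges.append((k, child))
    return sorted(edges)

# A three-member pedigree: 1 and 2 are the parents of 3.
parents = {3: (1, 2)}
genotypes = {
    1: [(1, 2), (1, 1)],
    2: [(2, 2), (1, 2)],
    3: [(1, 2), (1, 2)],
}
print(locus_graph_edges(parents, genotypes, 0))  # only parent 1 is heterozygous at locus 0
```

Because a locus graph keeps only edges leaving heterozygous parents, it is a subgraph of the pedigree graph and has at most 2n edges.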

  17. Outline • Introduction and problem definition • A new system of linear equations for ZRHC • An O(mn^3) time algorithm for ZRHC • An improved algorithm for ZRHC • Conclusion

  18. An Observation • For any cycle in a locus graph, or any path in a locus graph connecting two pre-determined vertices, the summation of the h-variables along it is a constant. • (Proof sketch) Consider a path j_0, j_1, ..., j_k in locus graph G_i connecting two pre-determined vertices j_0 and j_k. Each edge gives one equation over F[2]: p_{j_0}[i] = p_{j_1}[i] + d_{j_0,j_1} + h_{j_0,j_1}; p_{j_1}[i] = p_{j_2}[i] + d_{j_1,j_2} + h_{j_1,j_2}; p_{j_2}[i] = p_{j_3}[i] + d_{j_2,j_3} + h_{j_2,j_3}; ...; p_{j_{k-1}}[i] = p_{j_k}[i] + d_{j_{k-1},j_k} + h_{j_{k-1},j_k}. Adding all the equations cancels the intermediate p-terms, so h_{j_0,j_1} + h_{j_1,j_2} + ... + h_{j_{k-1},j_k} = p_{j_0}[i] + p_{j_k}[i] + d_{j_0,j_1} + ... + d_{j_{k-1},j_k}, which is a constant. • We can use paths to denote constraints!
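
The telescoping argument can be checked numerically. The sketch below assumes only the per-edge relation from the slide, p_{j_t}[i] = p_{j_{t+1}}[i] + d + h over F[2], with the d-values treated as known constants. The function names are hypothetical.

```python
from itertools import product

def path_rhs(p_first, p_last, d_values):
    """Right-hand side of the path constraint: p_first + p_last + sum of d (mod 2)."""
    return (p_first + p_last + sum(d_values)) % 2

def propagate(p_last, d_values, h_values):
    """Apply p_{j_t} = p_{j_{t+1}} + d + h from the last vertex back to the first."""
    p = p_last
    for d, h in zip(reversed(d_values), reversed(h_values)):
        p = (p + d + h) % 2
    return p

def check_observation(d_values, p_last):
    """For every assignment of the h-variables, the sum of h along the path
    equals the constant determined by the two endpoint values and the d's."""
    for h_values in product((0, 1), repeat=len(d_values)):
        p_first = propagate(p_last, d_values, h_values)
        if sum(h_values) % 2 != path_rhs(p_first, p_last, d_values):
            return False
    return True

print(check_observation([1, 0, 1], 0))
```

Once both endpoints are pre-determined, `path_rhs` is a fixed bit, so each such path contributes one linear equation in the h-variables alone.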

  19. Examples of Linear Constraints • (a) 1st locus graph: h_{6,8} + h_{8,9} = 1 • (b) 2nd locus graph: h_{3,5} + h_{3,6} + h_{2,5} + h_{2,6} = 0 • (c) 3rd locus graph: h_{4,9} + h_{2,4} + h_{2,5} + h_{3,5} + h_{3,6} + h_{6,8} = 0 • [Figure: the three locus graphs of the example pedigree, with pre-determined p-values marked 0 or 1 and undetermined ones marked ?.]

  20. Linear Constraints • The linear constraints are clearly necessary; we can also show that they are sufficient. • Moreover, we can upper bound #constraints in each locus graph by O(n), while the trivial analysis gives an upper bound of O(n^2). • Total #constraints = O(mn).

  21. The ZRHC-PHASE algorithm • Traditional method: solve h-variables and p-variables together; O(mn) equations on O(mn) unknowns: O(mn) p-variables and O(n) h-variables. • Our method: solve h-variables and p-variables separately; O(mn) linear equations on O(n) h-variables. • Algorithm ZRHC_PHASE • input: a pedigree G=(V,E) and genotypes {g_j} • output: a general solution of {p_j} • Step 1. Preprocessing • Step 2. Linear constraint generation on h-variables • Step 3. Solve the h-variables by Gaussian elimination • Step 4. Solve the p-variables by propagation from pre-determined p-variables to the others.
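
Step 3 is ordinary Gaussian elimination over F[2]. The following is a minimal sketch of such a solver, not the authors' implementation. Each equation is stored as a bitmask of the variables it contains plus a right-hand-side bit; this encoding and the name `solve_gf2` are illustrative assumptions.

```python
def solve_gf2(equations, num_vars):
    """Gauss-Jordan elimination over F[2].

    equations: list of (mask, rhs); bit v of mask is set if variable v occurs.
    Returns (solution, free_vars) with free variables set to 0,
    or None if the system is inconsistent.
    """
    pivots = {}  # pivot variable -> (mask, rhs); rows are kept fully reduced
    for mask, rhs in equations:
        # Remove every existing pivot variable from the incoming equation.
        for v, (pm, pr) in pivots.items():
            if (mask >> v) & 1:
                mask ^= pm
                rhs ^= pr
        if mask == 0:
            if rhs:
                return None  # reduces to 0 = 1: inconsistent
            continue         # reduces to 0 = 0: redundant equation
        v = mask.bit_length() - 1  # highest remaining variable becomes the pivot
        # Remove the new pivot variable from the rows stored so far.
        for u, (pm, pr) in list(pivots.items()):
            if (pm >> v) & 1:
                pivots[u] = (pm ^ mask, pr ^ rhs)
        pivots[v] = (mask, rhs)
    free_vars = [v for v in range(num_vars) if v not in pivots]
    solution = [0] * num_vars
    for v, (_, pr) in pivots.items():
        solution[v] = pr  # all other variables in this row are free, hence 0
    return solution, free_vars

# x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (the third equation is redundant)
print(solve_gf2([(0b011, 1), (0b110, 0), (0b101, 1)], 3))
```

The free variables returned alongside the particular solution describe the general solution: flipping any free variable and re-evaluating the pivot rows gives another valid assignment.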

  22. Outline • Introduction and problem definition • A new system of linear equations for ZRHC • An O(mn^3) time algorithm for ZRHC • An improved algorithm for ZRHC • Conclusion

  23. Redundant Equation Elimination • Key lemma (an observation): • Given a cycle j_0, j_1, ..., j_k, assume that there are constraints between each pair of vertices. • Originally, there are O(k^2) constraints. Notice that they are not independent: for example, the constraints for j_0~j_2 and j_2~j_{k-1} together imply the one for j_0~j_{k-1}. • However, we can replace the original constraints by an equivalent set of constraints of size O(k). • Remove the redundant equations without solving them!
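
The counting behind this observation can be illustrated with a small rank computation. The sketch below simplifies the setting and is an assumption, not the construction from the talk: the vertices lie on a path with one h-variable per edge, and every pair of vertices contributes the constraint formed by the edges between them. The names `gf2_rank` and `pairwise_constraints` are hypothetical.

```python
def gf2_rank(masks):
    """Rank over F[2] of a set of bitmask rows, via an XOR basis."""
    basis = {}  # highest set bit -> basis row with that leading bit
    for m in masks:
        while m:
            top = m.bit_length() - 1
            if top not in basis:
                basis[top] = m
                break
            m ^= basis[top]  # cancel the leading bit and keep reducing
    return len(basis)

def pairwise_constraints(k):
    """Left-hand sides of the constraints for all pairs a < b among k vertices
    on a path, where edge t joins vertex t and vertex t + 1."""
    masks = []
    for a in range(k):
        for b in range(a + 1, k):
            # Bits a .. b-1 select the edges between vertex a and vertex b.
            masks.append(((1 << b) - 1) ^ ((1 << a) - 1))
    return masks

masks = pairwise_constraints(6)
print(len(masks), gf2_rank(masks))
```

With 6 vertices there are 15 pairwise constraints but they span a space of dimension only 5, so a linear number of them suffices to imply the rest.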

  24. Redundant Equation Elimination • Given a spanning tree, the stretch of an edge (k, j) is defined as the length of the unique path between k and j on the tree. • Elkin, Emek, Spielman and Teng showed that any graph has a spanning tree with average stretch O(log^2 n log log n). • The number of irredundant constraints can be bounded by the sum of cycle lengths, which is further bounded by the sum of stretches, O(n log^2 n log log n).
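
The definition of stretch is easy to compute for a given spanning tree. The sketch below uses a plain BFS tree on a connected undirected graph, which is an assumption made to keep the example short; it does not implement the low-stretch construction of Elkin et al. and carries no stretch guarantee. Function names are hypothetical.

```python
from collections import deque

def bfs_tree(adj, root):
    """Return parent and depth maps of a BFS spanning tree rooted at root."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth

def tree_distance(u, v, parent, depth):
    """Length of the unique tree path between u and v."""
    dist = 0
    while u != v:
        if depth[u] < depth[v]:
            u, v = v, u      # always lift the deeper endpoint
        u = parent[u]
        dist += 1
    return dist

def stretches(edges, root):
    """Stretch of every edge of a connected graph w.r.t. its BFS tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, depth = bfs_tree(adj, root)
    return {(u, v): tree_distance(u, v, parent, depth) for u, v in edges}

# A 4-cycle: three edges end up in the tree, the fourth closes the cycle.
print(stretches([(0, 1), (1, 2), (2, 3), (3, 0)], root=0))
```

Tree edges have stretch 1, while a non-tree edge has stretch equal to the length of the cycle it closes minus one, which is why the sum of stretches bounds the total cycle length used in the slide.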

  25. Conclusion • We present an efficient algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n). • It remains an interesting question whether the time complexity of ZRHC on general pedigrees can be improved to O(mn^2 + n^3) or lower. • Another open question is how to use the algorithm to find haplotype configurations on pedigrees that require only a small (constant) number of recombinants.

  26. Thanks for your time and attention!
