A Simplified View of DCJ-Indel Distance

A Simplified View of DCJ-Indel Distance. Phillip Compeau University of California-San Diego Department of Mathematics. Abstract. Braga et al., 2010: Solved problem of DCJ-indel sorting in linear time. Goals: “Hardwire” DCJ sorting into DCJ-indel sorting.


Presentation Transcript


  1. A Simplified View of DCJ-Indel Distance Phillip Compeau University of California-San Diego Department of Mathematics

  2. Abstract • Braga et al., 2010: Solved problem of DCJ-indel sorting in linear time. • Goals: • “Hardwire” DCJ sorting into DCJ-indel sorting. • Characterize solution space for DCJ-indel sorting. • DCJ solution space known (Braga and Stoye, 2010).

  3. Section 1: Preliminaries • Outline: Preliminaries • Encoding Indels as DCJs • DCJ-Indel Sorting • The Solution Space of DCJ-Indel Sorting • Conclusion

  4. The Discrete Genome • Genome (Π): formed of two matchings: • genes (g(Π)): each numbered gene has a head and a tail. • adjacencies (a(Π)): a blue matching on V(g(Π)). [Figure: genomes Π and Γ]

  5. The Discrete Genome • Chromosome: component of Π (alternating path or cycle). • Linear or circular depending on whether the component is a path or a cycle of Π. • Telomere: path endpoint of Π; has null adjacency {v, Ø}. [Figure: genomes Π and Γ]

  6. The Double-Cut-and-Join Operation • Double-cut-and-join operation (DCJ; Yancopoulos et al., 2005): “cuts” the genome in two places and rejoins the adjacencies. • DCJ Distance (dDCJ(Π, Γ)): minimum # of DCJs required to transform Π into Γ (having the same genes).
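A minimal sketch of the DCJ as an operation on a set of adjacencies may help here. The representation is an assumption made for illustration, not taken from the slides: vertices are strings such as "2h" and "2t" (head and tail of gene 2), `None` stands in for Ø, and the helper names `norm` and `dcj` are hypothetical.

```python
from typing import Optional, Set, Tuple

Vertex = Optional[str]            # None plays the role of the null vertex Ø
Adjacency = Tuple[Vertex, Vertex]


def norm(u: Vertex, v: Vertex) -> Adjacency:
    """Store an adjacency in a canonical order, with Ø (None) last."""
    if u is None or (v is not None and v < u):
        u, v = v, u
    return (u, v)


def dcj(adjacencies: Set[Adjacency], a: Adjacency, b: Adjacency,
        crossed: bool = False) -> Set[Adjacency]:
    """Cut adjacencies a = {p, q} and b = {r, s}, then rejoin the four ends.

    Rejoins as {p, r}, {q, s} by default, or as {p, s}, {q, r} if crossed.
    Passing b = (None, None) cuts a single adjacency into two telomeres.
    """
    p, q = a
    r, s = b
    new = {norm(p, s), norm(q, r)} if crossed else {norm(p, r), norm(q, s)}
    result = (set(adjacencies) - {a, b}) | new
    result.discard((None, None))  # an "adjacency" {Ø, Ø} carries no information
    return result


# One linear chromosome with genes 1, 2, 3 (all read tail to head).
genome = {("1t", None), ("1h", "2t"), ("2h", "3t"), ("3h", None)}

# Rejoining one way inverts gene 2 ...
inverted = dcj(genome, ("1h", "2t"), ("2h", "3t"))
# ... and rejoining the other way excises gene 2 as a circular chromosome.
excised = dcj(genome, ("1h", "2t"), ("2h", "3t"), crossed=True)
```

The same two cuts give either an inversion or the excision of a circular chromosome, depending only on how the ends are rejoined.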

  7. The DCJ Incorporates Many Operations

  8. The Breakpoint Graph • B(Π, Γ) is formed from the adjacencies of Π and Γ. • B(Π, Γ) also comprises (alternating) red-blue paths and cycles.

  9. DCJ Distance Formula • Bergeron et al., 2006: If Π and Γ share the same genes, then the DCJ distance is given by the following formula: dDCJ(Π, Γ) = N − c(Π, Γ) − peven(Π, Γ)/2 • N = # of genes • c(Π, Γ) = # of cycles in B(Π, Γ) • peven(Π, Γ) = # of even paths in B(Π, Γ)
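As a rough illustration, the distance can be computed by walking the components of B(Π, Γ). This sketch assumes the standard form of the Bergeron et al. formula, dDCJ = N − c − peven/2, and assumes that path length counts only non-null adjacencies, so an isolated vertex is an even path of length 0. The vertex encoding (gene name plus "t" or "h") and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

Adjacency = Tuple[Optional[str], Optional[str]]


def neighbor_map(adjacencies: Iterable[Adjacency]) -> Dict[str, str]:
    """Map each vertex to its partner, skipping null adjacencies {v, Ø}."""
    nbr: Dict[str, str] = {}
    for u, v in adjacencies:
        if u is not None and v is not None:
            nbr[u] = v
            nbr[v] = u
    return nbr


def dcj_distance(genes: Sequence[str], adj_pi: Iterable[Adjacency],
                 adj_gamma: Iterable[Adjacency]) -> int:
    """N - c - p_even/2, counted on the breakpoint graph of two genomes
    that share the same genes."""
    maps = (neighbor_map(adj_pi), neighbor_map(adj_gamma))
    seen = set()
    cycles = even_paths = 0
    for start in (g + end for g in genes for end in ("t", "h")):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        n_vertices = degree_sum = 0
        while stack:
            v = stack.pop()
            n_vertices += 1
            for m in maps:            # at most one edge per genome at each vertex
                w = m.get(v)
                if w is not None:
                    degree_sum += 1
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
        n_edges = degree_sum // 2     # every edge was counted from both ends
        if n_edges == n_vertices:
            cycles += 1
        elif n_edges % 2 == 0:
            even_paths += 1
    return len(genes) - cycles - even_paths // 2


genes = ["1", "2", "3"]
pi = {("1h", "2t"), ("2h", "3t")}        # linear chromosome 1 2 3
gamma = {("1h", "2h"), ("2t", "3t")}     # same chromosome with gene 2 inverted
```

Here `dcj_distance(genes, pi, gamma)` evaluates to 1 (one cycle and two length-0 paths), matching the single inversion that separates the two genomes.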

  10. Indels and the DCJ-Indel Distance • Indel: The insertion or deletion of a chromosome or chromosomal interval (consecutive genes). • Assumption: we can’t remove a gene common to Π and Γ. • DCJ-Indel Distance (dindDCJ(Π, Γ)): Minimum # of DCJs and indels required to transform Π into Γ. • Braga et al., 2010: Solve DCJ-indel sorting in linear time. • Lots of cases…can we simplify it? [Figure: example deletions on genes a, b, c, d]

  11. Section 2: Encoding Indels as DCJs • Outline: Preliminaries • Encoding Indels as DCJs • DCJ-Indel Sorting • The Solution Space of DCJ-Indel Sorting • Conclusion

  12. Deletion → DCJ Creating Circular Chromosome • Ma et al., 2009: View deletion as formation and removal of a circular chromosome. • Idea: Indel = DCJ creating circular chromosome. • Wait…what about the deletion of circular chromosomes? [Figure: deletions on genes a, b, c, d, each redrawn as a DCJ]

  13. Apparent Exceptions • Apparent Exception #1: Two deleted circular chromosomes are created from a single DCJ. [Figure labels: DCJ; 3 operations]

  14. Apparent Exceptions • Apparent Exception #1: Two deleted circular chromosomes are created from a single DCJ. [Figure labels: 1 operation; DCJ; 3 operations]

  15. Apparent Exceptions • Apparent Exception #2: A deleted circular chromosome is never involved in a DCJ. • Circular singleton of Π: A circular chromosome of Π that shares no genes with Γ. • Question: Can we delete all circular singletons first?

  16. Apparent Exceptions • Apparent Exception #2: A deleted circular chromosome is never involved in a DCJ. • Circular singleton of Π: A circular chromosome of Π that shares no genes with Γ. • Question: Can we delete all circular singletons first? YES!

  17. Handling Circular Singletons • Proposition: When transforming Π into Γ via a minimum collection of DCJs and indels, no gene belonging to a circular singleton of Π can ever appear in the same chromosome as a gene of Γ. • Corollary 1: If Π* is formed from Π by removing a circular singleton from Π, then dindDCJ(Π*, Γ) = dindDCJ(Π, Γ) – 1. • Let sing(Π, Γ) = # of circular singletons of Π and Γ. • Corollary 2: If Π0 and Γ0 are formed by removing all circular singletons from Π and Γ, then dindDCJ(Π, Γ) = dindDCJ(Π0, Γ0) + sing(Π, Γ).
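Finding circular singletons is a small graph search, sketched below under illustrative assumptions: each gene contributes vertices named gene + "t" and gene + "h", only non-null adjacencies are listed, and a vertex without a listed adjacency is treated as a telomere. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def chromosomes(genes: Iterable[str],
                adjacencies: Iterable[Tuple[str, str]]) -> List[Tuple[Set[str], bool]]:
    """Return (gene set, is_circular) for every chromosome of a genome."""
    nbr: Dict[str, str] = {}
    for u, v in adjacencies:
        nbr[u] = v
        nbr[v] = u
    seen: Set[str] = set()
    result = []
    for g in genes:
        if g in seen:
            continue
        seen.add(g)
        stack, component, circular = [g], set(), True
        while stack:
            x = stack.pop()
            component.add(x)
            for end in ("t", "h"):
                w = nbr.get(x + end)
                if w is None:
                    circular = False      # a telomere makes the chromosome linear
                else:
                    y = w[:-1]            # strip the "t"/"h" suffix to get the gene
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
        result.append((component, circular))
    return result


def circular_singletons(genes_pi: Iterable[str],
                        adj_pi: Iterable[Tuple[str, str]],
                        genes_gamma: Iterable[str]) -> List[Set[str]]:
    """Circular chromosomes of Π that share no gene with Γ."""
    shared = set(genes_gamma)
    return [c for c, circ in chromosomes(genes_pi, adj_pi)
            if circ and not (c & shared)]


# Π: a linear chromosome (a b) and a circular chromosome (x y); Γ has genes a, b.
genes_pi = ["a", "b", "x", "y"]
adj_pi = {("ah", "bt"), ("xh", "yt"), ("yh", "xt")}
singletons = circular_singletons(genes_pi, adj_pi, ["a", "b"])
```

Applying the same function with the roles of Π and Γ swapped and adding the two counts gives sing(Π, Γ), the term that Corollary 2 adds to dindDCJ(Π0, Γ0).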

  18. A Novel View of DCJ-Indel Distance • WLOG we may henceforth assume that sing(Π, Γ) = 0. • A completion of Π is a genome Π’ such that: • g(Π’) = g(Π) ∪ g(Γ) • a(Π’) = a(Π) ∪ perfect matching on V(Π’) – V(Π) • New chromosomes of Π’ are circular: the indels of Π’. • Theorem: dindDCJ(Π, Γ) = min dDCJ(Π’, Γ’), where the minimum is taken over all completions (Π’, Γ’) of (Π, Γ).

  19. A Novel View of DCJ-Indel Distance • An optimal completion achieves the optimum below. • A completion of Π is a genome Π’ such that: • g(Π’) = g(Π) ∪ g(Γ) • a(Π’) = a(Π) ∪ perfect matching on V(Π’) – V(Π) • New chromosomes of Π’ are circular: the indels of Π’. • Theorem: dindDCJ(Π, Γ) = min dDCJ(Π’, Γ’), where the minimum is taken over all completions (Π’, Γ’) of (Π, Γ).
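To make the definition concrete, the sketch below builds one valid completion, generally not an optimal one: every gene missing from Π is added as its own circular chromosome by matching its head to its tail. The vertex naming (gene + "t" / "h") and both function names are assumptions for illustration only.

```python
from typing import Iterable, List, Set, Tuple


def open_vertices(genes_own: Iterable[str], genes_other: Iterable[str]) -> List[str]:
    """Vertices of genes found in the other genome but missing from this one."""
    missing = sorted(set(genes_other) - set(genes_own))
    return [g + end for g in missing for end in ("t", "h")]


def trivial_completion(genes_pi: Iterable[str],
                       adj_pi: Iterable[Tuple[str, str]],
                       genes_gamma: Iterable[str]) -> Tuple[List[str], Set[Tuple[str, str]]]:
    """Return (g(Π'), a(Π')) for the completion that adds each missing gene
    as a one-gene circular chromosome (head matched to tail)."""
    missing = set(genes_gamma) - set(genes_pi)
    new_adjacencies = {(g + "h", g + "t") for g in missing}
    genes = sorted(set(genes_pi) | set(genes_gamma))
    return genes, set(adj_pi) | new_adjacencies


genes_c, adj_c = trivial_completion(["a", "b"], {("ah", "bt")}, ["a", "b", "c"])
```

Any other perfect matching on the vertices returned by `open_vertices` yields another completion; the sorting lemmas are what narrow these choices down to the optimal ones.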

  20. Section 3: DCJ-Indel Sorting • Outline: Preliminaries • Encoding Indels as DCJs • DCJ-Indel Sorting • The Solution Space of DCJ-Indel Sorting • Conclusion

  21. Open Vertices • π-open vertex: vertex not found in Π (must be matched in Π’) • path endpoint in B(Π,Γ) must be π-open/γ-open or telomere (or both) • Define {π, π}-paths, {π, γ}-paths, π-paths in B(Π, Γ) • Idea: Construct B(Π*, Γ*) from B(Π, Γ) by matching vertices.

  22. Necessary Conditions for B(Π*, Γ*) • Lemma 1: If (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k – 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).

  23. Necessary Conditions for B(Π*, Γ*) • Lemma 1: If (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k – 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*). • Picture: [Figure: B(Π’, Γ’) vs. B(Π’’, Γ’), with a cycle marked; dDCJ(Π’’, Γ’) < dDCJ(Π’, Γ’)]

  24. Necessary Conditions for B(Π*, Γ*) • Lemma 1: If (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k – 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*). • Remaining components of B(Π*, Γ*): • bracelet: cycle linking {π, γ}-paths • chain: path linking π-paths/γ-paths via intermediate {π, γ}-paths. [Figure: examples of a 3-chain, a 2-bracelet, and a 2-chain]

  25. Necessary Conditions for B(Π*, Γ*) • Lemma 2: B(Π*, Γ*) can contain only 2-bracelets, 2-chains, and 3-chains. • Picture: [Figure: B(Π’, Γ’) vs. B(Π’’, Γ’), with paths P1 and P2 and a cycle marked; dDCJ(Π’’, Γ’) < dDCJ(Π’, Γ’)]

  26. Necessary Conditions for B(Π*, Γ*) • Lemma 3: B(Π*, Γ*) cannot have one 2-chain joining two odd π-paths and another 2-chain joining two even π-paths. The same holds for γ-paths. • Picture: [Figure: B(Π’, Γ’) vs. B(Π’’, Γ’), with odd paths P1, P2, even paths P3, P4, and the resulting even paths marked; dDCJ(Π’’, Γ’) < dDCJ(Π’, Γ’)]

  27. Sorting Algorithm • Remove all circular singletons of Π and Γ. • Lemma 1 → Close every {π, π}-path ({γ, γ}-path) into a cycle by adding a single new adjacency to Π* (Γ*). • Form a maximum set of 2-bracelets (only chains remaining). • Form a maximum set of even 2-chains by linking pairs of π-paths (γ-paths) having opposite parity (Lemma 3). • If pπ,γ is odd, then link the remaining {π, γ}-path with any remaining π-path and γ-path. • Arbitrarily link pairs of remaining π-paths, all of which have the same parity. Do the same for any γ-paths remaining.
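The linking steps of the algorithm (2-bracelets, even 2-chains, the optional 3-chain, and the leftover 2-chains) can be sketched as a planning function over lists of paths. This is a sketch under stated assumptions, not the author's implementation: paths are hypothetical (label, length) pairs, circular singletons and {π, π}-/{γ, γ}-paths are taken to be handled already, and the input is assumed consistent, meaning that a π-path and a γ-path are still available whenever a {π, γ}-path is left over.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, int]   # (label, length in edges); both fields are illustrative


def pair_up(paths: List[Path]) -> Tuple[List[tuple], List[Path]]:
    """Pair consecutive entries; return the pairs and the 0 or 1 leftover paths."""
    pairs = [(paths[i], paths[i + 1]) for i in range(0, len(paths) - 1, 2)]
    leftover = paths[len(paths) - len(paths) % 2:]
    return pairs, leftover


def plan_links(pi_paths: List[Path], gamma_paths: List[Path],
               pi_gamma_paths: List[Path]) -> Dict[str, List[tuple]]:
    plan: Dict[str, List[tuple]] = {"even 2-chains": [], "3-chains": [],
                                    "other 2-chains": []}
    # Maximum set of 2-bracelets: pair up the {π, γ}-paths.
    plan["2-bracelets"], spare = pair_up(list(pi_gamma_paths))

    # Maximum set of even 2-chains: link an odd path with an even path.
    remaining: Dict[str, List[Path]] = {}
    for key, paths in (("pi", pi_paths), ("gamma", gamma_paths)):
        odd = [p for p in paths if p[1] % 2 == 1]
        even = [p for p in paths if p[1] % 2 == 0]
        k = min(len(odd), len(even))
        plan["even 2-chains"] += list(zip(odd[:k], even[:k]))
        remaining[key] = odd[k:] + even[k:]    # all of one parity from here on

    # If the number of {π, γ}-paths is odd, build one 3-chain from the spare.
    if spare:
        plan["3-chains"].append(
            (remaining["pi"].pop(), spare[0], remaining["gamma"].pop()))

    # Link whatever is left arbitrarily, in pairs of the same kind.
    for key in ("pi", "gamma"):
        pairs, _ = pair_up(remaining[key])
        plan["other 2-chains"] += pairs
    return plan


plan = plan_links(
    pi_paths=[("P1", 1), ("P2", 2), ("P3", 3)],
    gamma_paths=[("G1", 2)],
    pi_gamma_paths=[("Q1", 1), ("Q2", 3), ("Q3", 2)],
)
```

In this example Q1 and Q2 form a 2-bracelet, P1 and P2 form an even 2-chain (1 + 2 edges plus the new adjacency), and the spare Q3 is linked with P3 and G1 into a 3-chain.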

  28. DCJ-Indel Distance • Theorem: The preceding algorithm solves DCJ-indel sorting in linear time, and it implies a DCJ-indel distance formula in which δ = 1 only if pπ,γ is odd and either: • pπodd > pπeven and pγodd > pγeven; or • pπodd < pπeven and pγodd < pγeven. Otherwise, δ = 0.
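The δ term depends only on five path counts, so it is easy to state as a function. The sketch below treats the stated conditions as both necessary and sufficient, which is an assumption based on the case list later in the talk; the parameter names are hypothetical stand-ins for pπ,γ, pπodd, pπeven, pγodd, and pγeven.

```python
def delta(p_pi_gamma: int, p_pi_odd: int, p_pi_even: int,
          p_gamma_odd: int, p_gamma_even: int) -> int:
    """Return δ for the given path counts of B(Π, Γ)."""
    if p_pi_gamma % 2 == 0:
        return 0
    # Odd paths outnumber even paths among both the π-paths and the γ-paths ...
    both_odd_heavy = p_pi_odd > p_pi_even and p_gamma_odd > p_gamma_even
    # ... or even paths outnumber odd paths among both.
    both_even_heavy = p_pi_odd < p_pi_even and p_gamma_odd < p_gamma_even
    return 1 if (both_odd_heavy or both_even_heavy) else 0
```

Ties (for example pπodd = pπeven) fall under "otherwise" and give δ = 0, since both conditions use strict inequalities.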

  29. Section 4: The Solution Space of DCJ-Indel Sorting • Outline: Preliminaries • Encoding Indels as DCJs • DCJ-Indel Sorting • The Solution Space of DCJ-Indel Sorting • Conclusion

  30. Encompassing all Possible Cases • The solution space is known for DCJ sorting (Braga and Stoye, 2010). • Thus, we only need to find all optimal completions; the specific operations then follow from the known DCJ solution space.

  31. Handling Circular Singletons • The circular singletons of Π must be removed in sing(Π) steps. We have two options: • Delete all the circular singletons of Π. • Perform k “fusion” DCJs followed by sing(Π) – k chromosome deletions. • This poses a straightforward (yet tedious) counting problem.

  32. Adding Necessary Conditions on B(Π*, Γ*) • Proposition 1: Every π-path embedding into a 3-chain of an optimal completion must have the same parity. • Proposition 2: If pπ,γ is even, then B(Π*, Γ*) must contain a maximum collection of even 2-chains. • Proofs are slightly more involved…

  33. Finishing the Job • Four cases, depending on path statistics. • pπ,γ is odd: • pπodd > pπeven, pγodd > pγeven (or vice-versa); δ = 1 • pπodd > pπeven, pγodd < pγeven (or vice-versa); δ = 0 • pπ,γ is even: • pπodd > pπeven, pγodd > pγeven (or vice-versa); δ = 0 • pπodd > pπeven, pγodd < pγeven (or vice-versa); δ = 0 • These cases are tedious but straightforward and can be handled similarly.

  34. Section 5: Conclusion • Outline: Preliminaries • Encoding Indels as DCJs • DCJ-Indel Sorting • The Solution Space of DCJ-Indel Sorting • Conclusion

  35. Future Work • Correspondence with Braga et al., 2010? • Varying the indel cost? • Charge indel cost ≤ DCJ cost, take minimum total cost. • Most of the simplifying sorting lemmas hold, but actually computing the minimum cost appears difficult in this model. • The problem is solved! (under framework of Braga et al., 2010)

  36. Questions?

  37. Shameless Plug • www.rosalind.info • A novel education website that teaches bioinformatics through programming exercises. • Have “professor” environment for assigning programming exercises to your bioinformatics classes.