The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis

Zoë Abrams (zoea@stanford.edu), Ho-Lin Chen (holin@stanford.edu)

Presentation Transcript


  1. The Simplified Partial Digest Problem: Hardness and a Probabilistic Analysis • Zoë Abrams, zoea@stanford.edu • Ho-Lin Chen, holin@stanford.edu

  2. Restriction Site Analysis • An enzyme cuts a target DNA strand into fragments, and the fragment lengths are used to reconstruct the locations of the enzyme's restriction sites. • Two common approaches: • Double Digest Problem (NP-complete) [Goldstein, Waterman ’87] • Partial Digest Problem

  3. Partial Digest Problem • Reconstruct the site locations from the lengths of all fragments that can possibly be produced. • The hardness of the problem is unknown. [Skiena, Sundaram ’93][Lemke, Skiena, Smith ’02] • If the primary fragments are added to the input, a unique reconstruction can be found in polynomial time. [Pandurangan, Ramesh ’01] • The input is susceptible to experimental error caused by missing fragments.

  4. Simplified Partial Digest Problem • Proposed by Blazewicz et al. ’01 • Uses primary fragments and base fragments to reconstruct the restriction sites • Primary fragments: one endpoint of the fragment is an endpoint of the original DNA strand • Base fragments: the two endpoints of the fragment are consecutive sites on the DNA strand

  5. Problem Definition • Given • $X_0 = 0$, $X_{n+1} = D$ • A set of base fragments $\{X_i - X_{i-1}\}_{1 \le i \le n+1}$ • A set of primary fragments $\{X_{n+1} - X_i,\; X_i - X_0\}_{1 \le i \le n}$ • Reconstruct the original series $X_1, \dots, X_n$
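The definition above can be made concrete with a small sketch that builds an instance from known site positions. The function name `spdp_instance` and the choice to return sorted lists are assumptions made for illustration; they do not come from the slides.

```python
def spdp_instance(sites, D):
    """Build the two fragment multisets for sites X_1..X_n on a strand of length D.

    Hypothetical helper: the name and return format are illustrative only.
    """
    cuts = [0] + sorted(sites) + [D]          # X_0 = 0, X_1..X_n, X_{n+1} = D
    # Base fragments: distances between consecutive cut points.
    base = sorted(b - a for a, b in zip(cuts, cuts[1:]))
    # Primary fragments: for each site, its distance to each end of the strand.
    primary = sorted(length for x in sites for length in (x, D - x))
    return base, primary


base, primary = spdp_instance([2, 5, 13], 20)
print(base)      # [2, 3, 7, 8]
print(primary)   # [2, 5, 7, 13, 15, 18]
```

Note that the primary fragments reveal each site only up to reflection: a site at distance a from one end produces the same pair of lengths as a site at distance a from the other end.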

  6. Theoretical and Algorithmic Issues • The algorithm that finds the exact solution may take $2^n$ time in the worst case. [Blazewicz, Jaroszewski ’03] • The Simplified Partial Digest Problem may have an exponential number of solutions. • The problem is APX-hard. • Simple algorithms can give a correct solution with high probability.
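To illustrate where the exponential behavior comes from, here is a brute-force sketch. It is not the algorithm of Blazewicz and Jaroszewski; it simply tries both reflections of every asymmetric site and keeps the placements whose base fragments match. The function name `brute_force_spdp` is hypothetical, and the sketch assumes distinct sites with none exactly at D/2.

```python
from collections import Counter
from itertools import product


def brute_force_spdp(base, primary, D):
    """Enumerate every site placement consistent with the fragments.

    Illustrative sketch only; assumes distinct sites, none exactly at D/2.
    """
    base_count = Counter(base)
    prim = Counter(primary)
    # Each "level" a is a distance to the nearest end of the strand.
    levels = sorted(a for a in prim if 2 * a < D)
    choices = []
    for a in levels:
        if prim[a] == 2:
            choices.append([(a, D - a)])        # symmetric pair: both positions are sites
        else:
            choices.append([(a,), (D - a,)])    # asymmetric: either side is possible
    solutions = []
    for pick in product(*choices):              # up to 2^k candidates for k asymmetric sites
        sites = sorted(x for group in pick for x in group)
        cuts = [0] + sites + [D]
        if Counter(b - a for a, b in zip(cuts, cuts[1:])) == base_count:
            solutions.append(sites)
    return solutions


print(brute_force_spdp([2, 3, 7, 8], [2, 5, 7, 13, 15, 18], 20))
# [[2, 5, 13], [7, 15, 18]]
```

Even this tiny instance has two solutions, because the mirror image of any valid placement yields the same fragments.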

  7. Proof of APX-Hardness • We prove that the Simplified Partial Digest Problem is APX-hard by reducing the Tripartite Matching Problem to it. • Tripartite Matching Problem: Given a set $S$ of triples in $\{1, 2, \dots, n\}^3$ with $|S| = T$, decide whether there exists a subset $M$ of $S$ such that $|M| = n$ and no two triples in $M$ agree in any coordinate.
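As a reading aid for the definition, the following sketch checks the Tripartite Matching condition by exhaustive search. It plays no part in the reduction itself; the function name `has_tripartite_matching` is an assumption made for illustration.

```python
from itertools import combinations


def has_tripartite_matching(triples, n):
    """Return True if some n triples are pairwise different in every coordinate.

    Exhaustive search, exponential in general; illustrative only.
    """
    for subset in combinations(triples, n):
        # n triples with n distinct values in each coordinate means
        # no two of them agree in any coordinate.
        if all(len({t[c] for t in subset}) == n for c in range(3)):
            return True
    return False


print(has_tripartite_matching([(1, 1, 1), (2, 2, 2), (1, 2, 2)], 2))  # True
print(has_tripartite_matching([(1, 1, 1), (1, 2, 2)], 2))             # False
```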

  8. Tripartite Matching Problem

  10. Proof of APX-Hardness • Use symmetric restriction sites to cut the segment into 2T equal-length segments. [Figure: the strand divided into segments labeled 1, 2, …, 2T]

  11. Proof of APX-Hardness • Use symmetric restriction sites to cut the segment into 2T equal-length segments. [Figure: pairs of symmetric restriction sites]

  14. Proof of APX-Hardness • Use symmetric restriction sites to cut the segment into 2T equal-length segments. • In each pair of equal-length segments, there are seven restriction sites that can be put on either side. [Figure: segments labeled 1, 2, …, 2T; sites marked “x” can be on either side]

  16. Proof of APX-Hardness • Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively.

  17. Proof of APX-Hardness • Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively. • In each segment, restriction sites in the same group must be put on the same side.

  18. Proof of APX-Hardness • Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively. • In each segment, restriction sites in the same group must be put on the same side. • Each placement of restriction sites corresponds to a set of triples chosen in the Tripartite Matching Problem. [Figure labels: not chosen, chosen]

  19. Proof of APX-Hardness • Those seven restriction sites can be divided into two groups, denoted by “o” and “x” respectively. • In each segment, restriction sites in the same group must be put on the same side. • Each placement of restriction sites corresponds to a set of triples chosen in the Tripartite Matching Problem. • The current placement of restriction sites is a solution iff the corresponding set of triples is a solution to the Tripartite Matching Problem.

  20. A Simple Algorithm • Put all symmetric points at correct locations • Put all asymmetric points on the left side

  21. A Simple Algorithm • Put all symmetric points at correct locations • Put all asymmetric points on the left side • For each site, working from the endpoints toward the middle: • If the base segment is matched, fix its location

  22. A Simple Algorithm • Put all symmetric points at correct locations • Put all asymmetric points on the left side • For each site, working from the endpoints toward the middle: • If the base segment is matched, fix its location • If the base segment isn’t matched, move it and all points closer to the middle to the other side.
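The steps above leave several details open, so the following is only one possible reading of them, not the authors' implementation. It assumes distinct sites with none exactly at D/2, treats "the base segment is matched" as "the segment length is still available among the unused base fragments", and models "move it and all points closer to the middle to the other side" with a single orientation flag. The name `greedy_spdp` and the returned mismatch count are hypothetical.

```python
from collections import Counter


def greedy_spdp(base, primary, D):
    """One interpretation of the simple algorithm; returns (sites, unmatched count)."""
    remaining = Counter(base)                   # base fragments not yet used
    prim = Counter(primary)
    levels = sorted(a for a in prim if 2 * a < D)   # distances to the nearest end

    def take(length):
        # Use up one base fragment of this length if one is still available.
        if remaining[length] > 0:
            remaining[length] -= 1
            return True
        return False

    left_edge, right_edge = 0, D                # innermost fixed points so far
    on_left = True                              # asymmetric points start on the left
    sites, unmatched = [], 0
    for a in levels:
        if prim[a] == 2:                        # symmetric pair: both positions are known
            for length in (a - left_edge, right_edge - (D - a)):
                if not take(length):
                    unmatched += 1
            sites += [a, D - a]
            left_edge, right_edge = a, D - a
            continue
        left_len, right_len = a - left_edge, right_edge - (D - a)
        if not take(left_len if on_left else right_len):
            on_left = not on_left               # flip this and all inner points
            if not take(left_len if on_left else right_len):
                unmatched += 1                  # neither side matches; stay flipped
        if on_left:
            sites.append(a)
            left_edge = a
        else:
            sites.append(D - a)
            right_edge = D - a
    if not take(right_edge - left_edge):        # the middle base fragment
        unmatched += 1
    return sorted(sites), unmatched


print(greedy_spdp([2, 3, 7, 8], [2, 5, 7, 13, 15, 18], 20))   # ([2, 5, 13], 0)
```

Each site is visited once, which is in line with a running time linear in the number of sites after sorting. The sketch can still fix a site on the wrong side when a segment length matches by coincidence, which is why the result is only correct with some probability.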

  24. Analysis of the Algorithm • Assuming a uniform distribution for restriction sites, for many practical parameters*, the algorithm outputs the correct locations with probability at least 0.4. • All the primary fragments are matched, and at least ¼ of all base fragments will be matched in the worst case. • Runs in time linear in the number of sites. *E.g., a DNA strand of length around 20,000 with 10-20 restriction sites

  25. Future Work • Construct better heuristics to solve SPDP • Analyze the hardness of the Partial Digest Problem • Find other characterizations of restriction sites that are both easy to measure and can be used to reconstruct the sites