
Comp. Genomics


hasad

Presentation Transcript


  1. Comp. Genomics Recitation 13 Genome rearrangements Homework solutions

  2. Exercise 1 • Two haploid, single-chromosome genomes G1 and G2 were sequenced. G1 is an ancestor of G2. G1 is represented by the unsigned permutation 1,2,…,n. • The region gi,…,gj is known as a “tough chromosomal region”. Reversal events never create breakpoints in this region.

  3. Exercise 1 • Assume that G2 was generated from G1 by the minimal number of reversal events needed to obtain G2. • Give an upper bound on the number of reversal events that occurred during the evolution from G1 to G2.

  4. Solution 1 • We can apply the same reversals in reverse order to obtain G1 from G2 • E.g., if a single reversal transformed G1=12345 into G2=14325, we can apply a reversal on the same indices and get G1 back • So if we show a series of reverse-reversals of length k, then k is an upper bound
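The observation that a reversal undoes itself can be checked on the slide's own example. This is a minimal sketch; the helper name `reverse_segment` and the 0-based inclusive indices are assumptions made for illustration, not notation from the recitation.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with the segment perm[i..j] (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


g1 = [1, 2, 3, 4, 5]
g2 = reverse_segment(g1, 1, 3)    # the reversal that turns 12345 into 14325
back = reverse_segment(g2, 1, 3)  # the same indices applied again
print(g2, back)
```

Because every reversal is its own inverse, any series of k reversals leading from G2 to G1 also gives a series of k reversals from G1 to G2.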

  5. Solution 1 • Genes 1,…,i-1 appear in G2 before position i or after position j. In the worst case, we need i-1 reversal operations to get these genes into their correct order. • Then we have in G2: • 1,2,…,i-1,TOUGH_REGION,REST_OF_GENES • where the TOUGH_REGION is either i,i+1,…,j or j,j-1,…,i

  6. Solution 1 • If the TOUGH_REGION appears reversed, one more reversal restores it. • We can fix the REST_OF_GENES region in n-j-1 reversal operations, and in total we get (i-1)+1+(n-j-1)=n-(j-i)-1

  7. Exercise 2 • A breakpoint is a location i in the sequence such that the adjacent elements are not consecutive integers, i.e. |π(i) − π(i+1)| ≠ 1. • Prove or refute: Out of n/2 reversals on the unsigned permutation 1,2,…,n, there is at least one reversal that cancels a breakpoint at some index. • A reversal operates on a subsequence. • Note that a reversal can both cancel a breakpoint and create new ones

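  8. Solution 2 • Can you refute it? • The claim is false. • Consider the permutation (1,2,3). • (1,2,3) → (1,3,2) → (3,1,2) → (1,3,2) → … • Breakpoint between positions 1 and 2: No, Yes, Yes, Yes • Breakpoint between positions 2 and 3: No, No, No, No • No reversal in this sequence cancels a breakpoint, and the sequence can be continued indefinitely.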
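The counterexample can be replayed mechanically. The sketch below assumes the unframed breakpoint definition (adjacent elements that differ by more than 1, with no 0 or n+1 added at the ends), which is what the breakpoint counts in Solution 3 suggest; the function names are hypothetical.

```python
def breakpoints(perm):
    """Indices k such that perm[k] and perm[k+1] are not consecutive integers."""
    return {k for k in range(len(perm) - 1) if abs(perm[k] - perm[k + 1]) != 1}


def reverse_segment(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def replay(perm, reversals):
    """Apply the reversals in order; record which breakpoints each one cancels."""
    history = []
    for i, j in reversals:
        nxt = reverse_segment(perm, i, j)
        cancelled = breakpoints(perm) - breakpoints(nxt)
        history.append((nxt, cancelled))
        perm = nxt
    return history


# (1,2,3) -> (1,3,2) -> (3,1,2) -> (1,3,2)
steps = replay((1, 2, 3), [(1, 2), (0, 1), (0, 1)])
for perm, cancelled in steps:
    print(perm, "cancelled:", sorted(cancelled))
```

In every step the set of cancelled breakpoints is empty, which is the point of the counterexample.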

  9. Exercise 3 • Two reversals occur on the permutation 1,2,…,n. How many breakpoints can occur in the resulting permutation?

  10. Solution 3 • One reversal: • 1 2 3 4 5 6 7 → 1 7 6 5 4 3 2: one breakpoint • 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7: two breakpoints

  11. Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 2 3 4 5 6 7: zero breakpoints

  12. Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 → 3 4 5 6 7 1 2: one breakpoint

  13. Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 → 1 3 4 5 6 7 2: two breakpoints

  14. Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 6 2 3 4 5 7: three breakpoints

  15. Solution 3 • Four breakpoints: 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 6 5 3 4 2 7: four breakpoints
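The case analysis in Solution 3 can be cross-checked by brute force for n = 7. This is a sketch under two assumptions: breakpoints are counted without framing elements at the ends (as in the slides' counts), and a reversal acts on a segment of length at least 2. All function names are illustrative.

```python
from itertools import combinations


def count_breakpoints(perm):
    """Number of adjacent pairs that are not consecutive integers."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) != 1)


def all_reversals(perm):
    """Yield every permutation reachable by one reversal of a segment of length >= 2."""
    for i, j in combinations(range(len(perm)), 2):
        yield perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


identity = tuple(range(1, 8))
after_one = {count_breakpoints(p) for p in all_reversals(identity)}
after_two = {count_breakpoints(q)
             for p in all_reversals(identity)
             for q in all_reversals(p)}
print(sorted(after_one), sorted(after_two))
```

Each reversal changes at most the two adjacencies at its boundaries, so two reversals can never produce more than four breakpoints, and the enumeration shows that every value from zero to four is attained.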

  16. DCJ Algorithm • Why does it run in linear time?

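  17. DCJ Algorithm – cont’d • d_DCJ(A,B) = N – (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph of A and B. • Each iteration increments either C by one or I by two. • Our genome representation allows us to find and perform each sorting operation in constant time. • The DCJ distance is never larger than N, so at most N iterations are performed.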
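To make the formula N – (C + I/2) concrete, the following sketch builds the adjacency graph of two genomes and counts its cycles and odd paths. It is an illustration under stated assumptions, not the representation used in the lecture: genomes are lists of non-empty linear chromosomes over the same genes, each gene g has a tail (g, "t") and a head (g, "h"), and components are found with a simple union-find, so the running time is near-linear rather than the linear bound discussed on the slide.

```python
from collections import Counter


def extremities(gene):
    """Tail then head for +g, head then tail for -g."""
    g = abs(gene)
    return [(g, "t"), (g, "h")] if gene > 0 else [(g, "h"), (g, "t")]


def adjacency_vertices(genome):
    """Telomeres (1-tuples) and adjacencies (2-tuples) of linear chromosomes."""
    vertices = []
    for chrom in genome:
        ends = [e for gene in chrom for e in extremities(gene)]
        vertices.append((ends[0],))
        vertices.extend((ends[k], ends[k + 1]) for k in range(1, len(ends) - 1, 2))
        vertices.append((ends[-1],))
    return vertices


def dcj_distance(a, b):
    where_a = {e: ("A", k) for k, v in enumerate(adjacency_vertices(a)) for e in v}
    where_b = {e: ("B", k) for k, v in enumerate(adjacency_vertices(b)) for e in v}
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Every extremity is an edge between its vertex in A and its vertex in B.
    for e in where_a:
        parent[find(where_a[e])] = find(where_b[e])
    edges = Counter(find(where_a[e]) for e in where_a)
    nodes = Counter(find(v) for v in list(parent))
    cycles = sum(1 for r in nodes if edges[r] == nodes[r])
    odd_paths = sum(1 for r in nodes
                    if edges[r] == nodes[r] - 1 and edges[r] % 2 == 1)
    n_genes = len(where_a) // 2
    return n_genes - (cycles + odd_paths // 2)


print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))
```

A component with as many edges as vertices is a cycle; a component with one edge fewer is a path, and it is odd when its edge count is odd.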

  18. Exam question (Moed A, 5767) • A genome is a set of chromosomes, where each chromosome is a sequence of signed integers. Together, the chromosomes contain the integers 1,…,n without repetitions. • For example, G={(1,-2,3),(4,5,6,-7)} is a genome with two chromosomes. We assume that a chromosome and its reverse with opposite signs are equivalent. • Hence (4,5,6,-7) is equivalent to (7,-6,-5,-4).

  19. Exam question (Moed A, 5767) • A reversal operation reverses the order and the signs of a contiguous segment within a single chromosome. Hence, a single reversal on the first chromosome of G can produce the genome {(1,-3,2),(4,5,6,-7)}. • A translocation operation exchanges end segments of two chromosomes (one of which may be empty). For example, a translocation on G can produce the genome {(1,-2,5,6,-7),(4,3)}.
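The reversal and translocation operations defined in this exam question can be sketched on list-based genomes as follows. The function names, the 0-based indices and the cut-point convention are assumptions for illustration; only the suffix-for-suffix form of a translocation is shown.

```python
def reversal(genome, c, i, j):
    """Reverse order and signs of the segment [i..j] (0-based, inclusive) of chromosome c."""
    chrom = genome[c]
    flipped = [-g for g in reversed(chrom[i:j + 1])]
    return genome[:c] + [chrom[:i] + flipped + chrom[j + 1:]] + genome[c + 1:]


def translocation(genome, c1, c2, cut1, cut2):
    """Exchange the suffix of chromosome c1 after cut1 with the suffix of c2 after cut2."""
    a, b = genome[c1], genome[c2]
    result = list(genome)
    result[c1] = a[:cut1] + b[cut2:]
    result[c2] = b[:cut2] + a[cut1:]
    return result


def canonical(chrom):
    """A chromosome and its sign-flipped reverse are equivalent; pick one representative."""
    flipped = tuple(-g for g in reversed(chrom))
    return min(tuple(chrom), flipped)


G = [[1, -2, 3], [4, 5, 6, -7]]
print(reversal(G, 0, 1, 2))
print(translocation(G, 0, 1, 2, 1))
```

Both calls reproduce the genomes given as examples in the question, and `canonical` treats (4,5,6,-7) and (7,-6,-5,-4) as the same chromosome.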

  20. Exam question (Moed A, 5767) • The problem is to transform a given genome into another genome using a minimal number of reversal and translocation operations. • Give an algorithm that guarantees a constant performance ratio for the problem and runs in polynomial time.

  21. Solution • The problem is equivalent to signed reversal. • We saw in class a polynomial-time 2-approximation.

  22. HW 3 question 5 • Uniform lifted alignment – a lifted alignment in which, at each level, all strings are lifted either from the right or from the left. • Prove that the optimal uniform lifted alignment has cost at most twice that of the optimal alignment tree. • Give a polynomial algorithm to find the optimal uniform lifted alignment.

  23. HW 3 question 5 • Uniform lifted alignment, proof: • Assume we had the optimal tree T*. • Transform it in the following way: • To assign string at level k, consider: • Pick the minimal sum.

  24. HW 3 – question 5 – cont’d • Assign each ‘costly’ edge (T,S) to a path in the optimal tree: • The path from leaf (T) to node (S*). • Together, these paths cover all edges of the tree. • [Figure: tree with nodes labeled S, (S*) and T]

  25. HW 3 – question 5 – cont’d • By the triangle inequality: D(S,T) ≤ D(S,S*) + D(S*,T) • By choice of left/right: Σ_S [D(S,S*) + D(S*,T)] ≤ Σ_S [D(S*,T) + D(S*,T)] = Σ_S 2D(S*,T) • ⇒ One-sided tree with cost at most twice the optimal. • [Figure: tree with nodes labeled S, (S*) and T]

  26. HW 3 – question 5 – cont’d • Algorithm: • Preprocess pairwise sequence distances. • Try all different left/right assignments for the levels, and pick the minimal one. • Running time (n sequences of length m): • Preprocessing: O(m²n²). • Height h, 2^h different assignments. • Calculating the cost of a tree: O(n). • Total: O(m²n² + 2^h·n).
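The exhaustive search over left/right assignments can be sketched as follows. The sketch assumes a complete binary tree whose leaves are given left to right (so their number is a power of two) and takes the distance as a caller-supplied function; Hamming distance on equal-length strings stands in for the pairwise alignment distance D and is not the measure used in the homework.

```python
from itertools import product


def uniform_lift_cost(leaves, choice, dist):
    """Cost of lifting with one 'L'/'R' choice per level, bottom level first."""
    level, cost = list(leaves), 0
    for side in choice:
        lifted = []
        for k in range(0, len(level), 2):
            left, right = level[k], level[k + 1]
            label = left if side == "L" else right
            # The edge to the lifted child costs 0; the other edge costs D.
            cost += dist(label, left) + dist(label, right)
            lifted.append(label)
        level = lifted
    return cost


def best_uniform_lift(leaves, dist):
    """Try all 2^h assignments and return (cost, assignment) of the cheapest."""
    h = len(leaves).bit_length() - 1
    assert 2 ** h == len(leaves), "sketch assumes a complete binary tree"
    return min((uniform_lift_cost(leaves, c, dist), c)
               for c in product("LR", repeat=h))


def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))


cost, assignment = best_uniform_lift(["AAAA", "AATT", "AAAT", "TTTT"], hamming)
print(cost, assignment)
```

Evaluating one assignment touches each tree edge once, which matches the O(n) cost per tree stated above; in practice the distances would be looked up from the preprocessed table rather than recomputed.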

  27. HW 1 question 1 • Question 1: Explain how to compute local alignment in linear space • The linear space algorithm from the lecture is a global alignment algorithm

  28. solution • [Figure: DP matrix of x and y, with regions labeled "local alignment" and "global alignment"]

  29. solution • For every cell [i,j] in the DP matrix, add a field b[i,j] that will be updated as follows: • If the score of [i,j] is 0 then b[i,j]=(i,j) • Otherwise: • If the score came from a match (diagonal move), b[i,j]=b[i-1,j-1] • If it came from a mismatch for x (move from [i-1,j]), b[i,j]=b[i-1,j] • If it came from a mismatch for y (move from [i,j-1]), b[i,j]=b[i,j-1]

  30. solution • Use the linear space algorithm from class for computing the score of the optimal local alignment • At the same time, the field b[i,j] can be updated for every cell • Now, “cut out” the small matrix delimited by the cell with the optimal score [i*,j*] and its origin b[i*,j*], and run Hirschberg's algorithm on it
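The first phase of this solution (score plus origin pointers, keeping only two rows) can be sketched as below. The scoring values and the function name are assumptions chosen for illustration; the second phase, running Hirschberg's algorithm on the cut-out substrings, is not implemented here.

```python
def local_alignment_region(x, y, match=2, mismatch=-1, gap=-2):
    """Return (best score, origin cell b, end cell) using O(len(y)) space.

    Cell (i, j) refers to the prefixes x[:i] and y[:j], so the optimal local
    alignment covers x[b_i:i] and y[b_j:j] for origin (b_i, b_j) and end (i, j).
    """
    prev_s = [0] * (len(y) + 1)
    prev_b = [(0, j) for j in range(len(y) + 1)]
    best = (0, (0, 0), (0, 0))
    for i in range(1, len(x) + 1):
        cur_s, cur_b = [0], [(i, 0)]
        for j in range(1, len(y) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (prev_s[j - 1] + sub, prev_b[j - 1]),  # diagonal move
                (prev_s[j] + gap, prev_b[j]),          # move from [i-1, j]
                (cur_s[j - 1] + gap, cur_b[j - 1]),    # move from [i, j-1]
            ]
            score, origin = max(candidates, key=lambda c: c[0])
            if score <= 0:
                score, origin = 0, (i, j)  # a zero cell is its own origin
            cur_s.append(score)
            cur_b.append(origin)
            if score > best[0]:
                best = (score, origin, (i, j))
        prev_s, prev_b = cur_s, cur_b
    return best


x, y = "GGACGTCC", "TTACGTAA"
score, (bi, bj), (ei, ej) = local_alignment_region(x, y)
print(score, x[bi:ei], y[bj:ej])
```

The two returned cells delimit the substrings on which a linear-space global alignment can then be run to recover the alignment itself.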
