Comp. Genomics

Comp. Genomics Recitation 13 Genome rearrangements Homework solutions

Exercise 1 • Two haploid, single-chromosome genomes G1 and G2 were sequenced. G1 is an ancestor of G2. G1 is represented by the unsigned permutation 1,2,…,n. • The region gi,…,gj is known as a “tough chromosomal region”. Reversal events never create breakpoints in this region.

Exercise 1 • Assume that G2 was generated from G1 by the minimal number of reversal events needed to obtain G2. • Give an upper bound on the number of reversal events that occurred during the evolution from G1 to G2.

Solution 1 • We can apply the same reversals in reverse order to obtain G1 from G2. • E.g., if a single reversal transformed G1=12345 into G2=14325, we can apply a reversal on the same indices and get G1 back. • So if we show a series of k reversals that transforms G2 into G1, then k is an upper bound.
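The observation that a reversal is undone by reversing the same indices again can be checked with a minimal sketch. The helper names `reverse` and `undo` are hypothetical, and positions are 0-based and inclusive, which is an assumption of this sketch and not the slide's notation.

```python
def reverse(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def undo(perm, reversals):
    """Undo a series of reversals by replaying them in reverse order."""
    for i, j in reversed(reversals):
        perm = reverse(perm, i, j)
    return perm


g1 = [1, 2, 3, 4, 5]
g2 = reverse(g1, 1, 3)        # the slide's example: 12345 -> 14325
restored = reverse(g2, 1, 3)  # same indices again gives G1 back
```

Because every reversal is its own inverse, a series of k reversals from G2 to G1 is also a series of k reversals from G1 to G2, read backwards.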

Solution 1 • Genes 1,…,i-1 appear in G2 before position i or after position j. In the worst case, we need i-1 reversal operations to get these genes into their correct order. • Then we have in G2: • 1,2,..,i-1,TOUGH_REGION,REST_OF_GENES • where the TOUGH_REGION is either i,i+1,…,j or j,j-1,…,i

Solution 1 • If the TOUGH_REGION appears reversed, one more reversal restores its orientation. • We can fix the REST_OF_GENES region in n-j-1 reversal operations, and in total we get (i-1)+1+(n-j-1) = n-(j-i)-1

Exercise 2 • A breakpoint is a location in the sequence such that the two adjacent elements are not consecutive integers. • Prove or refute: Out of n/2 reversals on the unsigned permutation 1,2,…,n, there is at least one reversal that cancels a breakpoint at some index. • A reversal operates on a subsequence. • Note that a reversal can both cancel a breakpoint and create new ones.

Solution 2 • Can you refute it? • The claim is false. • Consider the permutation (1,2,3). • (1,2,3) → (1,3,2) → (3,1,2) → (1,3,2) → … • Breakpoint between positions 1 and 2: No, Yes, Yes, Yes • Breakpoint between positions 2 and 3: No, No, No, No
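The counterexample can be checked mechanically with a minimal sketch. It assumes that a breakpoint is a pair of adjacent entries that are not consecutive integers, with no framing elements 0 and n+1; this reading is inferred from the breakpoint counts in Solution 3 and is not stated on the slide. All function names are hypothetical.

```python
def reverse(perm, i, j):
    """Reverse positions i..j (0-based, inclusive) of a tuple."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoint_positions(perm):
    """Indices k where perm[k] and perm[k+1] are not consecutive integers."""
    return {k for k in range(len(perm) - 1) if abs(perm[k] - perm[k + 1]) != 1}


def cancels_breakpoint(before, after):
    """True if some index is a breakpoint before the reversal but not after."""
    return bool(breakpoint_positions(before) - breakpoint_positions(after))


# Replay the slide's sequence (1,2,3) -> (1,3,2) -> (3,1,2) -> (1,3,2).
steps = [(1, 2, 3)]
for i, j in [(1, 2), (0, 1), (0, 1)]:
    steps.append(reverse(steps[-1], i, j))

cancelled = [cancels_breakpoint(a, b) for a, b in zip(steps, steps[1:])]
```

Under this assumption, every entry of `cancelled` is `False`: the breakpoint created by the first reversal stays at the same index throughout.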

Exercise 3 • Two reversals occur on the permutation 1,2,…,n. How many breakpoints can occur in the resulting permutation?

Solution 3 • One reversal: 1 2 3 4 5 6 7 → 1 7 6 5 4 3 2: one breakpoint • 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7: two breakpoints

Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 2 3 4 5 6 7: zero breakpoints

Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 → 3 4 5 6 7 1 2: one breakpoint

Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 7 6 5 4 3 2 → 1 3 4 5 6 7 2: two breakpoints

Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 6 2 3 4 5 7: three breakpoints

Solution 3 • Two reversals: 1 2 3 4 5 6 7 → 1 6 5 4 3 2 7 → 1 6 5 3 4 2 7: four breakpoints
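The case analysis above can be cross-checked by brute force for n = 7. This is a minimal sketch, assuming the same breakpoint convention that the slide's counts imply (adjacent entries that are not consecutive integers, no framing elements). The function names are hypothetical.

```python
def reverse(perm, i, j):
    """Reverse positions i..j (0-based, inclusive) of a tuple."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoints(perm):
    """Number of adjacent pairs that are not consecutive integers."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) != 1)


def counts_after_two_reversals(n):
    """All breakpoint counts reachable from 1..n by two reversals."""
    start = tuple(range(1, n + 1))
    spans = [(i, j) for i in range(n) for j in range(i + 1, n)]
    seen = set()
    for i1, j1 in spans:
        middle = reverse(start, i1, j1)
        for i2, j2 in spans:
            seen.add(breakpoints(reverse(middle, i2, j2)))
    return seen
```

Each reversal changes only the two adjacencies at the ends of the reversed segment, so two reversals cannot create more than four breakpoints; the enumeration for n = 7 returns every count from zero to four.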

DCJ Algorithm • Why does it run in linear time?

DCJ Algorithm – cont’d • d_DCJ(A,B) = N – (C + I/2). • Each iteration increments either C by one or I by two. • Our genome representation allows us to find and perform each sorting operation in constant time. • The DCJ distance is never larger than N, so at most N operations are performed and the total running time is linear in N.
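The distance formula can be illustrated with a minimal sketch. The slide does not define its symbols, so the sketch assumes the usual reading: N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph of A and B. It also assumes linear chromosomes only, given as tuples of signed integers. All function names are hypothetical, and the sketch computes only the distance, not the sorting scenario.

```python
def extremities(gene):
    """Left and right extremity of a signed gene, as (id, 't'/'h') pairs."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def partners(genome):
    """Map each extremity to its adjacency partner, or None at a telomere."""
    partner = {}
    for chrom in genome:
        ends = [e for gene in chrom for e in extremities(gene)]
        partner[ends[0]] = None
        partner[ends[-1]] = None
        for k in range(1, len(ends) - 1, 2):
            partner[ends[k]] = ends[k + 1]
            partner[ends[k + 1]] = ends[k]
    return partner


def dcj_distance(a, b):
    """N - (C + I/2), with components found by union-find over extremities."""
    pa, pb = partners(a), partners(b)
    assert pa.keys() == pb.keys(), "genomes must contain the same genes"
    parent = {e: e for e in pa}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for partner in (pa, pb):
        for x, y in partner.items():
            if y is not None:
                parent[find(x)] = find(y)

    # Per component: (number of telomeres of A, number of telomeres of B).
    telomeres = {}
    for e in pa:
        root = find(e)
        ta, tb = telomeres.get(root, (0, 0))
        telomeres[root] = (ta + (pa[e] is None), tb + (pb[e] is None))

    cycles = sum(1 for t in telomeres.values() if t == (0, 0))
    odd_paths = sum(1 for t in telomeres.values() if t == (1, 1))
    return len(pa) // 2 - (cycles + odd_paths // 2)
```

A component without telomeres is a cycle, and a path is odd exactly when one end is a telomere of A and the other a telomere of B. For identical genomes the function returns 0, and for two genomes that differ by a single reversal or translocation it returns 1.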

Exam question (Moed A, 5767) • A genome is a set of chromosomes, where each chromosome is a sequence of signed integers. Together, the chromosomes contain the integers 1,…,n without repetitions. • For example, G={(1,-2,3),(4,5,6,-7)} is a genome with two chromosomes. We assume that a chromosome and its reverse with flipped signs are equivalent. • Hence (4,5,6,-7) is equivalent to (7,-6,-5,-4).

Exam question (Moed A, 5767) • A reversal operation reverses the order and the signs of a contiguous segment within a single chromosome. Hence, a single reversal on the first chromosome of G can produce the genome {(1,-3,2), (4,5,6,-7)}. • A translocation operation exchanges end segments of two chromosomes (where one of the segments may be empty). For example, a translocation on G can create the genome {(1,-2,5,6,-7),(4,3)}.
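The two operations and the equivalence of a chromosome with its flipped reverse can be reproduced with a minimal sketch. Chromosomes are tuples of signed integers, positions are 0-based, and only suffix exchange is modelled for translocations. The function names and the cut-point parameters are hypothetical.

```python
def flip(chrom):
    """Reverse the order and negate the signs of a chromosome or segment."""
    return tuple(-g for g in reversed(chrom))


def signed_reversal(chrom, i, j):
    """Reverse order and signs of positions i..j (0-based, inclusive)."""
    return chrom[:i] + flip(chrom[i:j + 1]) + chrom[j + 1:]


def translocation(a, b, cut_a, cut_b):
    """Exchange the suffixes a[cut_a:] and b[cut_b:] of two chromosomes."""
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


G = ((1, -2, 3), (4, 5, 6, -7))
after_reversal = (signed_reversal(G[0], 1, 2), G[1])
after_translocation = translocation(G[0], G[1], 2, 1)
```

Here `after_reversal` and `after_translocation` match the two genomes given in the question, and `flip((4, 5, 6, -7))` yields the equivalent chromosome (7,-6,-5,-4).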

Exam question (Moed A, 5767) • The problem is to transform a given genome into another genome using a minimal number of reversal and translocation operations. • Give an algorithm that guarantees a constant performance ratio for the problem and runs in polynomial time.

Solution • The problem is equivalent to signed reversal. • We saw in class a polynomial-time 2-approximation.

HW 3 question 5 • Uniform lifted alignment – an alignment in which, for each level, all strings are lifted either from the right or from the left. • Prove that the optimal uniform lifted alignment has cost at most twice the cost of the optimal alignment tree. • Give a polynomial algorithm to find the optimal uniform lifted alignment.

HW 3 question 5 • Uniform lifted alignment, proof: • Assume we have the optimal tree T*. • Transform it in the following way: • To assign the strings at level k, consider the two options (all lifted from the left, or all lifted from the right): • Pick the one with the minimal sum.

HW 3 – question 5 – cont’d • Assign each ‘costly’ edge (T,S) to a path in the optimal tree: • The path from leaf (T) to node (S*). • Together, these paths cover all edges of the tree.

HW 3 – question 5 – cont’d • By triangle inequality: D(S, T) ≤ D(S, S*) + D(S*, T) • By choice of left/right: Σ_S [D(S,S*) + D(S*,T)] ≤ Σ_S [D(S*,T) + D(S*,T)] = Σ_S 2D(S*,T) • ⇒ One-sided tree with cost at most twice the optimal.

HW 3 – question 5 – cont’d • Algorithm: • Preprocess pairwise sequence distances. • Try all assignments of left/right to the levels, and pick the one with minimal cost. • Running time (n sequences of length m): • Preprocessing: O(m^2 n^2). • For height h, there are 2^h different assignments. • Calculating the cost of a tree: O(n). • Total: O(m^2 n^2 + 2^h n).
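The enumeration over left/right assignments can be sketched as follows. The sketch makes several assumptions that are not in the slides: the tree is a complete binary tree whose 2^h leaves are given left to right, D is unit-cost edit distance, and distances are memoized on demand rather than preprocessed. The names `edit_distance`, `lifted_cost` and `best_uniform_lifted` are hypothetical.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def edit_distance(s, t):
    """Unit-cost edit distance, computed row by row and memoized per pair."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def lifted_cost(leaves, sides):
    """Cost of the tree when level k (bottom-up) lifts from side sides[k]."""
    level, cost = list(leaves), 0
    for side in sides:
        parents = []
        for k in range(0, len(level), 2):
            left, right = level[k], level[k + 1]
            label = left if side == "L" else right
            cost += edit_distance(label, left) + edit_distance(label, right)
            parents.append(label)
        level = parents
    return cost


def best_uniform_lifted(leaves):
    """Try all 2^h left/right assignments and return (cost, assignment)."""
    h = len(leaves).bit_length() - 1
    assert len(leaves) == 2 ** h, "sketch assumes a complete binary tree"
    return min((lifted_cost(leaves, s), s) for s in product("LR", repeat=h))
```

For a complete binary tree h = log2(n), so the loop visits only n assignments, and each cost evaluation touches every edge once.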

HW 1 question 1 • Question 1: Explain how to compute local alignment in linear space • The linear space algorithm from the lecture is a global alignment algorithm

solution • Figure: the DP matrix of x and y; the local alignment is a global alignment of a sub-matrix.

solution • For every cell [i,j] in the DP matrix, add a field b[i,j] that records where the alignment ending at [i,j] starts. It is updated as follows: • If the score of [i,j] is 0, then b[i,j]=(i,j) • Otherwise, copy the field from the cell that gave the score: • If match: b[i,j]=b[i-1,j-1] • If mismatch for x: b[i,j]=b[i-1,j] • If mismatch for y: b[i,j]=b[i,j-1]

solution • Use the linear space algorithm from class for computing the score of the optimal local alignment • At the same time the field b[i,j] can be updated for every cell • Now, “cut out” the small matrix using the cell with the optimal score [i* ,j*] and b[i* ,j*], and run Hirschberg
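The first pass, which finds the optimal score together with the start and end cells, can be sketched in linear space as follows. The scoring values (match +1, mismatch -1, gap -1), the tie-breaking order and the function name are assumptions of this sketch. It stops before the final step and does not implement Hirschberg's algorithm.

```python
def local_alignment_endpoints(x, y, match=1, mismatch=-1, gap=-1):
    """Return (best score, start cell, end cell) keeping two DP rows only."""
    best, best_start, best_end = 0, (0, 0), (0, 0)
    prev = [0] * (len(y) + 1)
    prev_b = [(0, j) for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur, cur_b = [0], [(i, 0)]
        for j in range(1, len(y) + 1):
            # Score 0: the alignment restarts here, so b[i,j] = (i,j).
            score, start = 0, (i, j)
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            if diag > score:
                score, start = diag, prev_b[j - 1]
            up = prev[j] + gap
            if up > score:
                score, start = up, prev_b[j]
            left = cur[j - 1] + gap
            if left > score:
                score, start = left, cur_b[j - 1]
            cur.append(score)
            cur_b.append(start)
            if score > best:
                best, best_start, best_end = score, start, (i, j)
        prev, prev_b = cur, cur_b
    return best, best_start, best_end


score, (i0, j0), (i1, j1) = local_alignment_endpoints("AAACGTTT", "CCACGTGG")
```

The sub-matrix to cut out is then `x[i0:i1]` against `y[j0:j1]`, here "ACGT" against "ACGT", and a global linear-space alignment of these two substrings recovers the local alignment itself.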
