This material is not in your text (except as exercises).
Sequence Comparisons
Problems in molecular biology involve finding the minimum number of edit steps required to change one string into another.
Three types of edit steps: insert, delete, replace.
Example: change abbc into babb
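C(n,m) is the cost of changing the first n characters of str1 into the first m characters of str2. A(n) is the first n characters of str1 (ending in an); B(m) is the first m characters of str2 (ending in bm).
1. If we delete an, we still have to change A(n-1) to B(m). The cost is C(n,m) = C(n-1,m) + 1
2. If we insert a new value at the end of A(n) to match bm, we still have to change A(n) to B(m-1). The cost is C(n,m) = C(n,m-1) + 1
3. If we replace an with bm, we still have to change A(n-1) to B(m-1). The cost is C(n,m) = C(n-1,m-1) + 1
4. If an already matches bm, we still have to change A(n-1) to B(m-1). The cost is C(n,m) = C(n-1,m-1)
C(n,m) is the minimum over the cases that apply.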
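The recurrence for C(n,m) can be filled in as a table, row by row. The following is a minimal sketch, not necessarily how the course implements it: the function name edit_distance and the base cases C(i,0) = i and C(0,j) = j (delete or insert everything) are assumptions added for illustration.

```python
def edit_distance(str1: str, str2: str) -> int:
    """Minimum number of insert/delete/replace steps to turn str1 into str2."""
    n, m = len(str1), len(str2)
    # cost[i][j] plays the role of C(i, j) from the slides.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # assumed base case: delete all i characters
    for j in range(1, m + 1):
        cost[0][j] = j  # assumed base case: insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delete = cost[i - 1][j] + 1
            insert = cost[i][j - 1] + 1
            # Replace costs 1; a match costs 0.
            same = str1[i - 1] == str2[j - 1]
            diagonal = cost[i - 1][j - 1] + (0 if same else 1)
            cost[i][j] = min(delete, insert, diagonal)
    return cost[n][m]


print(edit_distance("abbc", "babb"))  # 2: insert b at the front, delete c
```

The table takes O(nm) time and space. Only the previous row is needed at any time, so the space could be cut to O(m) if just the cost is wanted.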
Find the longest increasing subsequence in a sequence of distinct integers.
Idea 1. Given a sequence of size less than m, we can find its longest increasing subsequence. (Induction)
The problem is that we don't know how to increase the length when the next element arrives.
Case 1: The new element either can be added to the longest subsequence or it cannot.
Case 2: It may be possible to add it to a non-selected subsequence (creating a sequence of equal length, but with a smaller ending point).
Case 3: It may be added to a non-selected subsequence, creating a sequence of smaller length, but its successors make it a good choice.
Example: 5 1 10 2 20 30 40 4 5 6 7 8 9 10 11
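BIS(k) is the smallest ending value among the increasing subsequences of length k found so far.
For s = 1 to n (or recursively the other way)
    For k = s downto 1, until the correct spot is found:
        If BIS(k) > As and BIS(k-1) < As, then BIS(k) = As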
Actually, we don't need the sequential search: the BIS values are in increasing order, so we can do a binary search, which gives O(n log n) overall.
5 1 10 2 12 8 15 18 45 6 7 3 8 9
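The binary-search version can be sketched as below. This assumes BIS(k) is read as the smallest ending value of any increasing subsequence of length k seen so far. The function name lis_length is made up for illustration, and the sketch returns only the length, not the subsequence itself.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    # bis[k - 1] holds BIS(k): the smallest ending value of any
    # increasing subsequence of length k found so far (assumed meaning).
    bis = []
    for value in seq:
        # First position whose entry is >= value; bis is always sorted.
        spot = bisect_left(bis, value)
        if spot == len(bis):
            bis.append(value)   # value extends the longest subsequence
        else:
            bis[spot] = value   # same length, smaller ending point
    return len(bis)


print(lis_length([5, 1, 10, 2, 12, 8, 15, 18, 45, 6, 7, 3, 8, 9]))  # 6
```

Each element costs one binary search, so the whole pass is O(n log n).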
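There are four possibilities for the two elements: both in the upper half, only the first in the upper half, only the second in the upper half, or both in the lower half.
We will be right 75% of the time! We only lose if both are in the lower half.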
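The 75% figure can be checked by enumeration. The sketch below assumes the setup is: pick two elements independently and uniformly at random (with replacement) from n items, keep the larger, and call it a success if that element lies in the upper half. This setup and the function name are assumptions, since the slide states only the conclusion.

```python
from fractions import Fraction
from itertools import product


def upper_half_success_probability(n: int) -> Fraction:
    """Exact chance that the larger of two picks is in the upper half.

    Items are represented by their ranks 0..n-1, with n assumed even,
    so ranks n//2 and above form the upper half.
    """
    wins = sum(
        1
        for first, second in product(range(n), repeat=2)
        if max(first, second) >= n // 2
    )
    return Fraction(wins, n * n)


print(upper_half_success_probability(10))  # 3/4
```

Taking the larger of k picks instead of two would fail only when all k land in the lower half, which happens with probability (1/2)^k under the same assumptions.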
Another type of probabilistic algorithm is one that never gives a wrong result, but its running time is not guaranteed.
1. Sort the list. O(n log n)
2. If we keep a balanced tree of candidate names, the cost is O(n log c), where c is the number of candidates. Note: if we don’t know how many candidates there are, we can’t give them indices.
3. See if the median (found with the kth-largest algorithm) occurs more than n/2 times. O(n)
4. Take a small sample. Find the majority of the sample, then count how many times it occurs in the whole list. This is a probabilistic approach.
5. Make one pass, discarding elements that won’t affect the majority.
List:      1 4 6 3 4 4 4 2 9 0 2 4 1 4 2 2 3 2 4 2
Occurs:    1 X 1 X 1 2 3 2 1 X 1 X 1 X 1 2 1 2 1 2
Candidate: 1 ? 6 ? 4 4 4 4 4 ? 2 ? 1 ? 2 2 2 2 2 2
(X: the count has dropped to zero; ?: no current candidate)
2 is the final candidate, but it is not a majority in the whole list (it occurs only 6 times out of 20). A second pass is needed to count the candidate.
So why do this over other ways? Simple to code. No different in terms of complexity, but interesting to think about.
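The one-pass discard idea, commonly known as the Boyer-Moore majority vote algorithm, can be sketched as follows. The function names are made up for illustration. The second function adds the counting pass that confirms or rejects the candidate.

```python
def majority_candidate(items):
    """One pass: return the only value that could be a majority."""
    candidate = None
    count = 0
    for item in items:
        if count == 0:
            # Everything before this point has been cancelled out.
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            # Discard this item together with one copy of the candidate.
            count -= 1
    return candidate


def majority(items):
    """Return the majority element of a list, or None if there is none."""
    candidate = majority_candidate(items)
    # Second pass: the candidate must occur more than n/2 times.
    if items and 2 * items.count(candidate) > len(items):
        return candidate
    return None


votes = [1, 4, 6, 3, 4, 4, 4, 2, 9, 0, 2, 4, 1, 4, 2, 2, 3, 2, 4, 2]
print(majority_candidate(votes))  # 2
print(majority(votes))            # None
```

Both passes are O(n), and the first pass needs only one stored value and one counter.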