
Resequencing using C-Linda and Largest Common Subsequence

Nayeong Jeong & Kenn Jacoby, May 14, 2000. Parallel Computing II, Prof. Paul Tymann.


Presentation Transcript


  1. Resequencing using C-Linda and Largest Common Subsequence. Nayeong Jeong & Kenn Jacoby. May 14, 2000. Parallel Computing II, Prof. Paul Tymann.

  2. Parts of the project:
     • C procedure that positions two segments, moves them past each other, and checks the overlap for an exact match (Stage1Engine).
     • C-Linda program that coordinates an exhaustive search of the pool of input data.
     • Largest Common Subsequence (LCS) algorithm.
     • C procedure that combines the LCS algorithm with the previous Stage1Engine, giving Stage2Engine.

  3. Stage1Engine
     [Diagram. Labels: smallseg, largeseg, substr1, substr2, overlap; positions 1, 2, 3, 4.]
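The exact-match overlap check described for Stage1Engine can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `exact_overlap`, its parameters, and the `min_overlap` cutoff are all assumptions, and it only handles the case where the end of one segment overlaps the start of the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the slides): returns the length of the
 * longest suffix of `left` that exactly equals a prefix of `right`.
 * Returns 0 if no overlap of at least `min_overlap` characters exists. */
size_t exact_overlap(const char *left, const char *right, size_t min_overlap)
{
    assert(left != NULL && right != NULL);

    size_t llen = strlen(left);
    size_t rlen = strlen(right);
    size_t max = llen < rlen ? llen : rlen;

    /* Start with the largest possible overlap and shrink it one character
     * at a time, which corresponds to sliding `right` past the end of
     * `left`. The `k > 0` guard keeps the unsigned counter from wrapping. */
    for (size_t k = max; k >= min_overlap && k > 0; k--) {
        if (memcmp(left + (llen - k), right, k) == 0)
            return k;
    }
    return 0;
}
```

With the first two test segments from the "Testing with input" slide, `exact_overlap("cgatacgcactacgca", "gcactacgcatttact", 4)` finds the shared run `gcactacgca` and returns 10.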

  4. C-Linda Algorithm
     Workers: W1, W2, W3
     Input file:
        1  cgattgatgcgcgtgatg
        2  agcgtgcgtagagtcgtg
        3  aggctctctcgtgtatctcgtgtt
        4  gatctctagctcgctagttgtgc
        5  cgatattttcgttgatccgctagt
        . . .
       36  tagcatagctcgatcg
     Gen 0 pairs:
       1/2 1/3 1/4 1/5 1/6 1/7 . . 1/36
       2/3 2/4 2/5 2/6 2/7 . . 2/36
       3/4 3/5 3/6 3/7 . . 3/36
     Gen 0 results: 1 & 5 matched, 2 & 7 matched, 3 & 36 matched.
     Mark the tuples of segments that were absorbed by a successful hit as deleted, and condense the rest into a new input array of strings. Check whether the number of segments has decreased; if not, you are done.
     Gen 1 pairs:
       2/3 2/4 2/6 2/8 . 2/35
       1/2 1/3 1/4 1/6 1/8 . 1/35
       3/4 3/6 3/8 . . 3/35
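The generation loop on this slide (compare every pair, mark absorbed segments as deleted, condense, and stop when the count no longer decreases) can be sketched sequentially in plain C. This is only an illustration under stated assumptions: it does not use C-Linda or a tuple space, it uses exact suffix/prefix matching, and `resequence`, `try_merge`, `overlap`, and the three constants are hypothetical names and values. Unlike the slide's example, one segment may absorb several others within a single generation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGS    64    /* assumed capacity, not from the slides */
#define MAX_LEN     2048  /* assumed maximum combined length */
#define MIN_OVERLAP 4     /* assumed minimum exact overlap for a hit */

/* Longest suffix of a that equals a prefix of b (at least MIN_OVERLAP). */
static size_t overlap(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    for (size_t k = la < lb ? la : lb; k >= MIN_OVERLAP; k--)
        if (memcmp(a + la - k, b, k) == 0)
            return k;
    return 0;
}

/* Try to combine sj into si in either order; returns 1 on a hit. */
static int try_merge(char *si, const char *sj)
{
    char merged[MAX_LEN];
    const char *first = si, *second = sj;
    size_t k = overlap(si, sj);

    if (k == 0) {                       /* try the other order */
        first = sj;
        second = si;
        k = overlap(sj, si);
    }
    if (k == 0 || strlen(first) + strlen(second) - k >= MAX_LEN)
        return 0;                       /* no hit, or result would not fit */

    strcpy(merged, first);
    strcat(merged, second + k);         /* append only the non-shared tail */
    strcpy(si, merged);
    return 1;
}

/* Runs generations until the segment count stops decreasing.
 * Returns the final number of segments, condensed to the front of segs. */
int resequence(char segs[][MAX_LEN], int n)
{
    assert(n >= 0 && n <= MAX_SEGS);

    for (;;) {
        int deleted[MAX_SEGS] = {0};
        int before = n;

        /* One generation: every pair i/j with i < j. */
        for (int i = 0; i < n; i++) {
            if (deleted[i])
                continue;
            for (int j = i + 1; j < n; j++)
                if (!deleted[j] && try_merge(segs[i], segs[j]))
                    deleted[j] = 1;     /* j was absorbed by i */
        }

        /* Condense the survivors into a new input array. */
        int kept = 0;
        for (int i = 0; i < n; i++) {
            if (deleted[i])
                continue;
            if (kept != i)
                strcpy(segs[kept], segs[i]);
            kept++;
        }
        n = kept;

        if (n >= before)                /* no decrease: done */
            return n;
    }
}
```

In a parallel version the inner pair comparisons are the natural unit of work to hand to workers, with the condense step run once per generation after all workers report back.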

  5. LCS Algorithm
     ccatcctgctgaacgatc
     atcgtgctgatcgatcgg    Lcs_length = 14, Thresh = 16

     catcctgctgaacgatcg
     atcgtgctgatcgatcgg    Lcs_length = 15, Thresh = 16

     atcctgctgaacgaacgg
     atcgtgctgatcgatcgg    Lcs_length = 16, Thresh = 16
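A comparison of an LCS length against a threshold, as on this slide, could look like the sketch below. It uses the standard dynamic-programming recurrence for the longest common subsequence (the slides' "Largest Common Subsequence"). The name `lcs_length` appears in the slides, but its signature here is an assumption, `meets_threshold` is a hypothetical helper, and the use of `calloc` is a choice made for this sketch rather than something the authors did.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of a and b, computed with the
 * textbook recurrence using two rolling rows of the table.
 * Returns -1 if memory cannot be allocated. */
int lcs_length(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev);   /* row i-1, starts as zeros */
    int *curr = calloc(m + 1, sizeof *curr);   /* row i */
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (a[i - 1] == b[j - 1])
                curr[j] = prev[j - 1] + 1;     /* extend a common subsequence */
            else
                curr[j] = prev[j] > curr[j - 1] ? prev[j] : curr[j - 1];
        }
        int *tmp = prev;                       /* finished row becomes prev */
        prev = curr;
        curr = tmp;
    }

    int result = prev[m];
    free(prev);
    free(curr);
    return result;
}

/* Hypothetical acceptance test: 1 if the LCS reaches the threshold. */
int meets_threshold(const char *a, const char *b, int thresh)
{
    return lcs_length(a, b) >= thresh;
}
```

A caller would compute the threshold from the window size, for example the overlap length minus the number of substitutions it is willing to tolerate, and accept the pair when `meets_threshold` returns 1.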

  6. General Flow
     [Diagram. Labels, in order: worker 0; Tuplespace; char * Stage2Engine( ); int lcs_length( ); Combined string; worker 1; char * Stage2Engine( ).]
     Segments shown in the tuplespace:
       agtcgatcgcataacg
       cagactcgcatccagca
       gccatactacgcaatcacacag
       cgacactagctcacgactacaa
       . . . .

  7. Testing with input...
     Nucleotide segments:
       cgatacgcactacgca
       gcactacgcatttact
       cgatacgcattacgca
       gcactatgcatttact
     Text segments (Shakespeare’s Julius Caesar):
       O!--pardon-me-thou-bleeding-piece
       thou-bleading-piece-of-earth
       piece-of-earth-that-I-am-meak-and
       am-meek-and-gentle-with-these-butchers.
       …

  8. Limitations, Hurdles
     • We did not use dynamic memory allocation because of an unexplained hang, so the size of the input we can process is limited to roughly 18 sub-sequences, totaling ~1400 characters when fully combined.
     • We had to hard-code the threshold as (overlay - 2) because we did not use recursion in the Stage2Engine( ) procedure, as we should have. As a result, the maximum number of substitutions allowed within the comparison window is 2.
     • The program does not limit the number of substitutions that can be grouped side by side.

  9. References
     • Cormen, Leiserson and Rivest, Introduction to Algorithms, pp. 314-320, Largest Common Subsequence.
     • Dr. Gary Skuse, RIT Bioinformatics, Biology Dept.
