Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters

Presentation Transcript


  1. Block Alignment: An Approach for Multiple Sequence Alignment Containing Clusters Advisor: Professor R. C. T. Lee Speaker: B. W. Xiao 2004/06/04 CSIE NCNU

  2. Multiple Sequence Alignment • Input: k sequences over the alphabet {a, g, c, t} • Output: an alignment A of these sequences (gaps allowed) • Example: attgcc, ttacgg, aatgga, tatcgt, cgatag
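
The definition above can be made concrete with a small checker. The sketch below tests whether a set of gapped rows is a valid alignment of the inputs: the rows must have equal length, no column may consist only of gaps, and deleting the gaps must give back each input sequence. It also computes the sum-of-pairs cost, the objective cited on the next slide. The helper names and the unit costs (0 for identical symbols, including a gap against a gap, and 1 otherwise) are assumptions for illustration. They are not the talk's score function.

```python
GAP = "-"


def is_alignment(rows: list[str], seqs: list[str]) -> bool:
    """Check that rows form a valid alignment of seqs (hypothetical helper)."""
    if not rows or len(rows) != len(seqs):
        return False
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        return False
    # Conventionally, a column made only of gaps is not allowed.
    if any(all(r[j] == GAP for r in rows) for j in range(width)):
        return False
    # Removing the gaps from row i must give back input sequence i.
    return all(r.replace(GAP, "") == s for r, s in zip(rows, seqs))


def pair_cost(x: str, y: str) -> int:
    # Assumed unit cost: 0 for identical symbols (gap-gap included), else 1.
    return 0 if x == y else 1


def sum_of_pairs(rows: list[str]) -> int:
    """Sum-of-pairs cost: pair_cost over every pair of rows, in every column."""
    total = 0
    for col in zip(*rows):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                total += pair_cost(col[i], col[j])
    return total


print(is_alignment(["ac-t", "a-gt"], ["act", "agt"]))
print(sum_of_pairs(["ac-t", "a-gt"]))
```

For the two-row example, the columns (a,a), (c,-), (-,g), (t,t) contribute 0, 1, 1, 0, so the sum-of-pairs cost is 2.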

  3. Progressive Methods • Multiple sequence alignment is NP-hard under the sum-of-pairs score (Wang and Jiang 1994). • 2-approximation by Gusfield (1991) • Input: k sequences • Output: an alignment of the k sequences with performance ratio less than 2 • Idea: perform a series of pairwise alignments and combine them into a multiple sequence alignment.
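
Gusfield's 2-approximation is usually described as the center-star method, which works in three steps:

1. Pick the sequence whose total pairwise distance to the others is smallest.
2. Align every other sequence to it.
3. Merge those pairwise alignments under the rule "once a gap, always a gap".

The sketch below follows that description under assumed unit edit costs, which satisfy the triangle inequality the guarantee relies on. The function names are hypothetical, and ties are broken arbitrarily.

```python
GAP = "-"


def align_pair(s: str, t: str) -> tuple[str, str, int]:
    """Optimal global alignment under unit edit costs: (gapped s, gapped t, cost)."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (s[i - 1] != t[j - 1]),
                          d[i - 1][j] + 1,   # s[i-1] against a gap
                          d[i][j - 1] + 1)   # gap against t[j-1]
    # Trace back from the bottom-right corner, preferring the diagonal.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            a.append(s[i - 1])
            b.append(GAP)
            i -= 1
        else:
            a.append(GAP)
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b)), d[n][m]


def center_star(seqs: list[str]) -> list[str]:
    """Center-star multiple alignment; rows are returned in input order."""
    if not seqs:
        return []
    k = len(seqs)
    dist = [[align_pair(x, y)[2] for y in seqs] for x in seqs]
    c = min(range(k), key=lambda i: sum(dist[i]))  # index of the center sequence
    rows = {c: seqs[c]}  # current alignment, keyed by input index
    for i in range(k):
        if i == c:
            continue
        ac, ai, _ = align_pair(seqs[c], seqs[i])
        old_c = rows[c]
        new = {r: [] for r in rows}
        new[i] = []
        p = q = 0
        while p < len(old_c) or q < len(ac):
            if p < len(old_c) and old_c[p] == GAP:
                # Gap already in the center row: keep the column, pad sequence i.
                for r in rows:
                    new[r].append(rows[r][p])
                new[i].append(GAP)
                p += 1
            elif q < len(ac) and ac[q] == GAP:
                # New gap in the center ("once a gap, always a gap"): pad old rows.
                for r in rows:
                    new[r].append(GAP)
                new[i].append(ai[q])
                q += 1
            else:
                # Both pointers sit on the same center character.
                for r in rows:
                    new[r].append(rows[r][p])
                new[i].append(ai[q])
                p, q = p + 1, q + 1
        rows = {r: "".join(col) for r, col in new.items()}
    return [rows[i] for i in range(k)]


for row in center_star(["attgcc", "ttacgg", "aatgga", "tatcgt", "cgatag"]):
    print(row)
```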

  4. Remarks • Progressive methods always work on whole sequences, and they build a multiple sequence alignment only by inserting gaps. • Gusfield's 2-approximation does not handle sequences that contain clusters well. • Can we align more than two sequences at once in a short time?

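  5. Data Structure of Block Alignment • We use a matrix to represent a sequence or an alignment. • Given an alignment of several sequences, each padded with gaps to a common length, we represent it by the matrix whose i-th row is the i-th gapped sequence and whose j-th column is the j-th alignment column.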
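
One minimal way to hold such a matrix is to store the rows as equal-length strings. Under that choice, a single sequence is a 1 × n matrix and an alignment of m sequences is an m × n matrix. The `Block` class and its methods are hypothetical names for illustration, not the speaker's implementation.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical block type: an m x n matrix stored as m equal-length rows.

    Each row is one (possibly gapped) sequence; each column is one alignment column.
    """
    names: list[str]
    rows: list[str]

    def __post_init__(self) -> None:
        if len(self.names) != len(self.rows):
            raise ValueError("need exactly one name per row")
        if len({len(r) for r in self.rows}) > 1:
            raise ValueError("all rows of a block must have the same length")

    @classmethod
    def from_sequence(cls, name: str, seq: str) -> "Block":
        # A single ungapped sequence is a 1 x n matrix.
        return cls([name], [seq])

    @property
    def shape(self) -> tuple[int, int]:
        width = len(self.rows[0]) if self.rows else 0
        return len(self.rows), width

    def column(self, j: int) -> tuple[str, ...]:
        # Column j as a vector with one entry per row (sequence).
        return tuple(row[j] for row in self.rows)


b = Block(["S1", "S2"], ["ac-t", "a-gt"])
print(b.shape, b.column(1))
```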

  6. Aligning Matrices • From now on, we consider a set of matrices, each representing a sequence or an alignment. • We align two matrices using the same idea as in pairwise alignment. • The two matrices to be aligned each represent a sequence or an alignment.

  7. Scoring Columns • In pairwise alignment we align two characters; in block alignment we align column vectors. • Let P and Q be two column vectors, one from each matrix; each has one entry per row of its own matrix.
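
The transcript does not preserve the column score itself. One natural choice, assumed here, is a sum-of-pairs cost across the two columns: add a unit mismatch cost for every pair (p_i, q_j) with one entry from P and one from Q. Pairs inside P, or inside Q, are ignored. The existing alignments already fix those pairs, so they add the same amount to every way of aligning the two matrices. Aligning a column against a gap is modeled as aligning it against a column made only of gaps. All names below are hypothetical.

```python
GAP = "-"


def pair_cost(x: str, y: str) -> int:
    # Assumed unit cost: 0 for identical symbols (gap against gap included), else 1.
    return 0 if x == y else 1


def column_cost(p: tuple[str, ...], q: tuple[str, ...]) -> int:
    """Assumed cost of aligning column P with column Q: the sum of pair_cost
    over every cross pair (p_i, q_j). Pairs within P or within Q are left out."""
    return sum(pair_cost(x, y) for x in p for y in q)


def gap_column(height: int) -> tuple[str, ...]:
    # Aligning a column "against a gap" means against a column of this many gaps.
    return (GAP,) * height


P = ("a", "a", "t")
Q = ("a", "c")
print(column_cost(P, Q))              # 4
print(column_cost(P, gap_column(2)))  # 6
```

This sketch computes the cost pair by pair, which takes |P|·|Q| steps per column. Counting symbol frequencies in each column first (for example with `collections.Counter`) would make the per-column cost depend only on the alphabet size.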

  8. Aligning Columns

  9. Recurrence Formula
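
The recurrence itself was lost with the slide image. Assume a column cost σ that sums a unit mismatch cost over cross pairs (the transcript does not give the actual score function), and treat a gap as a column of gaps. Then the pairwise-alignment recurrence carries over directly:

D[i][j] = min(D[i-1][j-1] + σ(P_i, Q_j), D[i-1][j] + σ(P_i, gaps), D[i][j-1] + σ(gaps, Q_j))

The sketch fills that table and traces back to the merged matrix. `align_blocks` and its helpers are hypothetical names.

```python
GAP = "-"


def pair_cost(x: str, y: str) -> int:
    # Assumed unit cost: 0 for identical symbols (gap against gap included), else 1.
    return 0 if x == y else 1


def column_cost(p: tuple[str, ...], q: tuple[str, ...]) -> int:
    # Assumed column score: sum of pair_cost over all cross pairs (p_i, q_j).
    return sum(pair_cost(x, y) for x in p for y in q)


def align_blocks(a: list[str], b: list[str]) -> tuple[int, list[str]]:
    """Align block a with block b (each a list of equal-length rows).

    Returns the minimum total cost and the merged block (rows of a, then rows of b).
    """
    n, m = len(a[0]), len(b[0])
    cols_a = [tuple(row[i] for row in a) for i in range(n)]
    cols_b = [tuple(row[j] for row in b) for j in range(m)]
    gaps_a, gaps_b = (GAP,) * len(a), (GAP,) * len(b)

    # D[i][j] = cost of aligning the first i columns of a with the first j of b.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + column_cost(cols_a[i - 1], gaps_b)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + column_cost(gaps_a, cols_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + column_cost(cols_a[i - 1], cols_b[j - 1]),
                D[i - 1][j] + column_cost(cols_a[i - 1], gaps_b),
                D[i][j - 1] + column_cost(gaps_a, cols_b[j - 1]),
            )

    # Trace back, emitting merged columns in reverse order.
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + column_cost(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + column_cost(cols_a[i - 1], gaps_b):
            merged.append(cols_a[i - 1] + gaps_b)
            i -= 1
        else:
            merged.append(gaps_a + cols_b[j - 1])
            j -= 1
    merged.reverse()
    rows = ["".join(col[r] for col in merged) for r in range(len(a) + len(b))]
    return D[n][m], rows


cost, rows = align_blocks(["atttaagggc"], ["aattaagggc"])
print(cost, rows)
```

Take `align_blocks(["atttaagggc"], ["aattaagggc"])` as an example. It returns cost 1 and leaves both rows unchanged, because a single substitution is cheaper than any pair of gap columns.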

  10. The Algorithm Based on Block Alignment • Input: k sequences • Output: an alignment • Step 1: Initialize every sequence as a block. • Step 2: Merge the two nearest blocks. • Step 3: Repeat Step 2 until only one block remains.
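
The three steps can be sketched as a loop. The transcript leaves two details open, and both are assumptions here:

- The distance between two blocks is their optimal block-alignment cost divided by the number of cross pairs of rows, so large blocks are not penalized for their size.
- The column cost is a unit-cost sum over cross pairs.

For simplicity, the sketch realigns every candidate pair in each round; a practical version would cache these costs. All names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def column_cost(p: tuple[str, ...], q: tuple[str, ...]) -> int:
    # Assumed score: unit cost for each cross pair of differing symbols.
    return sum(0 if x == y else 1 for x in p for y in q)


def align_blocks(a: list[str], b: list[str]) -> tuple[int, list[str]]:
    # DP over columns; a gap is aligned as a column of gaps (assumption).
    n, m = len(a[0]), len(b[0])
    ca = [tuple(r[i] for r in a) for i in range(n)]
    cb = [tuple(r[j] for r in b) for j in range(m)]
    ga, gb = (GAP,) * len(a), (GAP,) * len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + column_cost(ca[i - 1], gb)
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + column_cost(ga, cb[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + column_cost(ca[i - 1], cb[j - 1]),
                          D[i - 1][j] + column_cost(ca[i - 1], gb),
                          D[i][j - 1] + column_cost(ga, cb[j - 1]))
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + column_cost(ca[i - 1], cb[j - 1]):
            cols.append(ca[i - 1] + cb[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + column_cost(ca[i - 1], gb):
            cols.append(ca[i - 1] + gb)
            i -= 1
        else:
            cols.append(ga + cb[j - 1])
            j -= 1
    cols.reverse()
    return D[n][m], ["".join(c[r] for c in cols) for r in range(len(a) + len(b))]


def block_alignment(seqs: list[str]) -> list[str]:
    if not seqs:
        return []
    # Step 1: every sequence is its own block; keep input indices to restore order.
    blocks = [([i], [s]) for i, s in enumerate(seqs)]
    # Steps 2-3: merge the nearest pair of blocks until one block remains.
    while len(blocks) > 1:
        best = None
        for x, y in combinations(range(len(blocks)), 2):
            (ids_x, rows_x), (ids_y, rows_y) = blocks[x], blocks[y]
            cost, merged = align_blocks(rows_x, rows_y)
            dist = cost / (len(rows_x) * len(rows_y))  # assumed normalization
            if best is None or dist < best[0]:
                best = (dist, x, y, ids_x + ids_y, merged)
        _, x, y, ids, merged = best
        blocks = [blk for k, blk in enumerate(blocks) if k not in (x, y)]
        blocks.append((ids, merged))
    ids, rows = blocks[0]
    out = [""] * len(seqs)
    for idx, row in zip(ids, rows):
        out[idx] = row
    return out


for row in block_alignment(["atttaagggc", "aattaagggc", "atttacgggc",
                            "cccttaacg", "cccataacg"]):
    print(row)
```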

  11. Given S1=atttaagggc, S2=aattaagggc, S3=atttacgggc, S4=cccttaacg, S5=cccataacg • The following is the corresponding graph. [Figure: graph over S1 to S5; only the labels 9, 2, 2, 2 survive in the transcript.]
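
The transcript keeps only the labels of the graph, so the distance it draws is unknown. As an illustration, this sketch assumes unit-cost edit distance as the measure; it is not necessarily the graph's distance. Even under that assumption, the pairwise distances show the two clusters {S1, S2, S3} and {S4, S5}. Every within-cluster distance is at most 2, and every cross-cluster distance is larger. The grouping `CLUSTER` is the hypothesized one, read off the sequences.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    # Unit-cost Levenshtein distance, keeping one DP row at a time.
    prev = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(prev[j - 1] + (s[i - 1] != t[j - 1]),  # match/substitute
                         prev[j] + 1,                           # delete s[i-1]
                         cur[j - 1] + 1)                        # insert t[j-1]
        prev = cur
    return prev[-1]


SEQS = {"S1": "atttaagggc", "S2": "aattaagggc", "S3": "atttacgggc",
        "S4": "cccttaacg", "S5": "cccataacg"}
CLUSTER = {"S1": 0, "S2": 0, "S3": 0, "S4": 1, "S5": 1}  # hypothesized grouping

DIST = {(a, b): edit_distance(SEQS[a], SEQS[b]) for a, b in combinations(SEQS, 2)}
for (a, b), d in DIST.items():
    print(f"{a}-{b}: {d}")

within = max(d for (a, b), d in DIST.items() if CLUSTER[a] == CLUSTER[b])
across = min(d for (a, b), d in DIST.items() if CLUSTER[a] != CLUSTER[b])
print("max within-cluster:", within, "min across-cluster:", across)
```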

  12. Experimental Results • We generated ten data sets; each set has ten sequences that form two clusters, and every sequence is about 500 bases long.

  13. Experimental Results • We generated four data sets; each set has nine sequences that form three clusters.

  14. Experimental Results • We took ten DNA sequences, from 5 hepatitis B viruses and 5 hepatitis C viruses, and aligned them with both block alignment and the 2-approximation. We also tested seven sequences from 3 dogs and 4 wolves.

  15. Discussion and Future Work • We may use other score functions for evaluation. • We can also try other strategies for merging blocks. • We can extend our program to align protein sequences, using a PAM matrix in place of our score function.

  16. Thank you