Outline
• Introduction
• Motivation
• Algorithm
• Experiments
• Conclusions

SSLAB, Department of Computer Science, National Tsing Hua University

Introduction
• Multiple sequence alignment (MSA)
  • An NP-hard problem
• Heuristic methods for MSA
  • Progressive methods
    • ClustalW, T-Coffee, POA, etc.
  • Iterative methods
    • Muscle, DIALIGN, etc.
  • Probabilistic methods
    • ProbCons, Hmmt, Muscle, etc.
  • Anchor-based methods
    • MAFFT, Align-m, etc.

Introduction (cont’)
• Pairwise alignment
  • Dynamic programming finds the optimal alignment. [Needleman, J. Mol. Biol 1970; Smith, J. Mol. Biol 1981]
• Three-sequence alignment
  • More accurate than pairwise alignment. [Murata, PNAS 1985]
  • Linear gap penalty introduced. [Gotoh, J. Theor. Biol 1986]
  • Space reduced from O(N^3) to O(N^2) with an affine gap penalty. [Huang, ACM 1994]
  • Useful for MSA. [Makoto, Bioinformatics 1993; CY Lin, CMCT 2006, ICPP 2007]
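Three-sequence alignment extends pairwise dynamic programming to a three-dimensional table: each cell is reached by one of seven moves (every residue-or-gap combination across the three sequences except "all gaps"). The sketch below only illustrates that idea and is not any of the cited algorithms. It assumes a linear gap penalty and toy match/mismatch/gap scores combined by sum of pairs, and it uses O(N^3) space, unlike Huang's quadratic-space method. The function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Toy scores (assumptions): match +1, mismatch -1, residue vs gap -2, gap vs gap 0.
def pair_score(a: str, b: str) -> int:
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

def column_score(x: str, y: str, z: str) -> int:
    """Sum-of-pairs score of one three-character alignment column."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)

def align3(s1: str, s2: str, s3: str) -> Tuple[int, List[str]]:
    """Optimal three-sequence alignment by 3-D dynamic programming (O(n^3) time and space)."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    D = [[[float("-inf")] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    D[0][0][0] = 0
    back = {}
    # 7 moves: 1 = take the next residue of that sequence, 0 = put a gap
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if min(pi, pj, pk) < 0:
                        continue  # predecessor lies outside the table
                    col = column_score(s1[pi] if di else "-",
                                       s2[pj] if dj else "-",
                                       s3[pk] if dk else "-")
                    if D[pi][pj][pk] + col > D[i][j][k]:
                        D[i][j][k] = D[pi][pj][pk] + col
                        back[(i, j, k)] = (di, dj, dk)
    # trace back from the far corner to rebuild the three gapped rows
    rows = ["", "", ""]
    i, j, k = n1, n2, n3
    while (i, j, k) != (0, 0, 0):
        di, dj, dk = back[(i, j, k)]
        i, j, k = i - di, j - dj, k - dk
        rows[0] = (s1[i] if di else "-") + rows[0]
        rows[1] = (s2[j] if dj else "-") + rows[1]
        rows[2] = (s3[k] if dk else "-") + rows[2]
    return D[n1][n2][n3], rows

score, rows = align3("ACGT", "ACT", "AGT")
print(score)
print("\n".join(rows))
```

The table has (n1+1)(n2+1)(n3+1) cells with seven predecessors each, which is why the slides stress space reduction for practical use.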

Introduction (cont’)
• Progressive multiple sequence alignment (progressive pairwise MSA)
  • Aligns pairs of sequences (or groups) following the branching order of the guide tree until all sequences are aligned.
  • The resulting alignment depends on the initial branching order.
  • Gap problems
    • A gap, once inserted, is never removed.
    • An insertion gap may be penalized multiple times. [Loytynoja, PNAS 2005]

Introduction (cont’)
• Progressive triple MSA: aln3nn
  • Published in [Matthias, BMC Bioinformatics July, 2007].
  • Every alignment step is a three-sequence alignment.
  • Uses Huang's three-sequence alignment algorithm, with the same affine gap penalty as [Huang, ACM 1994].

Motivation
• CrossWA: combine three-sequence and pairwise alignments
  • Mitigate the problems of progressive pairwise MSA
    • Use three-sequence alignment to reduce the influence of the initial branching order.
  • Increase alignment accuracy
    • Three-sequence alignment may produce more accurate alignments.
    • Keep pairwise alignment, because three-sequence alignment is not always better than pairwise alignment.
    • For pairwise alignment, position-specific gap penalties are more accurate than affine gap penalties. [Thompson, Bioinformatics 1995]
    • Introduce position-specific gap penalties into three-sequence alignment, unlike the algorithm “aln3nn”.
  • Avoid increasing the computing time

Motivation (cont’)
• Comparison of the alignments of three protein sequences produced by different methods

Motivation (cont’)
• Three-sequence alignment vs. progressive pairwise MSA on three sequences (430 test sets randomly selected from BAliBASE 2.0 Ref 1-5)
  • Three-sequence alignment with position-specific gap penalty and sequence weighting

Motivation (cont’)
• Progressive pairwise MSA (ClustalW) vs. progressive triple MSA (aln3nn): reference set 1, BAliBASE 2.0 [Matthias, BMC Bioinformatics 2007, 7]

General Process of Progressive Multiple Sequence Alignment

[Figure: Unaligned sequences → Step 1. Calculating distance matrix → Step 2. Constructing guide tree → Step 3. Alignment: aligning pairs of sequences or groups along the branching order → Aligned sequences]
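Step 3 merges sequences or groups in the order given by the guide tree. A minimal sketch, assuming the guide tree is a rooted binary tree written as nested Python tuples (a hypothetical representation, not ClustalW's tree format): a post-order traversal yields one pairwise merge per internal node, so N sequences need N - 1 alignment steps.

```python
from typing import List, Tuple, Union

# Hypothetical guide-tree representation: a leaf is a sequence name,
# an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]

def leaves(tree: Tree) -> List[str]:
    """Names of all sequences below this node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def merge_order(tree: Tree) -> List[Tuple[List[str], List[str]]]:
    """Post-order traversal: both children are aligned before their parent merges them."""
    if isinstance(tree, str):
        return []
    left, right = tree
    return merge_order(left) + merge_order(right) + [(leaves(left), leaves(right))]

guide_tree = ((("A", "B"), "C"), ("D", "E"))
for step, (g1, g2) in enumerate(merge_order(guide_tree), start=1):
    print(f"Step {step}: align {g1} with {g2}")
```

Because each merge works on whole groups, any gap placed in an early step is carried into every later step, which is the "gap will not be removed" issue noted earlier.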

Algorithm
• Process of CrossWA
  • Step 1. Construct the distance matrix.
  • Step 2. Build the guide tree by Neighbour-Joining.
    • Sequence weights are calculated.
  • Step 3. Build a new tree by modifying the guide tree.
    • Branches are changed for three-sequence and pairwise alignments.
    • Sequence weights are recalculated.
  • Step 4. Alignment.
    • Pairwise alignment
    • Three-sequence alignment
      • Compare it with the alignment that progressive pairwise alignment produces for the same three sequences, and keep the better one.
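Step 2 builds the guide tree with Neighbour-Joining. The following is a minimal sketch of the standard NJ algorithm (Saitou and Nei), assuming a symmetric distance matrix over at least three sequences. It returns an unrooted tree as a Newick string and leaves out ClustalW's rooting and sequence weighting. The function name and the five-sequence toy matrix are hypothetical, not CrossWA's code or data.

```python
from typing import List

def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted Neighbour-Joining tree; returns a Newick string."""
    labels = list(names)              # Newick text of each active cluster
    D = [list(row) for row in dist]   # working copy of the distance matrix
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in D]   # total distance from each cluster
        # choose the pair (i, j) that minimises the Q-criterion
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # branch lengths from the new internal node u to i and j
        li = 0.5 * D[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = D[bi][bj] - li
        joined = f"({labels[bi]}:{li:.3g},{labels[bj]}:{lj:.3g})"
        # distances from u to every remaining cluster
        keep = [k for k in range(n) if k not in (bi, bj)]
        du = [0.5 * (D[bi][k] + D[bj][k] - D[bi][bj]) for k in keep]
        D = [[D[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        D.append(du + [0.0])
        labels = [labels[k] for k in keep] + [joined]
    # join the last three clusters at one central node
    (a, b, c), d = labels, D
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.3g},{b}:{lb:.3g},{c}:{lc:.3g});"

M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(list("abcde"), M))
```

Ties in the Q-criterion are broken by taking the first pair found; branch lengths are rounded to three significant digits only in the printed Newick text.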

Algorithm (cont’)

[Figure: Process of CrossWA. Unaligned sequences → Step 1. Calculating distance matrix → Step 2. Constructing guide tree → Step 3. Constructing new tree modified from the guide tree in step 2 → Step 4. Alignment: aligning pairs or triples of sequences (or groups) along the branching order of the new tree, with three-sequence alignment compared against progressive pairwise MSA → Aligned sequences]

Algorithm (cont’)
• The branch changing rule

[Figure: branch changing rule, Type I, Type II and Type III]

Algorithm (cont’)
• The evaluation of three-sequence alignment for sequences A, B and C
  • Progressive pairwise MSA: S’ = Align(B, C), then S’’ = Align(A, S’)
  • Three-sequence alignment: T’ = Align(A, B, C)
  • If SP(S’’) > SP(T’), keep S’’; if SP(T’) > SP(S’’), keep T’.
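The selection rule compares sum-of-pairs (SP) scores of the two candidate alignments. The sketch below assumes a toy scoring scheme (match +1, mismatch -1, residue vs gap -2, gap vs gap 0) and optional per-sequence weights multiplied pairwise. The slides state neither CrossWA's substitution scores nor what happens on a tie, so keeping S’’ on a tie is also an assumption. All names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional

def pair_score(a: str, b: str) -> int:
    """Toy scores (assumptions): match +1, mismatch -1, residue vs gap -2, gap vs gap 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

def sp_score(aln: List[str], weights: Optional[List[float]] = None) -> float:
    """Sum-of-pairs score: every column, every pair of rows, each pair scaled by w_i * w_j."""
    w = weights if weights is not None else [1.0] * len(aln)
    total = 0.0
    for column in zip(*aln):
        for (i, a), (j, b) in combinations(enumerate(column), 2):
            total += w[i] * w[j] * pair_score(a, b)
    return total

def keep_better(s_pp: List[str], t_p: List[str]) -> List[str]:
    """Return T' if SP(T') > SP(S''), otherwise S'' (tie handling is an assumption)."""
    return t_p if sp_score(t_p) > sp_score(s_pp) else s_pp

# Two hypothetical alignments of A = ACG, B = ACG, C = ACGG
s_double_prime = ["ACG-", "AC-G", "ACGG"]   # plays the role of S''
t_prime = ["ACG-", "ACG-", "ACGG"]          # plays the role of T'
print(sp_score(s_double_prime), sp_score(t_prime))  # 0.0 5.0
print(keep_better(s_double_prime, t_prime))
```

Here T’ wins because it aligns the three G residues in one column instead of scattering them against gaps.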

Algorithm (cont’)
• Modification of sequence weights
  • Sequence weights are calculated as in ClustalW.

[Figure: trees over nodes A, B, C and D before and after the branch change]

  • Before the change: weight of Hba_Human = 0.055 + 0.219/2 + 0.061/4 + 0.015/5 + 0.062/6 = 0.194
  • After the change: length between node A and node C = 0.219 + 0.061 = 0.280
    Weight of Hba_Human = 0.055 + 0.280/2 + 0.077/5 = 0.210
• The strategy of gap penalty
  • Introduce position-specific gap penalties into three-sequence alignment (modified from ClustalW).
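A ClustalW sequence weight is the sum, over the branches on the path from the sequence to the root, of each branch length divided by the number of sequences sharing that branch. The minimal sketch below reproduces the Hba_Human arithmetic. The (length, sharing count) pairs are transcribed from the formulas on this slide, not computed from a tree, and the function name is hypothetical. With the three-decimal lengths shown, the first sum comes to about 0.193 rather than the slide's 0.194, presumably because the displayed branch lengths are rounded.

```python
from typing import List, Tuple

def clustal_weight(path: List[Tuple[float, int]]) -> float:
    """Sum of branch_length / (number of sequences sharing the branch) along the path to the root."""
    return sum(length / shared for length, shared in path)

# (branch length, number of sequences sharing that branch), transcribed from the slide
guide_tree_path = [(0.055, 1), (0.219, 2), (0.061, 4), (0.015, 5), (0.062, 6)]
modified_path = [(0.055, 1), (0.219 + 0.061, 2), (0.077, 5)]

print(f"{clustal_weight(guide_tree_path):.3f}")  # 0.193
print(f"{clustal_weight(modified_path):.3f}")    # 0.210
```

Merging branches reduces the number of terms on the path, which is why the weights must be recalculated after the tree is modified in Step 3.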

Experiments
• System environment
  • Linux (AMD Opteron 250, 2.4 GHz, 512 MB of memory)
• Data source
  • BAliBASE 2.0
    • Reference sets 1-5, as used by T-Coffee, Muscle, ProbCons, aln3nn, etc.
    • Reference sets 6-8 are not used: they contain repeats, inversions and transmembrane helices, for which none of the tested algorithms is designed. [Muscle]

Experiments (cont’)
• Scoring functions
  • Sum-of-pairs score (SP)
  • Total column score (TC)
  • Proportion probability (%)
    • Number of test sets on which the method gives the best alignment / total number of test sets
• Compared algorithms
  • CrossWAfast, CrossWAfull, ClustalW 1.83, T-Coffee 5.05, Muscle 3.6
  • CrossWAfast: uses only Type I of the branch changing rule.
  • CrossWAfull: uses all three types of the branch changing rule.
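On BAliBASE, SP and TC are accuracy measures against a reference alignment: SP is the fraction of residue pairs aligned in the reference that the test alignment also aligns, and TC is the fraction of reference columns the test alignment reproduces exactly. These differ from a sum-of-pairs objective computed from substitution scores. The simplified sketch below assumes that every column counts (BAliBASE's own scorer uses core-block annotations) and that test and reference list the sequences in the same order. It also includes the proportion-probability formula from this slide. All function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Column = Tuple[Optional[int], ...]

def to_columns(aln: List[str]) -> List[Column]:
    """Turn gapped rows into columns of residue indices (None marks a gap)."""
    next_index = [0] * len(aln)
    cols = []
    for chars in zip(*aln):
        col = []
        for s, ch in enumerate(chars):
            if ch == "-":
                col.append(None)
            else:
                col.append(next_index[s])
                next_index[s] += 1
        cols.append(tuple(col))
    return cols

def aligned_pairs(aln: List[str]) -> Set[Tuple[int, int, int, int]]:
    """All (seq_i, residue_i, seq_j, residue_j) pairs placed in the same column."""
    pairs = set()
    for col in to_columns(aln):
        for (s, a), (t, b) in combinations(enumerate(col), 2):
            if a is not None and b is not None:
                pairs.add((s, a, t, b))
    return pairs

def sp_accuracy(test: List[str], ref: List[str]) -> float:
    ref_pairs = aligned_pairs(ref)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)

def tc_accuracy(test: List[str], ref: List[str]) -> float:
    test_cols = set(to_columns(test))
    ref_cols = to_columns(ref)
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)

def proportion_probability(n_best: int, n_total: int) -> float:
    """Percentage of test sets on which a method produced the best alignment."""
    return 100.0 * n_best / n_total

ref = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sp_accuracy(test, ref), tc_accuracy(test, ref))  # 0.75 0.6
```

In the toy pair, the misplaced G breaks one of the four reference residue pairs (SP 0.75) and two of the five reference columns (TC 0.6), so TC is the stricter measure.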

Experiments (cont’)
• The comparison of SP scores among different alignment methods

Experiments (cont’)
• The comparison of TC scores among different alignment methods

Experiments (cont’)
• SP scores of each method at different average identities in the Reference 1 data set

Experiments (cont’)
• TC scores of each method at different average identities in the Reference 1 data set

Experiments (cont’)
• The performance of CrossWA with 20 sequences

Experiments (cont’)
• The performance of CrossWA with 40 sequences

Experiments (cont’)
• Comparison of performance among different methods with 20 sequences

Experiments (cont’)
• Comparison of performance among different methods with 40 sequences

Conclusions
• Three-sequence alignment can produce better alignments than pairwise alignment, but not on all data sets.
• Combining three-sequence and pairwise alignment keeps the better alignment at each step of progressive MSA.
• The experimental results suggest that CrossWA is a useful additional tool for multiple sequence alignment.
• CrossWA can also be used to align DNA sequences.
• For genome-scale data, computing time remains a problem; parallel programming can address it. [CY Lin, ICPP 2007]

Web service

Http://140.114.91.10/Genome

Reference
• Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48:443-453. [Needleman, J Mol Biol 1970]
• Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol 1981, 147:195-197. [Smith, J Mol Biol 1981]
• Murata M, Richardson JS, Sussman JL: Simultaneous comparison of three protein sequences. Proc Natl Acad Sci U S A 1985, 82:3073-3077. [Murata, PNAS 1985]
• Gotoh O: Alignment of three biological sequences with an efficient traceback procedure. J Theor Biol 1986, 327-337. [Gotoh, J Theor Biol 1986]
• Huang X: Alignment of three sequences in quadratic space. Applied Computing Review 1993, 1:7-11. [Huang, ACM 1993]
• Makoto H, Maski H, Masato I, Tomoyuki T: MASCOT: multiple alignment system for protein sequences based on three-way dynamic programming. J Mol Biol 1993, 2:161-167. [Makoto, Bioinformatics 1993]

Reference (cont’)
• CY Lin, CT Huang, YC Chung, Chuan YT: Parallel Three-sequence Alignment with Space-efficient. Proceedings of the 23rd Workshop on Combinatorial Mathematics and Computation Theory, Chang-Hua, Taiwan, April 2006, 160-165. [CY Lin, CMCT 2006]
• CY Lin, CT Huang, YC Chung, Chuan YT: Efficient Parallel Algorithm for Optimal Three-Sequences Alignment. International Conference on Parallel Processing 2007. [CY Lin, ICPP 2007]
• Loytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci U S A 2005, 102(30):10557-10562. [Loytynoja, PNAS 2005]
• Matthias K, Peter FS: Progressive multiple sequence alignments from triplets. BMC Bioinformatics 2007. [Matthias, BMC Bioinformatics July, 2007]
• Thompson JD: Introducing variable gap penalties to sequence alignment in linear space. Bioinformatics 1995, 11:181-186. [Thompson, Bioinformatics 1995]
