Outline
  • Introduction
  • Motivation
  • Algorithm
  • Experiments
  • Conclusions

SSLAB, Department of Computer Science, National Tsing Hua University

Introduction
  • Multiple sequence alignment (MSA)
    • NP-hard problem
  • The heuristic methods for MSA
    • Progressive method
      • ClustalW, T-Coffee, POA, etc.
    • Iterative method
      • Muscle, DIALIGN, etc.
    • Probabilistic method
      • ProbCons, Hmmt, Muscle, etc.
    • Anchor-based method
      • MAFFT, Align-m, etc.

Introduction (cont’)
  • Pairwise alignment
    • Use dynamic programming to find the optimal alignment. [Needleman, J Mol Biol 1970; Smith, J Mol Biol 1981]
  • Three-sequence alignment
    • Can be more accurate than pairwise alignment. [Murata, PNAS 1985]
    • A linear gap penalty was introduced. [Gotoh, J Theor Biol 1986]
    • Space was reduced from O(N^3) to O(N^2) with an affine gap penalty. [Huang, ACM 1993]
    • Useful for MSA. [Makoto, Bioinformatics 1993; CY Lin, CMCT 2006, ICPP 2007]
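
The pairwise dynamic programming mentioned above can be sketched as a small Needleman-Wunsch global aligner. This is a minimal illustration only: the scoring values (match = 1, mismatch = -1, linear gap = -2) and the function name are assumptions chosen for the example, not parameters taken from the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap cost (illustrative scoring)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The table has (N+1) x (M+1) cells, so time and space are both quadratic for two sequences; the same recurrence over three sequences needs a cubic table, which is the cost the three-sequence bullets refer to.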

Introduction (cont’)
  • Progressive multiple sequence alignment (progressive pairwise MSA)
    • Align pairs of sequences following the branching order of the guide tree until all sequences are aligned.
    • The resulting alignment is affected by the initial branching order.
    • Gap problems
      • Once introduced, a gap is never removed.
      • An insertion gap may be counted multiple times. [Loytynoja, PNAS 2005]

Introduction (cont’)
  • Progressive triple MSA - aln3nn
    • Published in [Matthias, BMC Bioinformatics July, 2007].
    • Every alignment step is a three-sequence alignment.
    • The three-sequence alignment uses the same affine gap penalty as [Huang, ACM 1993].
    • Uses Huang’s three-sequence alignment algorithm.
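
To make the idea of a three-sequence alignment step concrete, here is a minimal sketch of the plain cubic dynamic program over three sequences. It is not Huang's algorithm and not the aln3nn implementation: it keeps the full O(N^3) table, uses a linear gap cost instead of an affine one, and the scoring values and function names are assumptions made for the example.

```python
from itertools import product


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols inside an alignment column (illustrative values)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def column_score(x, y, z):
    """Sum-of-pairs score of one three-row column."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


def three_way_score(a, b, c):
    """Optimal sum-of-pairs score for aligning three sequences (full cubic table)."""
    na, nb, nc = len(a), len(b), len(c)
    neg_inf = float("-inf")
    best = [[[neg_inf] * (nc + 1) for _ in range(nb + 1)] for _ in range(na + 1)]
    best[0][0][0] = 0
    # Seven moves: every non-empty subset of the three sequences advances by one.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(na + 1):
        for j in range(nb + 1):
            for k in range(nc + 1):
                if i == 0 and j == 0 and k == 0:
                    continue
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    col = (
                        a[pi] if di else "-",
                        b[pj] if dj else "-",
                        c[pk] if dk else "-",
                    )
                    cand = best[pi][pj][pk] + column_score(*col)
                    if cand > best[i][j][k]:
                        best[i][j][k] = cand
    return best[na][nb][nc]


if __name__ == "__main__":
    print(three_way_score("ACG", "ACG", "ACG"))
```

Each cell looks at seven predecessors, so the work grows with the product of the three lengths. That cost is why the space reduction cited from Huang matters in practice.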

Motivation
  • CrossWA - combine three-sequence and pairwise alignments
    • Minimize the problems of progressive pairwise MSA
      • Use three-sequence alignment to reduce the effect of the initial branching order.
    • Increase the accuracy of alignment
      • Three-sequence alignment may obtain more accurate alignments.
      • Keep pairwise alignment because three-sequence alignment is not always better than pairwise alignment.
      • For pairwise alignment, a position-specific gap penalty is more accurate than an affine gap penalty. [Thompson, Bioinformatics 1995]
      • Introduce a position-specific gap penalty into three-sequence alignment, unlike the “aln3nn” algorithm.
    • Avoid increasing the computing time

Motivation (cont’)
  • Comparison of three protein sequences among different methods

Motivation (cont’)
  • Three-sequence alignment vs. progressive pairwise MSA with three sequences (430 test sets, randomly selected from BAliBASE 2.0 Ref 1-5)
    • Three-sequence alignment with position-specific gap penalty and sequence weighting

Motivation (cont’)
  • Progressive pairwise MSA (ClustalW) vs. progressive triple MSA (aln3nn) – reference set 1, BAliBASE 2.0 [Matthias, BMC Bioinformatics 2007, 7]

General Process of Progressive Multiple Sequence Alignment
  • Unaligned sequences
  • Step 1. Calculating distance matrix
  • Step 2. Constructing guide tree
  • Step 3. Alignment
    • Aligning pairs of sequences or groups along the branching order
  • Aligned sequences
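
As an illustration of Step 1, the sketch below builds a symmetric distance matrix from unaligned sequences. The slides do not say which distance is used, so the k-mer distance here is a stand-in chosen for brevity; the function names and the default k = 3 are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(s, t, k=3):
    """1 minus the fraction of k-mers shared by s and t (stand-in distance)."""
    shared = sum((kmer_counts(s, k) & kmer_counts(t, k)).values())
    denom = min(len(s), len(t)) - k + 1
    if denom <= 0:
        # A sequence shorter than k has no k-mers; treat it as maximally distant.
        return 1.0
    return 1.0 - shared / denom


def distance_matrix(seqs, k=3):
    """Symmetric matrix of pairwise k-mer distances with a zero diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = kmer_distance(seqs[i], seqs[j], k)
    return d


if __name__ == "__main__":
    for row in distance_matrix(["ACDEFG", "ACDEFG", "WWWWWW"]):
        print(row)
```

A matrix like this is the input to the guide-tree construction in Step 2. Tools that derive distances from full pairwise alignments would fill the same matrix with different values.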

Algorithm
  • Process of CrossWA
    • Step 1. Construct the distance matrix.
    • Step 2. Build the guide tree with Neighbour-Joining.
      • Sequence weights are calculated.
    • Step 3. Build a new guide tree by modifying the guide tree from step 2.
      • Branches are changed to allow three-sequence and pairwise alignments.
      • Sequence weights are recalculated.
    • Step 4. Alignment.
      • Pairwise alignment
      • Three-sequence alignment
        • Compare the result with the progressive pairwise alignment of the same three sequences and keep the better one.
Algorithm (cont’)
  • Unaligned sequences
  • Step 1. Calculating distance matrix
  • Step 2. Constructing guide tree
  • Step 3. Constructing new tree modified from the guide tree in step 2
  • Step 4. Alignment
    • Aligning pairs or triples of sequences (or groups) along the branching order of the new tree
    • Three-sequence alignment vs. progressive pairwise MSA
  • Aligned sequences

Algorithm (cont’)
  • The branch changing rule
    • Type I
    • Type II
    • Type III

Algorithm (cont’)
  • The evaluation of three-sequence alignment
    • Progressive pairwise: S’ = Align(B, C), then S’’ = Align(A, S’)
    • Three-sequence: T’ = Align(A, B, C)
    • If SP(S’’) > SP(T’) then keep S’’
    • If SP(T’) > SP(S’’) then keep T’
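
The selection rule above can be sketched as a sum-of-pairs score over alignment columns followed by a comparison. The scoring values, the function names, and the tie-breaking choice (keep the pairwise result when the scores are equal) are assumptions made for the example; the slides do not state how ties are handled.

```python
from itertools import combinations


def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for col in range(length):
        symbols = [row[col] for row in alignment]
        for x, y in combinations(symbols, 2):
            if x == "-" and y == "-":
                continue  # gap against gap contributes nothing
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


def keep_better(pairwise_aln, triple_aln):
    """Return the alignment with the higher SP score (ties keep pairwise: assumption)."""
    if sp_score(triple_aln) > sp_score(pairwise_aln):
        return triple_aln
    return pairwise_aln


if __name__ == "__main__":
    s2 = ["ACGT-", "ACGT-", "A--GT"]  # hypothetical progressive pairwise result
    t1 = ["ACGT", "ACGT", "A-GT"]     # hypothetical three-sequence result
    print(sp_score(s2), sp_score(t1), keep_better(s2, t1))
```

Both candidates cover the same three sequences, so their SP scores are directly comparable under one scoring scheme.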

Algorithm (cont’)
  • Modification of sequence weights
    • The calculation of sequence weights is the same as in ClustalW.
    • First tree (nodes D, B, A, C): weight of Hba_Human = 0.055 + 0.219/2 + 0.061/4 + 0.015/5 + 0.062/6 = 0.194
    • Length between node A and node C = 0.219 + 0.061 = 0.280
    • Second tree (nodes D, A, C): weight of Hba_Human = 0.055 + 0.280/2 + 0.077/5 = 0.210
  • The strategy of gap penalty
    • Introduce a position-specific gap penalty into three-sequence alignment (modified from ClustalW).
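
The weight arithmetic above follows one pattern: each branch on the path from a sequence to the root contributes its length divided by the number of sequences sharing it. The sketch below reproduces the two sums for Hba_Human. The function name and the list-of-pairs representation are assumptions made for the example, and the sharing counts are read off the denominators on the slide.

```python
def sequence_weight(path):
    """Sum branch_length / n_sharing over the branches from a leaf to the root.

    path is a list of (branch_length, number_of_sequences_sharing_the_branch).
    """
    return sum(length / shared for length, shared in path)


# Path for Hba_Human before the branches are changed.
original_path = [(0.055, 1), (0.219, 2), (0.061, 4), (0.015, 5), (0.062, 6)]

# After the change, the A-to-C branches merge (0.219 + 0.061 = 0.280)
# and the remaining path to the root is a single branch of 0.077.
modified_path = [(0.055, 1), (0.219 + 0.061, 2), (0.077, 5)]

if __name__ == "__main__":
    print(round(sequence_weight(original_path), 3))
    print(round(sequence_weight(modified_path), 3))
```

With the rounded branch lengths shown, the first sum evaluates to about 0.193, marginally below the 0.194 on the slide. The gap presumably comes from the branch lengths having been rounded for display. The second sum gives 0.210 as stated.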

Experiments
  • System environment
    • Linux (AMD Opteron 250, 2.4 GHz, with 512 MB of memory)
  • Data source
    • BAliBASE 2.0
      • Reference sets (1 – 5). [T-Coffee, Muscle, ProbCons, aln3nn, etc.]
      • Reference sets (6 – 8) contain repeats, inversions and transmembrane helices, for which none of the tested algorithms is designed. [Muscle]

Experiments (cont’)
  • Scoring functions
    • Sum-of-pairs (SP)
    • Total Column Score (TC)
  • Proportion of best alignments (%)
    • No. of test sets on which the method gives the best alignment / total no. of test sets
  • Compared algorithms
    • CrossWAfast, CrossWAfull, ClustalW 1.83, T-Coffee 5.05, Muscle 3.6.
    • CrossWAfast: uses only Type I of the branch changing rule.
    • CrossWAfull: uses all types of the branch changing rule.
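
For benchmark evaluation, SP and TC are usually computed against a reference alignment: SP as the fraction of reference residue pairs that the test alignment also aligns, TC as the fraction of reference columns reproduced exactly. The sketch below follows those definitions over whole alignments; the actual BAliBASE scoring program may restrict scoring to annotated core regions, which is not modelled here. All function names are assumptions made for the example, as is the choice to count ties as best in the proportion.

```python
from itertools import combinations


def columns(alignment):
    """Turn gapped rows into columns of (row, residue_index) tuples, skipping gaps."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for r, row in enumerate(alignment):
            if row[c] != "-":
                col.append((r, counters[r]))
                counters[r] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(alignment):
    """Set of residue pairs that share a column."""
    pairs = set()
    for col in columns(alignment):
        pairs.update(combinations(col, 2))
    return pairs


def sp_vs_reference(test, ref):
    """Fraction of reference residue pairs that are also aligned in the test."""
    ref_pairs = aligned_pairs(ref)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_vs_reference(test, ref):
    """Fraction of multi-residue reference columns reproduced exactly in the test."""
    ref_cols = [c for c in columns(ref) if len(c) > 1]
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


def proportion_best(scores_by_method, method):
    """Percentage of test sets where the method has the top score (ties count)."""
    own = scores_by_method[method]
    wins = sum(
        own[i] == max(s[i] for s in scores_by_method.values())
        for i in range(len(own))
    )
    return 100.0 * wins / len(own)


if __name__ == "__main__":
    ref = ["AC-T", "ACGT"]
    test = ["ACT-", "ACGT"]
    print(sp_vs_reference(test, ref), tc_vs_reference(test, ref))
    print(proportion_best({"A": [0.9, 0.8], "B": [0.7, 0.85]}, "A"))
```

Residues are tracked by position rather than by letter, so two alignments of the same sequences compare correctly even where a sequence repeats a letter.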

Experiments (cont’)
  • The comparison of SP scores among different alignment methods

Experiments (cont’)
  • The comparison of TC scores among different alignment methods

Experiments (cont’)
  • The SP scores of each method at various average identities in the Reference 1 data set

Experiments (cont’)
  • The TC scores of each method at various average identities in the Reference 1 data set

Experiments (cont’)
  • The performance of CrossWA with 20 sequences

Experiments (cont’)
  • The performance of CrossWA with 40 sequences

Experiments (cont’)
  • Comparison of performance among different methods with 20 sequences

Experiments (cont’)
  • Comparison of performance among different methods with 40 sequences

Conclusions
  • Three-sequence alignment can produce better alignments than pairwise alignment, but not for all data sets.
  • Combining three-sequence and pairwise alignment keeps the better alignment at every alignment step in progressive MSA.
  • The experimental results suggest that CrossWA can be another useful tool for aligning multiple sequences.
  • CrossWA can be used to align DNA sequences.
  • For aligning genome data, computing time is a problem. It can be addressed by parallel programming. [CY Lin, ICPP 2007]

Web service

http://140.114.91.10/Genome

Reference
  • Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 1970, 48:443-453. [Needleman, J Mol Biol 1970]
  • Smith TF, Waterman MS: Identification of common molecular subsequences. J Mol Biol 1981, 147:195-197. [Smith, J Mol Biol 1981]
  • Murata M, Richardson JS, Sussman JL: Simultaneous comparison of three protein sequences. Proc Natl Acad Sci U S A 1985, 82:3073-3077. [Murata, PNAS 1985]
  • Gotoh O: Alignment of three biological sequences with an efficient traceback procedure. J Theor Biol 1986, 327-337. [Gotoh, J Theor Biol 1986]
  • Huang X: Alignment of three sequences in quadratic space. Applied Computing Review 1993, 1:7-11. [Huang, ACM 1993]
  • Makoto H, Maski H, Masato I, Tomoyuki T: MASCOT: multiple alignment system for protein sequences based on three-way dynamic programming. J Mol Biol 1993, 2:161-167. [Makoto, Bioinformatics 1993]

Reference (cont’)
  • CY Lin, CT Huang, YC Chung, Chuan YT: Parallel Three-sequence Alignment with Space-efficient. Proceedings of the 23rd Workshop on Combinatorial Mathematics and Computation Theory, Chang-Hua, Taiwan, April 2006, 160-165. [CY Lin, CMCT 2006]
  • CY Lin, CT Huang, YC Chung, Chuan YT: Efficient Parallel Algorithm for Optimal Three-Sequences Alignment. International Conference on Parallel Processing 2007. [CY Lin, ICPP 2007]
  • Loytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci U S A 2005, 102(30):10557-10562. [Loytynoja, PNAS 2005]
  • Matthias K, Peter FS: Progressive multiple sequence alignments from triplets. BMC Bioinformatics 2007. [Matthias, BMC Bioinformatics July, 2007]
  • Thompson JD: Introducing variable gap penalties to sequence alignment in linear space. Bioinformatics 1995, 11:181-186. [Thompson, Bioinformatics 1995]


Thank you for your attention
