Multiple Sequence Alignment


PowerPoint Slideshow about 'Multiple Sequence Alignment' - JasminFlorian


Presentation Transcript

Multiple Sequence Alignment

Dynamic Programming

Multiple Sequence Alignment

VTISCTGSSSNIGAGNHVKWYQQLPG

VTISCTGTSSNIGSITVNWYQQLPG

LRLSCSSSGFIFSSYAMYWVRQAPG

LSLTCTVSGTSFDDYYSTWVRQPPG

PEVTCVVVDVSHEDPQVKFNWYVDG

ATLVCLISDFYPGAVTVAWKADS

ATLVCLISDFYPGAVTVAWKADS

AALGCLVKDYFPEPVTVSWNSG-

VSLTCLVKGFYPSDIAVEWESNG-

  • Goal: Bring the greatest number of similar characters into the same column of the alignment
  • Similar to alignment of two sequences.
CLUSTALW MSA

MSA of four oxidoreductase NAD binding domain protein sequences. Red: AVFPMILW. Blue: DE. Magenta: RHK. Green: STYHCNGQ. Grey: all others. Residue ranges are shown after sequence names.

Chenna et al. Nucleic Acids Research, 2003, Vol. 31, No. 13 3497-3500

Multiple Sequence Alignment: Motivation
  • Correspondence. Find out which parts “do the same thing”
    • Similar genes are conserved across widely divergent species, often performing similar functions
  • Structure prediction
    • Use knowledge of structure of one or more members of a protein MSA to predict structure of other members
    • Structure is more conserved than sequence
  • Create “profiles” for protein families
    • Allow us to search for other members of the family
  • Genome assembly: Automated reconstruction of “contig” maps of genomic fragments such as ESTs
  • MSA is the starting point for phylogenetic analysis
Multiple Sequence Alignment: Approaches
  • Optimal Global Alignments - Dynamic programming
    • Generalization of Needleman-Wunsch
    • Find alignment that maximizes a score function
    • Computationally expensive: Time grows as product of sequence lengths
  • Global Progressive Alignments - Match closely-related sequences first using a guide tree
  • Global Iterative Alignments - Multiple re-building attempts to find best alignment
  • Local alignments
    • Profiles, Blocks, Patterns
Scoring a multiple alignment

[Figure: one alignment column of A and C residues, scored under three schemes: sum of pairs, star, and tree.]

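Sum of Pairs

Example: five aligned sequences, scored with α for each matching pair and -β for each mismatching pair in a column.

AAA
AAA
AAA
AAC
ACC

Column 1 (A A A A A): 10α
Column 2 (A A A A C): 6α - 4β
Column 3 (A A A C C): 4α - 6β

Sum of pairs = 10α + (6α - 4β) + (4α - 6β) = 20α - 10β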

Sum-of-Pairs Scoring Function

Score of multiple alignment = ∑_{i<j} score(Si, Sj)

where score(Si, Sj) = score of the induced pairwise alignment of Si and Sj
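
As a rough illustration of the sum-of-pairs definition, the sketch below scores every pair of rows column by column. The scoring scheme is an assumption made for the example: `alpha` for a matching pair, `-beta` for any other pair (including a residue against a gap), and 0 when both rows have a gap. The function names are hypothetical.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, alpha=1.0, beta=1.0):
    """Score one aligned pair of characters (assumed scheme, not from the slides)."""
    if a == GAP and b == GAP:
        return 0.0  # this column vanishes from the induced pairwise alignment
    return alpha if a == b else -beta

def sum_of_pairs(rows, alpha=1.0, beta=1.0):
    """Sum the induced pairwise alignment scores over all pairs i < j."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0.0
    for r1, r2 in combinations(rows, 2):
        total += sum(pair_score(a, b, alpha, beta) for a, b in zip(r1, r2))
    return total

# The five-sequence example: 20 matching pairs and 10 mismatching pairs.
rows = ["AAA", "AAA", "AAA", "AAC", "ACC"]
print(sum_of_pairs(rows, alpha=1.0, beta=1.0))  # 20*1 - 10*1 = 10.0
```

With `alpha=1` and `beta=1` the result equals 20α - 10β evaluated at those values.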

Induced Pairwise Alignment

S1  S - T I S C T G - S - N I
S2  L - T I - C N G S S - N I
S3  L R T I S C S G F S Q N I

Induced pairwise alignment of S1, S2 (columns in which both S1 and S2 have a gap are removed):

S1  S T I S C T G - S N I
S2  L T I - C N G S S N I
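
A minimal sketch of how an induced pairwise alignment can be extracted from two rows of a multiple alignment: keep every column except those where both rows hold a gap. The function name `induced_pairwise` is hypothetical.

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Drop the columns in which both rows have a gap; keep all others in order."""
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

# Rows S1 and S2 of the three-sequence example, written without spaces.
s1 = "S-TISCTG-S-NI"
s2 = "L-TI-CNGSS-NI"
print(induced_pairwise(s1, s2))  # ('STISCTG-SNI', 'LTI-CNGSSNI')
```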

MSA: Dynamic Programming
  • The two-sequence alignment algorithm can be generalized to any number of sequences.
  • E.g., for three sequences X, Y, W define C[i,j,k] = score of optimum alignment among X[1..i], Y[1..j], W[1..k]
  • As for two sequences, divide possible alignments into different classes, depending on how they end.
    • Use to devise recurrence relations for C[i,j,k]
    • C[i,j,k] is the maximum out of all possibilities
MSA: 7 ways alignment can end for 3 sequences

An alignment of X1 . . . Xi, Y1 . . . Yj, W1 . . . Wk ends in one of seven possible last columns:

      1    2    3    4    5    6    7
X:    Xi   -    Xi   Xi   Xi   -    -
Y:    Yj   Yj   -    -    Yj   Yj   -
W:    Wk   Wk   Wk   -    -    -    Wk

Dynamic programming for three sequences

Each alignment is a path through the dynamic programming matrix.

[Figure: three-dimensional dynamic programming matrix whose axes are labeled with the sequences VSNS, SNA, and AS; paths begin at the corner marked Start.]

Dynamic Programming for Three Sequences

There are 7 ways to get to C[i,j,k].

[Figure: cell C[i,j,k] with arrows from neighboring cells such as C[i-1,j-1,k-1] and C[i-1,j,k-1].]

Enumerate all possibilities and choose the best one.

For 3 sequences of length n, time is proportional to n^3.
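
The recurrence for C[i,j,k] can be sketched as follows. This is an illustrative implementation under stated assumptions, not the method of any particular tool: each column is scored by sum of pairs with +`alpha` for a match, -`beta` for a mismatch or a residue against a gap, and 0 for gap against gap. The names `MOVES`, `column_score`, and `align3_score` are hypothetical.

```python
from itertools import product

GAP = "-"
# The 7 ways a column can end: each sequence either contributes a residue (1)
# or a gap (0), excluding the all-gap column.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]

def pair_score(a, b, alpha, beta):
    if a == GAP and b == GAP:
        return 0
    return alpha if a == b else -beta

def column_score(col, alpha, beta):
    x, y, w = col
    return (pair_score(x, y, alpha, beta)
            + pair_score(x, w, alpha, beta)
            + pair_score(y, w, alpha, beta))

def align3_score(X, Y, W, alpha=1, beta=1):
    """Return C[len(X)][len(Y)][len(W)], the optimal sum-of-pairs score."""
    neg_inf = float("-inf")
    C = [[[neg_inf] * (len(W) + 1) for _ in range(len(Y) + 1)]
         for _ in range(len(X) + 1)]
    C[0][0][0] = 0
    for i in range(len(X) + 1):
        for j in range(len(Y) + 1):
            for k in range(len(W) + 1):
                if i == j == k == 0:
                    continue
                best = neg_inf
                for di, dj, dk in MOVES:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor cell lies outside the table
                    col = (X[i - 1] if di else GAP,
                           Y[j - 1] if dj else GAP,
                           W[k - 1] if dk else GAP)
                    best = max(best, C[pi][pj][pk] + column_score(col, alpha, beta))
                C[i][j][k] = best
    return C[len(X)][len(Y)][len(W)]

print(align3_score("AAA", "AAA", "AAA"))  # 9: three columns, three matching pairs each
print(align3_score("VSNS", "SNA", "AS"))
```

The three nested loops visit every cell once and try 7 predecessors per cell, which is where the n^3 growth for three sequences comes from.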

Dynamic Programming MSA: General Case
  • For k sequences of length n, the dynamic programming algorithm does (2^k - 1) n^k operations
    • Example: 6 sequences of length 100 require about 6.3 × 10^13 calculations
  • Space for table is n^k
  • Implementations (e.g., WashU MSA 2.1) use tricks and only search subset of dynamic programming table
    • Even this is expensive. E.g., Baylor CM Search launcher limits MSA to 8 sequences of 800 characters and 10 minutes processing time
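
To get a feel for how quickly the cost grows, the counts can be evaluated directly. This is plain arithmetic on the operation count (2^k - 1) n^k and the table size n^k; the function names are hypothetical.

```python
def dp_operations(k, n):
    """(2^k - 1) predecessor checks for each of the n^k table cells."""
    return (2**k - 1) * n**k

def dp_table_cells(k, n):
    """Number of cells in the k-dimensional table."""
    return n**k

for k in (2, 3, 6):
    print(k, dp_operations(k, 100), dp_table_cells(k, 100))
# For k = 6, n = 100 the operation count is 63 * 10**12, on the order of 10^13.
```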
Problems with SP scoring
  • Pair-wise comparisons can over-score evolutionarily distant pairs.
  • Reason: For 3 or more sequences, SP scoring does not correspond to any evolutionary tree

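[Figure omitted: SP scoring compares every pair of sequences directly, but not along the branches of an evolutionary tree.]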

Overcoming problems with SP scoring
  • Use weights to incorporate evolution in sum of pairs scoring:
    • Some pair-wise alignments are more important than others
      • E.g., more important to have a good alignment between mouse and human sequences than mouse and bird
    • Assign different weights to different pair-wise alignments.
      • Weight decreases with evolutionary distance.
  • Use star tree approach
    • One sequence is assigned as the ancestor and all others are compared against it.
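
A weighted sum-of-pairs score might look like the sketch below. The weights, species names, and toy sequences are invented for illustration only; in practice the weights would be derived from evolutionary distances. The pair scoring scheme (+`alpha` match, -`beta` otherwise, 0 for gap against gap) is likewise an assumption.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, alpha=1.0, beta=1.0):
    if a == GAP and b == GAP:
        return 0.0
    return alpha if a == b else -beta

def weighted_sum_of_pairs(rows, weights, alpha=1.0, beta=1.0):
    """rows: name -> aligned row; weights: frozenset({name1, name2}) -> weight."""
    total = 0.0
    for (n1, r1), (n2, r2) in combinations(rows.items(), 2):
        w = weights[frozenset((n1, n2))]
        total += w * sum(pair_score(a, b, alpha, beta) for a, b in zip(r1, r2))
    return total

# Hypothetical data: the closely related pair gets the larger weight.
rows = {"human": "AAC", "mouse": "AAC", "bird": "ACC"}
weights = {
    frozenset(("human", "mouse")): 1.0,
    frozenset(("human", "bird")): 0.5,
    frozenset(("mouse", "bird")): 0.5,
}
print(weighted_sum_of_pairs(rows, weights))  # 3*1.0 + 1*0.5 + 1*0.5 = 4.0
```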
Star Alignments
  • Construct multiple alignments using pair-wise alignment relative to a fixed sequence
  • Out of a set S = {S1, S2, . . . , Sr} of sequences, pick the sequence Sc that maximizes star_score(c) = ∑ {sim(Sc, Si) : 1 ≤ i ≤ r, i ≠ c}, where sim(Si, Sj) is the optimal score of a pair-wise alignment between Si and Sj
Algorithm
  • Compute sim(Si, Sj) for every pair (i,j)
  • Compute star_score(i) for every i
  • Choose the index c that maximizes star_score(c) and make it the center of the star
  • Produce a multiple alignment M such that, for every i, the induced pairwise alignment of Sc and Si is the same as the optimum alignment of Sc and Si.
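
The first three steps can be sketched as below. Here sim is computed with a basic Needleman-Wunsch global alignment score; the parameters (+1 match, -1 mismatch, -1 per gap) and the toy sequences are assumptions for the example. Because sim is a similarity, the center is taken as the index with the largest star_score, following the definition of Sc as the maximizer.

```python
def sim(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global pairwise alignment score (Needleman-Wunsch, linear gaps)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def star_center(seqs):
    """Return (index of the center sequence, list of star scores)."""
    scores = [
        sum(sim(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    center = max(range(len(seqs)), key=scores.__getitem__)
    return center, scores

print(star_center(["AAAA", "AAAT", "TTTT"]))  # (1, [-2, 0, -6])
```

This sketch recomputes each pairwise score twice for simplicity; caching sim(Si, Sj) for i < j would halve the work.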
Step 4: Detail

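Optimal pairwise alignment of Sc and S2:

Sc  A-ACC-TT
S2  AGACCGT-

Optimal pairwise alignment of Sc and S1:

Sc  AA--CCTT
S1  AATGCC--

Resulting multiple alignment:

Sc  A-A--CC-TT
S1  A-ATGCC---
S2  AGA--CCGT-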
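
One way to carry out step 4 is sketched below, assuming the pairwise alignments against the center are already available. For each position of Sc it records how many gap columns each pairwise alignment places in front of it, takes the widest run, and pads the other alignments to match. The helper names are hypothetical, and where several sequences insert at the same place the sketch simply left-justifies their insertions rather than aligning them to each other.

```python
GAP = "-"

def gap_profile(center_row):
    """profile[p] = gaps directly before center residue p; last entry = trailing gaps."""
    profile, run = [], 0
    for ch in center_row:
        if ch == GAP:
            run += 1
        else:
            profile.append(run)
            run = 0
    profile.append(run)
    return profile

def merge_star(pairwise):
    """pairwise: list of (center_row, other_row), all sharing the same center sequence."""
    center = pairwise[0][0].replace(GAP, "")
    if any(c.replace(GAP, "") != center for c, _ in pairwise):
        raise ValueError("every pairwise alignment must use the same center sequence")
    profiles = [gap_profile(c) for c, _ in pairwise]
    widest = [max(col) for col in zip(*profiles)]

    def expand(row, profile):
        out, pos = [], 0
        for p, width in enumerate(widest):
            own = profile[p]
            out.append(row[pos:pos + own])   # residues inserted against center gaps
            out.append(GAP * (width - own))  # padding for insertions made by other rows
            pos += own
            if p < len(center):
                out.append(row[pos])         # column aligned with center residue p
                pos += 1
        return "".join(out)

    merged_center = expand(center, [0] * len(widest))
    return [merged_center] + [expand(o, prof) for (_, o), prof in zip(pairwise, profiles)]

# The Sc / S1 / S2 example from this slide.
rows = merge_star([("AA--CCTT", "AATGCC--"), ("A-ACC-TT", "AGACCGT-")])
print("\n".join(rows))
# A-A--CC-TT
# A-ATGCC---
# AGA--CCGT-
```

By construction, dropping the all-gap columns from the merged Sc row and any other row gives back the original pairwise alignment.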