Multiple alignment: Feng-Doolittle algorithm


Slide 1

Multiple alignment: Feng-Doolittle algorithm

Slide 2

Why multiple alignments?

  • Alignment of more than two sequences

  • Usually gives better information about conserved regions and function (more data)

  • Better estimate of significance when using a sequence of unknown function

  • Must use multiple alignments when establishing phylogenetic relationships

Slide 3

Dynamic programming extended to many dimensions?

  • No – uses up too much computer time and space

    • E.g. 200 amino acids in a pairwise alignment: must evaluate 4 x 10^4 matrix elements

    • If 3 sequences, 8 x 10^6 matrix elements

    • If 6 sequences, 6.4 x 10^13 matrix elements
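The counts above follow from one matrix axis per sequence, so n sequences of length L need roughly L^n cells. A minimal Python sketch of that arithmetic (the function name `dp_cells` is made up for illustration):

```python
def dp_cells(length: int, n_sequences: int) -> int:
    """Cells in a full dynamic-programming matrix: one axis per sequence."""
    return length ** n_sequences

# Reproduce the figures on the slide for sequences of 200 residues.
for n in (2, 3, 6):
    print(f"{n} sequences: {dp_cells(200, n):.1e} matrix elements")
```

Each added sequence multiplies the work by another factor of 200, which is why the exact method stops being practical after a handful of sequences.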

Slide 4

  • Need a more efficient method

  • Sacrifice the guarantee of an optimal alignment for a good alignment obtained much faster

Slide 5

Feng-Doolittle algorithm

  • Does all pairwise alignments and scores them

  • Converts pairwise scores to “distances”

    • D = -log S_eff = -log[(S_obs - S_rand) / (S_max - S_rand)]

    • S_obs = pairwise alignment score

    • S_rand = expected score for a random alignment

    • S_max = average of the self-alignment scores of the two sequences

Slide 6

  • As S_obs approaches S_rand (increasing evolutionary distance), S_eff falls toward 0; taking the -log turns it into a positive distance that grows as the sequences diverge
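A minimal Python sketch of the score-to-distance conversion, assuming the three scores have already been computed elsewhere. The function name and the example numbers are made up for illustration; they are not from the slides.

```python
import math

def feng_doolittle_distance(s_obs: float, s_rand: float, s_max: float) -> float:
    """D = -log S_eff, where S_eff = (S_obs - S_rand) / (S_max - S_rand)."""
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # The observed score is no better than random, so the log is undefined.
        raise ValueError("S_eff must be positive to take its logarithm")
    return -math.log(s_eff)

# Hypothetical scores: S_rand = 10, S_max = 100.
for s_obs in (100, 55, 20):
    print(s_obs, round(feng_doolittle_distance(s_obs, 10, 100), 3))
```

With these numbers, a pair that scores as well as a self-alignment gets distance 0, and the distance rises as the observed score drops toward the random expectation.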

Slide 7

  • Once the distances have been calculated, construct a guide tree (more in the phylogeny class); the tree tells you in what order to group the sequences

  • Sequences can be aligned with sequences or groups; groups can be aligned with groups
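The slides do not say which clustering method builds the guide tree. As a rough illustration only, the Python sketch below assumes simple average-linkage clustering (UPGMA-style): it repeatedly merges the two closest clusters and records the merge order as nested tuples. The function name, the distance layout, and the example distances are all hypothetical.

```python
from itertools import combinations

def guide_tree(names, dist):
    """Merge the two closest clusters until one remains (average linkage).

    dist maps frozenset({a, b}) to the distance between sequences a and b.
    Returns nested tuples that give the grouping order.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Made-up distances for four sequences.
d = {frozenset(k): v for k, v in {
    ("A", "B"): 0.1, ("C", "D"): 0.2, ("A", "C"): 0.8,
    ("A", "D"): 0.9, ("B", "C"): 0.85, ("B", "D"): 0.95}.items()}
print(guide_tree(["A", "B", "C", "D"], d))
```

Here A and B are aligned first, then C and D, and finally the two groups are aligned with each other.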

Slide 8

  • Sequence-sequence alignments: dynamic programming

  • Sequence-group alignments: every pairwise alignment between the sequence and a member of the group is tried; the highest-scoring pair determines how the sequence is aligned to the group

  • Group-group alignments: every pairwise alignment between a sequence in one group and a sequence in the other is tried; the highest-scoring pair determines how the groups are aligned
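The "highest scoring pair" rule can be sketched in Python as below. This is an illustration under stated assumptions, not the Pileup implementation: the scoring scheme (match +1, mismatch -1, linear gap -2) is a toy stand-in for a real substitution matrix, the groups hold ungapped sequences, and both function names are made up.

```python
from itertools import product

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score by dynamic programming (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def best_pair(group_a, group_b):
    """Try every cross-group pair and return the highest-scoring one."""
    return max(product(group_a, group_b), key=lambda pair: nw_score(*pair))

# A single sequence is just a group of size one.
print(best_pair(["ACGT", "AGGT"], ["ACGA", "TTTT"]))
```

The alignment of the winning pair then dictates where gaps go when the two groups are joined.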

Slide 9







[Figure: Alignment 1, Alignment 2, and Alignment 3, leading to the final alignment]

Slide 10

Notice that this method does not guarantee the optimal alignment, only a good one.

Gaps are preserved from alignment to alignment: “once a gap, always a gap”
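One way to picture "once a gap, always a gap": when a later alignment step needs a new gap in a group, the gap is inserted as a whole column in every row, and the gaps already present are never removed. A minimal Python sketch (the function name and the example rows are made up for illustration):

```python
def add_gap_columns(group, columns):
    """Insert a gap column before each index in `columns` in every row.

    Indices refer to columns of the existing alignment. Existing gaps
    are kept, so earlier alignment decisions are never undone.
    """
    rows = []
    for row in group:
        out = []
        for i, ch in enumerate(row):
            if i in columns:
                out.append("-")
            out.append(ch)
        rows.append("".join(out))
    return rows

# The group already holds one gap; a later step needs a new column at index 2.
print(add_gap_columns(["AC-GT", "ACTGT"], {2}))
```

The first row ends up with two gaps: the old one is preserved and the new column is added alongside it.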

Slide 11

In-class exercise

  • Retrieve sequences from multalign.apr into BioScout

  • Run Gap in BioScout on all combinations of the sequences in multalign.apr; use a gap penalty of 6 and an extension penalty of 2

  • Record alignment scores of each pairwise comparison

  • Save pairwise alignments

Slide 12

In-class exercise, cont.

  • Use raw alignment scores as distance measures; make a guide tree based on these scores

  • In Vector NTI, select all sequences in multalign.apr (in the sequence pane); choose Alignment from the toolbar at the top; choose Alignment Setup from the pulldown; choose multiple alignment; take the defaults, choose ok; choose Alignment again, this time choose Align Selected Sequences from the pulldown

Slide 13

In-class exercise, cont.

  • Note that ClustalW does some other things that the Pileup program discussed on the tape does not; we are going to ignore those things for the moment

  • Compare ClustalW’s guide tree (visible in the Phylogenetic Tree Pane – tab at bottom of window) with yours

Slide 14

In-class exercise, cont.

  • Carefully examine ClustalW’s alignment; compare it to the individual pairwise alignments you saved. Are there differences?

Slide 15

  • Start refining alignment:

    • Use structural info if you have it

    • Find patterns if you don’t

    • Use amino acid structure handout from beginning of class for substitution decisions!

Slide 16


ClustalW

  • Most widely used multiple alignment method

  • Similar strategy to the Feng-Doolittle approach implemented as Pileup, but more complex and gives generally superior results

  • Ad hoc nature of the program can be mysterious

Slide 17

Advantageous differences

  • Gap penalties vary locally:

    • By observed frequency (in database) after each residue

    • By simple structure prediction – lower gap penalties in probable loop regions

    • By proximity to existing gaps – higher gap penalties when within 8 residues of an existing gap
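As a rough illustration of the proximity rule only, the Python sketch below raises the gap-opening penalty at positions within 8 residues of an existing gap. The base penalty of 10 and the doubling factor are invented for the example; the slides give the 8-residue window but not the size of the increase, and the function name is hypothetical.

```python
def local_gap_open_penalties(gap_columns, length, base=10.0, window=8, factor=2.0):
    """Per-position gap-opening penalties, raised near existing gaps.

    gap_columns: indices of columns that already contain a gap.
    base and factor are illustrative values, not real program settings.
    """
    penalties = []
    for pos in range(length):
        near_gap = any(0 < abs(pos - g) <= window for g in gap_columns)
        penalties.append(base * factor if near_gap else base)
    return penalties

# One existing gap at column 10 in a 25-column alignment.
p = local_gap_open_penalties({10}, 25)
print(p[1], p[2], p[10], p[18], p[19])
```

The gap column itself keeps the base value in this sketch; only its neighbours become more expensive places to open a new gap.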

Slide 18

Advantages, cont.

  • Change in substitution matrix choice depending on distance computed for guide tree

    • Substitution matrix families

  • Profile construction (more later)

  • Weighting of sequences in profiles depending on evolutionary distance computed for guide tree

    • More similar sequences get less weight than less similar sequences

Slide 19

In-class exercise II

  • Change a few parameters in the ClustalW program (gap, gap extension, substitution matrix, etc.) one at a time: this is done in Alignment Setup. After each run with a different change, save the alignment project with some descriptive name that you can remember (e.g., gap20 or blosum)

  • Compare alignment results with different parameters changed

Slide 20


  • MultAlin is also a heuristic algorithm that builds up a multiple alignment from a group of pairwise alignments

  • It differs from Pileup and Clustal in that the guide tree is recalculated based on the results of each alignment step

  • Because this leads to cycles of tree building and alignment, MultAlin can take a long time to run. It stops once the overall alignment score stops improving

Slide 21

Scoring a multiple sequence alignment

  • Assumptions:

    • Sequences (rows) independent

    • Positions (columns) independent

  • Neither assumption is true …

  • Score of a column is the (possibly weighted) sum of all the pairwise comparisons (i.e., substitution matrix values) within that column

  • Score of a multiple alignment is the sum of scores for all columns
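The scoring scheme described above can be sketched in a few lines of Python. This is the unweighted version, and `toy_score` is a made-up stand-in for a real substitution matrix and gap penalty (match +1, mismatch -1, residue against gap -2, gap against gap 0).

```python
from itertools import combinations

def toy_score(a: str, b: str) -> int:
    """Hypothetical pair score standing in for substitution matrix values."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

def column_score(column, score=toy_score):
    """Sum of the scores of all pairs of symbols within one column."""
    return sum(score(a, b) for a, b in combinations(column, 2))

def sum_of_pairs(alignment, score=toy_score):
    """Score of the alignment: sum of the scores of all its columns."""
    return sum(column_score(col, score) for col in zip(*alignment))

alignment = ["ACG",
             "ACT",
             "A-G"]
print([column_score(col) for col in zip(*alignment)], sum_of_pairs(alignment))
```

Treating columns independently is what lets the total be a plain sum; a weighted variant would multiply each pair's score by a weight for the two sequences involved.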
