Multiple alignment: Feng-Doolittle algorithm

Why multiple alignments?

  • Alignment of more than two sequences

  • Usually gives better information about conserved regions and function (more data)

  • Better estimate of significance when using a sequence of unknown function

  • Must use multiple alignments when establishing phylogenetic relationships

Dynamic programming extended to many dimensions?

  • No – uses up too much computer time and space

    • E.g. 200 amino acids in a pairwise alignment – must evaluate 4 x 10^4 matrix elements

    • If 3 sequences, 8 x 10^6 matrix elements

    • If 6 sequences, 6.4 x 10^13 matrix elements

  • Need to find more efficient method

  • Give up the guarantee of an optimum alignment in exchange for a good alignment found much faster
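
The matrix sizes quoted above follow from raising the sequence length to the number of sequences. A minimal sketch, assuming all sequences have the same length (the function name `dp_cells` is made up for illustration):

```python
def dp_cells(seq_len: int, n_seqs: int) -> int:
    """Matrix elements needed for full dynamic programming over
    n_seqs sequences that each have seq_len residues."""
    return seq_len ** n_seqs


# Reproduce the three cases from the slide (200 amino acids each).
for n in (2, 3, 6):
    print(f"{n} sequences: {dp_cells(200, n):.1e} matrix elements")
```

Each added sequence multiplies the work by the sequence length, which is why the full dynamic programming approach becomes impractical so quickly.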

Feng-Doolittle algorithm

  • Does all pairwise alignments and scores them

  • Converts pairwise scores to “distances”

    • D = -log S_eff = -log[(S_obs - S_rand) / (S_max - S_rand)]

    • S_obs = pairwise alignment score

    • S_rand = expected score for a random alignment

    • S_max = average of the self-alignment scores of the two sequences

  • As S_obs approaches S_rand (increasing evolutionary distance), S_eff goes down toward 0; taking the -log makes the distance measure positive, and it grows as similarity falls

  • Once the distances have been calculated, construct a guide tree (more in the phylogeny class); the tree gives the order in which the sequences are grouped

  • Sequences can be aligned with sequences or groups; groups can be aligned with groups

  • Sequence-sequence alignments: dynamic programming

  • Sequence-group alignments: the sequence is aligned pairwise against every sequence in the group; the highest-scoring pairwise alignment determines how the sequence is aligned to the group

  • Group-group alignments: every sequence in one group is aligned pairwise against every sequence in the other; the highest-scoring pairwise alignment determines how the groups are aligned
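
The score-to-distance conversion in the list above can be sketched as follows. This is only an illustration: the function names are made up, the natural logarithm is an assumption (the slide does not say which base is used), and the scores in the usage lines are invented numbers.

```python
import math


def s_max_from_self_scores(self_score_a: float, self_score_b: float) -> float:
    """S_max: average of the two self-alignment scores."""
    return (self_score_a + self_score_b) / 2


def feng_doolittle_distance(s_obs: float, s_rand: float, s_max: float) -> float:
    """D = -log S_eff, with S_eff = (S_obs - S_rand) / (S_max - S_rand)."""
    if s_max <= s_rand:
        raise ValueError("S_max must exceed S_rand")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # The pair scores no better than random: the log is undefined.
        raise ValueError("S_obs must exceed S_rand")
    return -math.log(s_eff)


# Invented scores: self-alignments of 120 and 80, random expectation 10.
s_max = s_max_from_self_scores(120, 80)
print(feng_doolittle_distance(s_obs=55, s_rand=10, s_max=s_max))
```

Identical sequences give S_eff = 1 and a distance of 0; as the observed score falls toward the random expectation, the distance grows without bound.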







[Figure: progressive alignment example. Panels: Alignment 1, Alignment 2, Alignment 3, Final alignment]

Notice that this method does not guarantee the optimum alignment; just a good one.

Gaps are preserved from alignment to alignment: “once a gap, always a gap”
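
The "once a gap, always a gap" rule can be sketched as a small helper that copies newly introduced gap columns into every row of an existing group. This is a sketch under assumptions, not the authors' code: the function name is made up, and it assumes that gaps already present in the group were written as a neutral `X` before the new pairwise alignment, so that any `-` in the realigned row marks a new gap column.

```python
def propagate_gaps(group, rep_realigned, gap="-"):
    """Insert new gap columns into every row of an aligned group.

    group         -- list of equal-length aligned rows (old gaps kept as-is)
    rep_realigned -- one row of the group after a new pairwise alignment;
                     each gap character in it is a newly added column
    """
    width = len(group[0])
    if sum(ch != gap for ch in rep_realigned) != width:
        raise ValueError("realigned row does not match the group width")
    new_rows = [[] for _ in group]
    col = 0  # current column in the old group alignment
    for ch in rep_realigned:
        if ch == gap:
            # New gap column: every existing row receives a gap here.
            for row in new_rows:
                row.append(gap)
        else:
            # Existing column: copy it unchanged, old gaps included.
            for row, old in zip(new_rows, group):
                row.append(old[col])
            col += 1
    return ["".join(row) for row in new_rows]


group = ["ACG-T", "ACGAT"]
# First row realigned against a new sequence "ACCGAT"; old gap shown as X.
updated = propagate_gaps(group, "AC-GXT")
print(updated + ["ACCGAT"])
```

Existing columns are never re-opened, so every gap placed in an earlier alignment survives into the final one.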

In-class exercise

  • Retrieve sequences from multalign.apr into BioScout

  • Run Gap in BioScout on all combinations of the sequences in multalign.apr; use a gap penalty of 6 and an extension penalty of 2

  • Record alignment scores of each pairwise comparison

  • Save pairwise alignments

In-class exercise, cont.

  • Use raw alignment scores as distance measures; make a guide tree based on these scores

  • In Vector NTI, select all sequences in multalign.apr (in the sequence pane); choose Alignment from the toolbar at the top; choose Alignment Setup from the pulldown; choose multiple alignment; take the defaults, choose ok; choose Alignment again, this time choose Align Selected Sequences from the pulldown
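
For the hand-built guide tree, one simple way to turn pairwise scores into a grouping order is to repeatedly join the two clusters with the highest average pairwise score. The sketch below assumes that approach (average linkage on raw scores); the function names, sequence names and scores are all invented for illustration and do not come from multalign.apr.

```python
from itertools import combinations


def average_score(members_a, members_b, score):
    """Mean pairwise alignment score between two clusters."""
    pairs = [(a, b) for a in members_a for b in members_b]
    return sum(score[frozenset(p)] for p in pairs) / len(pairs)


def guide_tree(names, score):
    """Return a nested tuple giving the order in which to group sequences.

    score maps frozenset({name1, name2}) to a raw pairwise alignment
    score, where a higher score means more similar sequences.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, members)
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average score.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_score(
                clusters[ij[0]][1], clusters[ij[1]][1], score
            ),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


raw = {("A", "B"): 90, ("C", "D"): 80, ("A", "C"): 40,
       ("A", "D"): 35, ("B", "C"): 45, ("B", "D"): 30}
scores = {frozenset(pair): value for pair, value in raw.items()}
print(guide_tree(["A", "B", "C", "D"], scores))
```

With these invented scores, A and B are grouped first, then C and D, and finally the two groups are joined.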

In-class exercise, cont.

  • Note that ClustalW does some other things that the Pileup program discussed on the tape does not; we are going to ignore those things for the moment

  • Compare ClustalW’s guide tree (visible in the Phylogenetic Tree Pane – tab at bottom of window) with yours

In-class exercise, cont.

  • Carefully examine ClustalW’s alignment; compare it to the individual pairwise alignments you saved. Are there differences?

  • Start refining alignment:

    • Use structural info if you have it

    • Find patterns if you don’t

    • Use amino acid structure handout from beginning of class for substitution decisions!


ClustalW

  • Most widely used multiple alignment method

  • Similar strategy to the Feng-Doolittle approach implemented as Pileup, but more complex and gives generally superior results

  • Ad hoc nature of the program can be mysterious

Advantageous differences

  • Gap penalties vary locally:

    • By observed frequency (in database) after each residue

    • By simple structure prediction – lower gap penalties in probable loop regions

    • By proximity to existing gaps – higher gap penalties when within 8 residues of an existing gap
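
The third rule, raising the gap penalty near existing gaps, can be illustrated with a small sketch. The 8-residue window comes from the slide; the function name, the base penalty of 10 and the doubling factor are assumptions chosen for illustration and are not ClustalW's actual formula.

```python
def local_gap_open_penalties(column_has_gap, base_penalty=10.0,
                             window=8, boost=2.0):
    """Return one gap-opening penalty per alignment column.

    column_has_gap -- list of booleans, True where the column already
                      contains a gap
    Columns without a gap that lie within `window` positions of a
    gapped column get base_penalty * boost; all others get base_penalty.
    """
    gap_cols = [i for i, has_gap in enumerate(column_has_gap) if has_gap]
    penalties = []
    for i, has_gap in enumerate(column_has_gap):
        near_gap = (not has_gap) and any(
            abs(i - g) <= window for g in gap_cols
        )
        penalties.append(base_penalty * boost if near_gap else base_penalty)
    return penalties


# 20 columns with a single existing gap at column 10.
flags = [False] * 20
flags[10] = True
print(local_gap_open_penalties(flags))
```

Raising the penalty around an existing gap discourages a scatter of small gaps close together and favours reusing the gap that is already there.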

Advantages, cont.

  • Change in substitution matrix choice depending on distance computed for guide tree

    • Substitution matrix families

  • Profile construction (more later)

  • Weighting of sequences in profiles depending on evolutionary distance computed for guide tree

    • More similar sequences get less weight than less similar sequences

In-class exercise II

  • Change a few parameters in the ClustalW program (gap, gap extension, substitution matrix, etc.) one at a time: this is done in Alignment Setup. After each run with a different change, save the alignment project with some descriptive name that you can remember (e.g., gap20 or blosum)

  • Compare alignment results with different parameters changed


MultAlin

  • MultAlin is also a heuristic algorithm that builds up a multiple alignment from a group of pairwise alignments

  • It differs from Pileup and Clustal in that the guide tree is recalculated based on the results of each alignment step

  • Because this leads to cycles of tree building and alignment, MultAlin can take a long time to run. It stops when the overall alignment score stops improving

Scoring a multiple sequence alignment

  • Assumptions:

    • Sequences (rows) independent

    • Positions (columns) independent

  • Neither assumption is true …

  • Score of a column is the (possibly weighted) sum of all the pairwise comparisons (i.e., substitution matrix values) within that column

  • Score of a multiple alignment is the sum of scores for all columns
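
The column-sum scoring described above can be sketched in a few lines. The scoring function here is a toy stand-in for a real substitution matrix (match +1, mismatch -1, residue against gap -2, gap against gap 0), and weighting a pair by the product of its two row weights is an assumption made for illustration.

```python
from itertools import combinations


def toy_score(a, b):
    """Toy stand-in for a substitution matrix lookup."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def column_score(column, pair_score=toy_score, weights=None):
    """(Possibly weighted) sum of all pairwise comparisons in a column."""
    total = 0.0
    for i, j in combinations(range(len(column)), 2):
        w = 1.0 if weights is None else weights[i] * weights[j]
        total += w * pair_score(column[i], column[j])
    return total


def alignment_score(rows, pair_score=toy_score, weights=None):
    """Sum of the column scores over all columns of the alignment."""
    return sum(column_score(col, pair_score, weights) for col in zip(*rows))


rows = ["AC-G", "ACTG", "AGTG"]
print(alignment_score(rows))  # columns score 3, -1, -3, 3
```

Because each column is scored on its own and every pair of rows counts separately, the code makes both independence assumptions explicit.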
