Multiple alignment: Feng-Doolittle algorithm

Why multiple alignments?

  • Alignment of more than two sequences

  • Usually gives better information about conserved regions and function (more data)

  • Better estimate of significance when using a sequence of unknown function

  • Must use multiple alignments when establishing phylogenetic relationships

Dynamic programming extended to many dimensions?

  • No – uses up too much computer time and space

    • E.g. 200 amino acids in a pairwise alignment – must evaluate 4 x 10^4 matrix elements

    • If 3 sequences, 8 x 10^6 matrix elements

    • If 6 sequences, 6.4 x 10^13 matrix elements

  • Need to find more efficient method

  • Trade the guarantee of an optimum alignment for a faster method that still gives a good alignment
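The matrix sizes quoted above follow from one cell per combination of positions, i.e. (sequence length)^(number of sequences). A minimal sketch that reproduces the arithmetic, assuming all sequences have the same length (the function name `dp_cells` is made up for this illustration):

```python
def dp_cells(length, n_sequences):
    """Number of matrix elements in a full dynamic-programming alignment
    of n_sequences sequences that are each `length` residues long."""
    return length ** n_sequences


# Reproduce the three cases from the slide (200 amino acids each).
for n in (2, 3, 6):
    print(f"{n} sequences: {dp_cells(200, n):.1e} matrix elements")
```

Each added sequence multiplies the work by another factor of 200, which is why the full method is abandoned beyond a handful of sequences.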

Feng-Doolittle algorithm

  • Does all pairwise alignments and scores them

  • Converts pairwise scores to “distances”

    • D = -log S_eff = -log[(S_obs - S_rand) / (S_max - S_rand)]

    • S_obs = pairwise alignment score

    • S_rand = expected score for a random alignment

    • S_max = average of the self-alignment scores of the two sequences

  • As S_obs approaches S_rand (increasing evolutionary distance), S_eff goes down; taking the -log makes the distance measure positive

  • Once the distances have been calculated, construct a guide tree (more in the phylogeny class) – tells what order to group the sequences

  • Sequences can be aligned with sequences or groups; groups can be aligned with groups

  • Sequence-sequence alignments: dynamic programming

  • Sequence-group alignments: the sequence is aligned pairwise with every sequence in the group; the highest-scoring pairwise alignment determines how the sequence is aligned to the group

  • Group-group alignments: every sequence in one group is aligned pairwise with every sequence in the other group; the highest-scoring pair determines how the two groups are aligned
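The score-to-distance conversion in the list above can be sketched as follows. This is a minimal illustration, not the original program: the function name is made up, and returning an infinite distance when the observed score is no better than random is an assumption of this sketch.

```python
import math


def feng_doolittle_distance(s_obs, s_rand, s_max):
    """Convert a pairwise alignment score into a distance.

    s_obs  : observed pairwise alignment score
    s_rand : expected score for a random alignment
    s_max  : average of the two self-alignment scores
    """
    if s_max <= s_rand:
        raise ValueError("s_max must be greater than s_rand")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # Assumption of this sketch: a score no better than random is
        # treated as infinitely distant, since the log is undefined.
        return math.inf
    return -math.log(s_eff)


# Made-up scores: halfway between random and identical gives -log(0.5).
print(feng_doolittle_distance(s_obs=60, s_rand=20, s_max=100))
```

Identical sequences (s_obs equal to s_max) give a distance of 0, and the distance grows without bound as the observed score falls toward the random expectation.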








[Figure: Alignment 1, Alignment 2, Alignment 3, Final alignment]

Notice that this method does not guarantee the optimum alignment; just a good one.

Gaps are preserved from alignment to alignment: “once a gap, always a gap”
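One way to picture "once a gap, always a gap" is the step that adds a new sequence to an existing group. The sketch below assumes the chosen group member has already been pairwise-aligned with the new sequence, that gaps already present in the group were masked with `X` before that alignment, and that `-` in the aligned group row therefore marks only newly introduced gaps. The function name and the masking convention are assumptions of this illustration.

```python
def merge_into_group(group, aligned_anchor, aligned_new):
    """Add a new sequence to an aligned group without removing any gap.

    group          : list of equal-length aligned rows
    aligned_anchor : one group row after pairwise alignment with the new
                     sequence; old gaps are masked as 'X', new gaps are '-'
    aligned_new    : the new sequence as aligned against that row
    """
    if len(aligned_anchor) != len(aligned_new):
        raise ValueError("pairwise alignment rows must have equal length")
    merged = [[] for _ in group]
    col = 0  # current column in the existing group alignment
    for symbol in aligned_anchor:
        if symbol == "-":
            # A new gap in the anchor row becomes a gap column in every row.
            for row in merged:
                row.append("-")
        else:
            # Copy the existing column unchanged, old gaps included.
            for row, source in zip(merged, group):
                row.append(source[col])
            col += 1
    if col != len(group[0]):
        raise ValueError("aligned_anchor does not cover the whole group")
    return ["".join(row) for row in merged] + [aligned_new]


group = ["AC-GT", "ACAGT"]
for row in merge_into_group(group, "ACXG-T", "AC-GTT"):
    print(row)
```

Existing columns are only ever copied or padded with new gap columns, so a gap placed in an earlier step can never be removed later.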

In-class exercise

  • Retrieve sequences from multalign.apr into BioScout

  • Run Gap in BioScout on all combinations of the sequences in multalign.apr; use a gap penalty of 6 and an extension penalty of 2

  • Record alignment scores of each pairwise comparison

  • Save pairwise alignments

In-class exercise, cont.

  • Use raw alignment scores as distance measures; make a guide tree based on these scores

  • In Vector NTI, select all sequences in multalign.apr (in the sequence pane); choose Alignment from the toolbar at the top; choose Alignment Setup from the pulldown; choose multiple alignment; take the defaults, choose ok; choose Alignment again, this time choose Align Selected Sequences from the pulldown
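For the guide-tree step in the first bullet, one simple way to turn the recorded scores into a tree is to join the highest-scoring pair repeatedly, using the average score between clusters. The average-linkage rule, the function name, and the example scores below are assumptions of this sketch; the exercise itself only asks for a tree based on the raw scores.

```python
def guide_tree(names, scores):
    """Build a guide tree from pairwise similarity scores.

    names  : list of sequence names
    scores : dict mapping frozenset({a, b}) to the pairwise alignment
             score (higher score = more similar)
    Returns a nested tuple; the innermost pairs are aligned first.
    """
    clusters = [(name, [name]) for name in names]  # (subtree, members)

    def average_score(members_a, members_b):
        total = sum(scores[frozenset((a, b))]
                    for a in members_a for b in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average score.
        i, j = max(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: average_score(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Made-up scores for four sequences.
example_scores = {
    frozenset(("A", "B")): 90, frozenset(("C", "D")): 80,
    frozenset(("A", "C")): 30, frozenset(("A", "D")): 25,
    frozenset(("B", "C")): 35, frozenset(("B", "D")): 20,
}
print(guide_tree(["A", "B", "C", "D"], example_scores))
```

With these made-up scores, A and B are joined first, then C and D, and the two pairs are joined last.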

In-class exercise, cont.

  • Note that ClustalW does some other things that the Pileup program discussed on the tape does not; we are going to ignore those things for the moment

  • Compare ClustalW’s guide tree (visible in the Phylogenetic Tree Pane – tab at bottom of window) with yours

In-class exercise, cont.

  • Carefully examine ClustalW’s alignment; compare it to the individual pairwise alignments you saved. Are there differences?

  • Start refining alignment:

    • Use structural info if you have it

    • Find patterns if you don’t

    • Use amino acid structure handout from beginning of class for substitution decisions!



ClustalW

  • Most widely used multiple alignment method

  • Similar strategy to the Feng-Doolittle approach implemented as Pileup, but more complex and gives generally superior results

  • Ad hoc nature of the program can be mysterious

Advantageous differences

  • Gap penalties vary locally:

    • By observed frequency (in database) after each residue

    • By simple structure prediction – lower gap penalties in probable loop regions

    • By proximity to existing gaps – higher gap penalties when within 8 residues of an existing gap
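The third rule (higher penalties near existing gaps) can be illustrated with a small sketch. The flat multiplier `boost`, the function name, and the treatment of "within 8" as a distance of 1 to 8 positions are assumptions made for illustration; this is not ClustalW's actual formula.

```python
def local_gap_open_penalties(length, gap_columns, base_penalty,
                             window=8, boost=2.0):
    """Return one gap-opening penalty per alignment column.

    Columns within `window` positions of an existing gap (but not the gap
    column itself) get the base penalty multiplied by `boost`; all other
    columns keep the base penalty. `boost` is a hypothetical value.
    """
    penalties = []
    for pos in range(length):
        near_gap = any(0 < abs(pos - g) <= window for g in gap_columns)
        penalties.append(base_penalty * boost if near_gap else base_penalty)
    return penalties


# A 12-column alignment with one existing gap at column 2 and a short
# window of 3 so the effect is easy to see.
print(local_gap_open_penalties(12, [2], base_penalty=10.0, window=3))
```

Raising the penalty around an existing gap discourages opening a second gap right next to it, so gaps tend to stay grouped rather than scattered.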

Advantages, cont.

  • Substitution matrix is chosen according to the distance computed for the guide tree

    • Substitution matrix families

  • Profile construction (more later)

  • Weighting of sequences in profiles depending on evolutionary distance computed for guide tree

    • More similar sequences get less weight than less similar sequences

In-class exercise II

  • Change a few parameters in the ClustalW program (gap, gap extension, substitution matrix, etc.) one at a time: this is done in Alignment Setup. After each run with a different change, save the alignment project with some descriptive name that you can remember (e.g., gap20 or blosum)

  • Compare alignment results with different parameters changed



MultAlin

  • MultAlin is also a heuristic algorithm that builds up a multiple alignment from a group of pairwise alignments

  • It differs from Pileup and Clustal in that the guide tree is recalculated based on the results of each alignment step

  • Because this leads to cycles of tree building and alignment, MultAlin can take a long time to run. It stops once the overall alignment score stops improving

Scoring a multiple sequence alignment

  • Assumptions:

    • Sequences (rows) independent

    • Positions (columns) independent

  • Neither assumption is true …

  • Score of a column is the (possibly weighted) sum of all the pairwise comparisons (i.e., substitution matrix values) within that column

  • Score of a multiple alignment is the sum of scores for all columns
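The column and alignment scores described above can be sketched as a sum-of-pairs calculation. The toy match/mismatch/gap values stand in for a real substitution matrix, and weighting each pair by the product of two per-sequence weights is one possible reading of "possibly weighted"; both are assumptions of this sketch.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy stand-in for a substitution matrix lookup."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other are not penalized here
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, weights=None):
    """Score an alignment as the sum over columns of all pairwise scores."""
    n = len(alignment)
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for column in zip(*alignment):           # columns scored independently
        for i, j in combinations(range(n), 2):  # every pair of rows
            total += weights[i] * weights[j] * pair_score(column[i], column[j])
    return total


# Columns score -1, -3, and +3 with the toy values, for a total of -1.
print(sum_of_pairs(["ACG", "A-G", "TCG"]))
```

Because columns are scored independently and rows are compared only in pairs, the calculation builds in both independence assumptions that the slide notes are not really true.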