Multiple alignment: Feng-Doolittle algorithm

PowerPoint Slideshow about 'Multiple alignment: Feng-Doolittle algorithm' - gema

Why multiple alignments?
  • Alignment of more than two sequences
  • Usually gives better information about conserved regions and function (more data)
  • Better estimate of significance when using a sequence of unknown function
  • Must use multiple alignments when establishing phylogenetic relationships
Dynamic programming extended to many dimensions?
  • No – uses up too much computer time and space
      • E.g. 200 amino acids in a pairwise alignment – must evaluate 4 x 10^4 matrix elements
      • If 3 sequences, 8 x 10^6 matrix elements
      • If 6 sequences, 6.4 x 10^13 matrix elements
Need to find a more efficient method
  • Give up the guarantee of an optimum alignment in exchange for a good alignment found much faster
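The matrix sizes quoted above follow from raising the sequence length to the number of sequences. A minimal sketch of that arithmetic, assuming all sequences have the same length and ignoring the extra boundary row and column (the function name `dp_cells` is made up for illustration):

```python
def dp_cells(length: int, n_sequences: int) -> int:
    """Cells in a full dynamic-programming lattice for n sequences of equal length."""
    return length ** n_sequences


# Reproduce the three cases from the slide (200 residues per sequence).
for n in (2, 3, 6):
    print(f"{n} sequences: {dp_cells(200, n):.1e} matrix elements")
```

Each added sequence multiplies the work by the sequence length, which is why the exact approach is abandoned beyond a handful of sequences.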
Feng-Doolittle algorithm
  • Does all pairwise alignments and scores them
  • Converts pairwise scores to “distances”
      • D = -log S_eff = -log[(S_obs - S_rand)/(S_max - S_rand)]
      • S_obs = pairwise alignment score
      • S_rand = expected score for a random alignment
      • S_max = average of the self-alignment scores of the two sequences
As S_obs approaches S_rand (increasing evolutionary distance), S_eff goes down; taking the -log makes the distance measure positive and makes it grow with divergence
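The score-to-distance conversion can be sketched in a few lines. This is an illustration only: the function name and the example scores are invented, and the natural logarithm is assumed because the slide does not state a base.

```python
import math


def feng_doolittle_distance(s_obs, s_rand, s_self_a, s_self_b):
    """Convert a pairwise alignment score into a distance D = -log(S_eff)."""
    s_max = (s_self_a + s_self_b) / 2.0          # average of the two self-alignment scores
    s_eff = (s_obs - s_rand) / (s_max - s_rand)  # normalized score, 1.0 for identical sequences
    if s_eff <= 0:
        raise ValueError("observed score is no better than random; distance undefined")
    return -math.log(s_eff)                      # natural log is an assumption


# Hypothetical scores: observed 60, random expectation 20, self-scores 100 and 120.
print(round(feng_doolittle_distance(60, 20, 100, 120), 3))
```

Identical sequences give S_eff = 1 and therefore a distance of 0; the distance rises without bound as the observed score falls toward the random expectation.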
Once the distances have been calculated, construct a guide tree (more in the phylogeny class) – tells what order to group the sequences
  • Sequences can be aligned with sequences or groups; groups can be aligned with groups
  • Sequence-sequence alignments: dynamic programming
  • Sequence-group alignments: all possible pairwise alignments between sequence and group are tried, highest scoring pair is how it gets aligned to group
  • Group-group alignments: all possible pairwise alignments of sequences between groups are tried; highest scoring pair is how groups get aligned
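How a distance table turns into an order for grouping sequences can be sketched as follows. Note the substitution: this sketch uses simple average-linkage clustering for brevity, which is an assumption and not necessarily the tree-building method used by Feng-Doolittle or Pileup. The function name, the sequence names, and the distances are all hypothetical.

```python
from itertools import combinations


def guide_order(names, dist):
    """Return the merge order as (cluster_a, cluster_b) pairs; clusters are tuples of names.

    dist maps frozenset({name1, name2}) to a pairwise distance.
    """
    clusters = [(n,) for n in names]
    merges = []

    def avg(a, b):
        # Average distance over all cross-cluster sequence pairs (average linkage).
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical distances for four sequences.
example_dist = {
    frozenset("AB"): 0.10, frozenset("CD"): 0.20,
    frozenset("AC"): 0.80, frozenset("AD"): 0.90,
    frozenset("BC"): 0.70, frozenset("BD"): 0.85,
}
for step in guide_order("ABCD", example_dist):
    print(step)
```

The first merges are sequence-sequence alignments; later merges join groups, which is where the sequence-group and group-group rules above apply.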






[Figure: Alignment 1, Alignment 2, Alignment 3, and the final alignment]

Notice that this method does not guarantee the optimum alignment; just a good one.

Gaps are preserved from alignment to alignment: “once a gap, always a gap”
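The "once a gap, always a gap" rule can be illustrated with a small sketch that adds one sequence to an existing group. The operation codes (`M`, `G`, `N`) and the function name are invented for this illustration; the sketch assumes the pairwise alignment between the group and the new sequence has already been computed and is supplied as that string of codes.

```python
def add_to_group(group, new_seq, ops):
    """Merge new_seq into an aligned group without disturbing existing gaps.

    ops holds one code per output column (hypothetical encoding):
      M = existing group column paired with the next residue of new_seq
      G = existing group column paired with a gap in new_seq
      N = next residue of new_seq paired with a new gap column in every group row
    """
    rows = [[] for _ in group]
    new_row = []
    col = pos = 0
    for op in ops:
        if op in "MG":                      # reuse an existing column, gaps included
            for row, seq in zip(rows, group):
                row.append(seq[col])
            col += 1
        else:                               # new gap column is added to every row
            for row in rows:
                row.append("-")
        if op in "MN":
            new_row.append(new_seq[pos])
            pos += 1
        else:
            new_row.append("-")
    return ["".join(r) for r in rows] + ["".join(new_row)]


merged = add_to_group(["AC-GT", "ACAGT"], "AGTT", "MGGMMN")
print("\n".join(merged))
```

Existing columns are only ever copied, never edited, so the gap already present in the first row survives, and any gap introduced now is carried into every later step.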

In-class exercise
  • Retrieve sequences from multalign.apr into BioScout
  • Run Gap in BioScout on all combinations of the sequences in multalign.apr; use a gap penalty of 6 and an extension penalty of 2
  • Record alignment scores of each pairwise comparison
  • Save pairwise alignments
In-class exercise, cont.
  • Use raw alignment scores as distance measures; make a guide tree based on these scores
  • In Vector NTI, select all sequences in multalign.apr (in the sequence pane); choose Alignment from the toolbar at the top; choose Alignment Setup from the pulldown; choose multiple alignment; take the defaults, choose ok; choose Alignment again, this time choose Align Selected Sequences from the pulldown
In-class exercise, cont.
  • Note that ClustalW does some other things that the Pileup program discussed on the tape does not; we are going to ignore those things for the moment
  • Compare ClustalW’s guide tree (visible in the Phylogenetic Tree Pane – tab at bottom of window) with yours
In-class exercise, cont.
  • Carefully examine ClustalW’s alignment; compare it to the individual pairwise alignments you saved. Are there differences?
Start refining alignment:
      • Use structural info if you have it
      • Find patterns if you don’t
      • Use amino acid structure handout from beginning of class for substitution decisions!
  • ClustalW is the most widely used multiple alignment method
  • Similar strategy to the Feng-Doolittle approach implemented as Pileup, but more complex and gives generally superior results
  • Ad hoc nature of the program can be mysterious
Advantageous differences
  • Gap penalties vary locally:
      • By observed frequency (in database) after each residue
      • By simple structure prediction – lower gap penalties in probable loop regions
      • By proximity to existing gaps – higher gap penalties when within 8 residues of an existing gap
Advantages, cont.
  • Change in substitution matrix choice depending on distance computed for guide tree
      • Substitution matrix families
  • Profile construction (more later)
  • Weighting of sequences in profiles depending on evolutionary distance computed for guide tree
      • More similar sequences get less weight than less similar sequences
In-class exercise II
  • Change a few parameters in the ClustalW program (gap, gap extension, substitution matrix, etc.) one at a time: this is done in Alignment Setup. After each run with a different change, save the alignment project with some descriptive name that you can remember (e.g., gap20 or blosum)
  • Compare alignment results with different parameters changed
  • MultAlin is also a heuristic algorithm that builds up a multiple alignment from a group of pairwise alignments
  • It differs from Pileup and Clustal in that the guide tree is recalculated based on the results of each alignment step
  • Because this leads to cycles of tree building and alignment, MultAlin can take a long time to run. It stops once the overall alignment score stops improving
Scoring a multiple sequence alignment
  • Assumptions:
      • Sequences (rows) independent
      • Positions (columns) independent
  • Neither assumption is true …
  • Score of a column is the (possibly weighted) sum of all the pairwise comparisons (i.e., substitution matrix values) within that column
  • Score of a multiple alignment is the sum of scores for all columns
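The column-by-column scoring described above can be sketched as follows. The pair-scoring function is a toy stand-in for a real substitution matrix (match +1, mismatch -1, residue against gap -2, gap against gap 0), and weighting each pair by the product of the two sequence weights is one possible choice; both are assumptions made for illustration.

```python
from itertools import combinations


def toy_pair_score(a, b):
    """Toy stand-in for a substitution matrix lookup (values are assumptions)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def column_score(column, pair_score, weights=None):
    """(Possibly weighted) sum of all pairwise comparisons within one column."""
    if weights is None:
        weights = [1.0] * len(column)
    return sum(
        weights[i] * weights[j] * pair_score(column[i], column[j])
        for i, j in combinations(range(len(column)), 2)
    )


def sum_of_pairs(alignment, pair_score=toy_pair_score, weights=None):
    """Score of a multiple alignment: the sum of its column scores."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(col, pair_score, weights) for col in zip(*alignment))


print(sum_of_pairs(["ACG", "A-G", "TCG"]))
```

In this example the three columns score -1, -3, and 3. Summing columns independently is exactly the position-independence assumption noted above.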