Lecture #7: FASTA & LFASTA

gchery


  1. Lecture #7: FASTA & LFASTA BIOINF 2051 Fall 2002

  2. Dot Plot
     • Alpha chain vs. beta chain of human hemoglobin
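A dot plot of the kind named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `dot_plot` is hypothetical, and the two short strings are toy sequences, not the hemoglobin chains.

```python
def dot_plot(seq_a, seq_b):
    """Return one text row per residue of seq_a; '*' marks an identical residue in seq_b."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


# Toy sequences (assumed for illustration only, not real hemoglobin data).
rows = dot_plot("GATTACA", "GATCACA")
print("\n".join(rows))
```

Runs of `*` along a diagonal indicate stretches where the two sequences agree without gaps, which is the signal FASTA looks for in its first step.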

  3. FASTA and LFASTA
     • Pearson and Lipman (1988)
     • FASTA – program that calculates the initial and optimal similarity scores between two sequences
     • LFASTA – program for detecting local similarities; finds multiple alignments between smaller portions of two sequences

  4. The FASTA algorithm
     • Four steps:
       1. Identify regions of similarity:
          • The ktup parameter specifies the number of consecutive identities required in a match
          • The 10 best diagonal regions are found based on the number of matches and the distance between matches
       2. Rescore regions and identify best initial regions:
          • PAM250 or another scoring matrix is used to rescore the 10 diagonal regions identified in step 1, allowing for conservative replacements and runs of identities shorter than ktup
          • For each of the best diagonal regions, identify the “initial region”, which is the best-scoring subregion
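The first two steps can be sketched roughly as follows. This is a simplified illustration, not the FASTA source: the names `diagonal_hits`, `best_subregion`, and `toy_score` are hypothetical, diagonals are ranked by match count alone (the distance between matches is ignored), and a match/mismatch score stands in for PAM250.

```python
from collections import defaultdict


def diagonal_hits(query, library, ktup=2):
    """Step 1 sketch: count exact ktup-length word matches on each diagonal."""
    # Lookup table: each ktup-length word in the query -> its start positions.
    words = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        words[query[i:i + ktup]].append(i)
    # A word match at query position i and library position j lies on diagonal j - i.
    hits = defaultdict(int)
    for j in range(len(library) - ktup + 1):
        for i in words.get(library[j:j + ktup], ()):
            hits[j - i] += 1
    return dict(hits)


def toy_score(a, b):
    """Assumed stand-in for a real matrix such as PAM250: +2 match, -1 mismatch."""
    return 2 if a == b else -1


def best_subregion(query, library, diagonal, score):
    """Step 2 sketch: best-scoring ungapped run along one diagonal."""
    lo = max(0, -diagonal)
    hi = min(len(query), len(library) - diagonal)
    best, best_span = 0, (lo, lo)
    running, start = 0, lo
    for i in range(lo, hi):
        if running <= 0:  # restart the run once the running score is no longer positive
            running, start = 0, i
        running += score(query[i], library[i + diagonal])
        if running > best:
            best, best_span = running, (start, i + 1)
    return best, best_span  # span is given in query coordinates


hits = diagonal_hits("ACGTACGT", "TTACGTAC", ktup=2)
top = sorted(hits, key=hits.get, reverse=True)[:10]  # keep at most 10 diagonals
print(top[0], best_subregion("ACGTACGT", "TTACGTAC", top[0], toy_score))
```

Because the rescoring walks residue by residue along the diagonal, it can credit conservative replacements and short runs of identities that the ktup word lookup would miss.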

  5. The FASTA algorithm
       3. Optimally join initial regions with scores > T:
          • Given: location of initial regions, scores, gap penalty
          • Calculate an optimal alignment of initial regions as a combination of compatible regions with maximal score
          • Use the resulting score to rank the library sequences
          • Selectivity degradation is limited by using only initial regions that score greater than some threshold T
       4. Align the highest scoring library sequences using a modification of global and local alignment algorithms:
          • Considers all possible alignments of the query and library sequence that fall within a band centered around the highest scoring initial region
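The joining step can be sketched as a small dynamic program over the initial regions. The representation is an assumption made for illustration: each region is a tuple `(q_start, q_end, l_start, l_end, score)` with half-open coordinates, two regions are treated as compatible when one ends before the other begins in both sequences, and a single fixed `join_penalty` stands in for the gap penalty. The function name `join_initial_regions` is hypothetical.

```python
def join_initial_regions(regions, threshold, join_penalty):
    """Best total score of a chain of compatible regions scoring above threshold."""
    # Keep only regions with score > T, ordered by their start in the query.
    kept = sorted((r for r in regions if r[4] > threshold),
                  key=lambda r: (r[0], r[2]))
    best = []  # best[k] = best chain score that ends with kept[k]
    for k, (qs, qe, ls, le, score) in enumerate(kept):
        chain = score
        for m in range(k):
            _, prev_qe, _, prev_le, _ = kept[m]
            # The earlier region must end before this one starts in both sequences.
            if prev_qe <= qs and prev_le <= ls:
                chain = max(chain, best[m] + score - join_penalty)
        best.append(chain)
    return max(best, default=0)


# Toy regions (assumed values for illustration only).
regions = [
    (0, 10, 0, 10, 40),
    (12, 20, 15, 23, 30),
    (5, 15, 30, 40, 25),
    (22, 30, 24, 32, 10),  # dropped: score is not above the threshold
]
print(join_initial_regions(regions, threshold=15, join_penalty=8))  # 62
```

In this toy case the first two regions are joined (40 + 30 - 8), while the third overlaps the first in the query and so cannot be combined with it.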

  6. LFASTA
     • FASTA – reports only the one highest scoring alignment between two sequences
     • LFASTA – local sequence comparison tool that can identify multiple local alignments between two sequences
     • Optimal algorithms for sensitive local sequence comparison are computationally intensive in terms of time and memory

  7. LFASTA vs. FASTA
     • LFASTA uses the same first 2 steps for finding initial regions as FASTA, except:
       • Instead of saving 10 initial regions, LFASTA saves all diagonal regions with similarity scores greater than some threshold
     • Construction of optimized alignments:
       • Instead of focusing on a single region, LFASTA computes a local alignment for each initial region
       • Also, apart from the band around the initial region, LFASTA considers potential sequence alignments for some distance before and after the initial region
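The difference in how regions are retained can be sketched as below. The function names, the `(diagonal, score)` tuples, and the `extend` parameter are assumptions made for illustration; neither program exposes these as an API.

```python
def select_regions_fasta(scored, keep=10):
    """FASTA-style: keep a fixed number of the best-scoring diagonal regions."""
    return sorted(scored, key=lambda r: r[1], reverse=True)[:keep]


def select_regions_lfasta(scored, threshold):
    """LFASTA-style: keep every diagonal region scoring above the threshold."""
    return [r for r in scored if r[1] > threshold]


def extended_span(span, extend, seq_len):
    """Widen a (start, end) span by `extend` residues on each side, clipped to the sequence."""
    start, end = span
    return max(0, start - extend), min(seq_len, end + extend)


# Toy (diagonal, score) pairs, assumed for illustration only.
scored = [(2, 50), (-3, 20), (7, 35), (0, 5)]
print(select_regions_fasta(scored, keep=2))
print(select_regions_lfasta(scored, threshold=15))
print(extended_span((10, 20), extend=5, seq_len=22))
```

With a threshold the number of retained regions depends on the data, which is what lets LFASTA report several local alignments for one pair of sequences.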

  8. Self-comparison of myosin heavy chain from C. elegans
     • See the plot from a local similarity self-comparison of the myosin heavy chain (NBRF code MWKW) using the PAM250 matrix
     • The amino-terminal half of the molecule forms a large globular head without any periodic structure
     • The symmetrical parallel lines along the C-terminal half correspond to the 28-residue repeat responsible for the α-helical coiled-coil structure of the rod segment
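Why a repeat shows up as parallel lines in a self-comparison can be illustrated with a toy example: a sequence with period p matches itself when shifted by any multiple of p, so those off-diagonals fill with identities. The function name `repeat_offsets`, the identity cutoff, and the toy sequence (period 4, not the 28-residue myosin repeat) are all assumptions made for illustration.

```python
def repeat_offsets(seq, min_identity=0.9):
    """Offsets d > 0 at which seq compared with itself shifted by d is mostly identical."""
    offsets = []
    for d in range(1, len(seq)):
        pairs = len(seq) - d
        matches = sum(seq[i] == seq[i + d] for i in range(pairs))
        if matches / pairs >= min_identity:
            offsets.append(d)
    return offsets


# Toy sequence with a perfect period of 4 (assumed; real repeats are imperfect).
print(repeat_offsets("ABCD" * 3))  # [4, 8]
```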
