
CS 6990 Bioinformatics BLAST

CS 6990 Bioinformatics BLAST. Fall 2004. Dr. Susan Bridges. Overview: BLAST (Basic Local Alignment Search Tool) is a collection of programs developed by Altschul et al. It is a simplification of the Smith-Waterman dynamic programming algorithm.


Presentation Transcript


  1. CS 6990 Bioinformatics: BLAST. Fall 2004. Dr. Susan Bridges. Department of Computer Science and Engineering Bioinformatics

  2. Overview
     • BLAST stands for Basic Local Alignment Search Tool
     • BLAST is a collection of programs
     • Developed by Altschul et al.
     • A heuristic simplification of the Smith-Waterman dynamic programming algorithm
     • It looks for matches of short words (not necessarily exact matches)

  3. BLAST Terminology
     • Segment: a substring of a sequence
     • Segment pair of two sequences: a pair of segments of the same length (no gaps), one from each sequence
     • w-mer: a substring (or word) of w characters

  4. Goal
     • Form a gapless alignment between the two segments of a segment pair and score the alignment using an amino acid substitution matrix.
     • Example (using PAM 120):

         K   A   L   M   R
         V   A   K   N   S
        -4   3  -4  -3  -1

       Total score of alignment = -4 + 3 - 4 - 3 - 1 = -9
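
The scoring on this slide can be reproduced with a short Python sketch. The table below holds only the five PAM 120 entries quoted in the example, not the full matrix, and the names `PAM120_SUBSET`, `pair_score`, and `segment_pair_score` are illustrative choices rather than part of any BLAST implementation.

```python
# Partial substitution table: only the five PAM 120 entries quoted on the slide.
PAM120_SUBSET = {
    ("K", "V"): -4,
    ("A", "A"): 3,
    ("L", "K"): -4,
    ("M", "N"): -3,
    ("R", "S"): -1,
}


def pair_score(a, b, matrix):
    """Look up a substitution score, treating the matrix as symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def segment_pair_score(seg1, seg2, matrix):
    """Sum the per-column scores of two equal-length, gapless segments."""
    if len(seg1) != len(seg2):
        raise ValueError("segments in a segment pair must have equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seg1, seg2))


print(segment_pair_score("KALMR", "VAKNS", PAM120_SUBSET))  # -9
```

Because a segment pair has no gaps, the score is a plain column-by-column sum, which is what keeps this part of the search cheap.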

  5. Steps in the Algorithm
     • Compile a list of high-scoring words in the query sequence
     • Find matches in the database (db) for each high-scoring word
     • For each match in the db, extend the alignment in both directions

  6. Step 1
     • Compile a list of high-scoring words in the query sequence
     • Default word lengths are w=3 for proteins and w=11 for nucleic acid sequences
     • A query of length n contains n-w+1 words
     • Each candidate word has a score t against a query word, computed using the scoring matrix
     • Threshold T: a word pair whose score t is above T is treated as a pair of synonyms (T is called the neighborhood word score threshold)

  7. Step 1 Example (w=2)
     Adipokinetic hormone II of migratory locust
     Query: q l n f s a g w
     Words: ql, ln, nf, fs, sa, ag, gw
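
As a minimal sketch of the word-listing step, the following Python function enumerates the overlapping w-mers of a query. The function name `query_words` is a hypothetical helper, not taken from the slides.

```python
def query_words(query, w):
    """Return every overlapping w-mer of the query, left to right."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


words = query_words("qlnfsagw", 2)
print(words)       # ['ql', 'ln', 'nf', 'fs', 'sa', 'ag', 'gw']
print(len(words))  # 7, which is n - w + 1 = 8 - 2 + 1
```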

  8. Step 1 continued
     • Find all words in the db that are synonyms of the high-scoring query words

  9. Example continued (T=8, PAM120 Scoring Matrix)
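
The synonym (neighborhood) search can be sketched in Python as follows. The scoring table here is a made-up toy over a four-letter alphabet, not PAM120, and the threshold in the usage line is chosen only to keep the output small; `toy_score` and `neighborhood` are hypothetical names.

```python
from itertools import product

# Toy alphabet and scores, invented for illustration (NOT PAM120 values).
ALPHABET = "AILV"
TOY_SCORES = {("I", "L"): 3, ("I", "V"): 4, ("L", "V"): 2}


def toy_score(a, b):
    """Identity scores 5; listed pairs use the table; anything else scores -1."""
    if a == b:
        return 5
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), -1))


def neighborhood(word, T, alphabet=ALPHABET, score=toy_score):
    """List every word of the same length whose score t against `word` is above T."""
    neighbors = []
    for candidate in product(alphabet, repeat=len(word)):
        t = sum(score(a, b) for a, b in zip(word, candidate))
        if t > T:  # the slides say "above T"; use >= for an inclusive threshold
            neighbors.append("".join(candidate))
    return neighbors


print(neighborhood("IL", 7))  # ['II', 'IL', 'LL', 'VL']
```

Enumerating all words of length w is feasible only because w is small; for proteins with w=3 there are 20^3 = 8000 candidates per query word.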

  10. Step 2
      • For each word or synonym from the query, search for a hit in all db sequences
      • Each hit is considered a seed alignment and is extended in both directions as long as the score of the alignment increases (newer versions allow short gaps)

          q l n f s a g w
          w i d f a a c p

      • If the score for the segment pair is higher than a threshold S, the score and the endpoints are stored
      • High-scoring segment pairs are called HSPs
      • The highest-scoring segment pair for the whole pairwise comparison is referred to as the maximal-scoring segment pair (MSP)
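
The hit-and-extend step can be sketched in Python under some simplifying assumptions: hits are exact word matches only (synonyms are left out), the scoring function is a made-up match/mismatch rule rather than a substitution matrix, and extension tolerates a small score drop `x_drop` before giving up, which is a common refinement of "extend while the score increases". All function and parameter names are hypothetical.

```python
def simple_score(a, b):
    """Toy scoring rule (an assumption): +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def find_hits(query, subject, w):
    """Yield (qpos, spos) for every exact w-mer shared by query and subject."""
    index = {}
    for s in range(len(subject) - w + 1):
        index.setdefault(subject[s:s + w], []).append(s)
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            yield q, s


def extend_one_direction(query, subject, qi, si, step, x_drop, score):
    """Walk from (qi, si) in direction `step`; return (best gain, its length)."""
    best = running = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        running += score(query[qi], subject[si])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # score has fallen too far below the best seen so far
        qi += step
        si += step
    return best, best_len


def extend_hit(query, subject, qpos, spos, w, x_drop=2, score=simple_score):
    """Extend a seed without gaps; return (score, query span, subject span)."""
    seed = sum(score(query[qpos + k], subject[spos + k]) for k in range(w))
    right, rlen = extend_one_direction(query, subject, qpos + w, spos + w, 1, x_drop, score)
    left, llen = extend_one_direction(query, subject, qpos - 1, spos - 1, -1, x_drop, score)
    return (seed + left + right,
            (qpos - llen, qpos + w + rlen),
            (spos - llen, spos + w + rlen))


query, subject = "qlnfsagw", "aalnfsacc"  # subject is an invented db sequence
print(list(find_hits(query, subject, 2)))   # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(extend_hit(query, subject, 2, 3, 2))  # (10, (1, 6), (2, 7))
```

The spans are half-open index ranges, so `query[1:6]` and `subject[2:7]` both give `lnfsa`. A segment pair whose score exceeds the threshold S would then be stored as an HSP.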

  11. Step 3
      The HSPs from the entire database are compared to a cutoff score S, and those scoring greater than S are returned.
      Query: q l n f s a g w
      Return all matched sequences with scores greater than 8

  12. Step 4
      • Compute the statistical significance of each HSP score.

  13. Step 5
      • The segments are aligned
      • The alignment score is obtained
      • The E() value for this score is calculated
      • If the calculated E() for the database sequence meets the user-specified E() threshold for the program, the score is reported

  14. BLAST output
      • The list of hits
      • Database accession codes, name, description, and general information about each hit
      • Score in bits: the alignment score expressed in units of information
      • Expectation value E()
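
The slides do not give the formulas behind the bit score and E(). The sketch below assumes the standard Karlin-Altschul relations for ungapped local alignments, E = K·m·n·e^(-λS) and S' = (λS - ln K) / ln 2, where m and n are the query and database lengths. The λ and K values in the usage lines are placeholders, not parameters for any particular scoring matrix.

```python
import math


def bit_score(raw_score, lam, K):
    """Convert a raw alignment score S to bits: S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def expect_value(raw_score, m, n, lam, K):
    """Expected number of chance HSPs scoring at least S: E = K*m*n*exp(-lam*S)."""
    return K * m * n * math.exp(-lam * raw_score)


# Placeholder statistical parameters and search-space sizes (assumptions).
lam, K = 0.3, 0.1
m, n = 250, 1_000_000

for raw in (30, 60, 90):
    print(raw, round(bit_score(raw, lam, K), 1), expect_value(raw, m, n, lam, K))
```

In bit units the same quantity is E = m·n·2^(-S'), which is why bit scores can be compared across searches that use different scoring matrices.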

  15. BLAST programs

  16. References
      • Setubal and Meidanis, Introduction to Computational Molecular Biology
      • NCBI Education Pages, http://www.ncbi.nih.gov/Education/BLASTinfo/BLAST_algorithm.html
      • Weizmann Institute of Science, http://bioportal.weizmann.ac.il/course/introbioinfo/
      • Computers and the Human Genome Project, http://www-cse.stanford.edu/classes/sophomore-college/projects-00/computers-and-the-hgp/BLAST.html
      • The BLAST Help Manual, http://www.ncbi.nlm.nih.gov/BLAST/blast_help.shtml
