Heuristic PSA

  1. Heuristic PSA (pairwise sequence alignment)
     • “Words” to describe dot-matrix analysis
     • Approaches
       • FASTA
       • BLAST
     • Searching databases for sequence similarities
       • PSA
       • Alternative strategies
         • Iterative searching
         • Reverse searching
     Lecture 7 CS566

  2. “Words” for dot-matrix analysis
     • Useful ideas from dot-matrix (DM) alignment
       • A diagonal represents a local match
       • A broken diagonal = an intervening mismatch
       • Displaced diagonals = matches with gaps
     • Advantages of word-based alignment
       • Faster algorithm
       • Comparing word lists is faster than comparing whole sequences
       • Hash tables allow rapid lookup and comparison of words
     • “The devil is in the details”
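To make the dot-matrix vocabulary concrete, here is a minimal brute-force word dot plot, written as an illustrative sketch. It compares every k-letter word of one sequence with every k-letter word of the other and groups the matches by diagonal, numbered j - i here (an assumed convention). The function name `word_dot_matrix` and the example sequences are hypothetical. In the usage below, the second sequence carries a two-residue insertion (WW), so the matches fall on two diagonals displaced by 2: a match with a gap.

```python
from collections import defaultdict


def word_dot_matrix(s, t, k):
    """Brute-force word dot plot.

    Marks (i, j) whenever the k-letter words starting at s[i] and t[j]
    are identical, and groups the marks by diagonal d = j - i.
    Returns {diagonal: [i, ...]} with i in increasing order.
    """
    diagonals = defaultdict(list)
    for i in range(len(s) - k + 1):
        for j in range(len(t) - k + 1):      # every pair is compared: O(n * m)
            if s[i:i + k] == t[j:j + k]:
                diagonals[j - i].append(i)
    return dict(diagonals)


# Two runs of matches on diagonals 0 and 2: the 2-residue insertion
# in the second sequence shows up as a displaced diagonal.
print(word_dot_matrix("ACDEFGHIKL", "ACDEFWWGHIKL", 3))
```

Because every pair of positions is compared, this sketch does O(n·m) word comparisons; indexing the words in a hash table is what removes that full scan.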

  3. FASTA (Fast-All)
     • Motivation: a rapid PSA method was needed to search databases for matches to a query sequence (1:n comparisons)
     • ktup (k-tuple, or word) based alignment
       • Create hash tables for the sequences
       • Find matching ktups (“hot-spots”, i.e. short diagonals) in a pair of sequences
       • Default ktup size = 2 for protein (6 for DNA)
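The hash-table step can be sketched in a few lines of Python, assuming a plain `dict` as the hash table. The subject's ktups are indexed once; then each query ktup is looked up directly instead of being compared against every subject position. `build_ktup_index` and `find_hot_spots` are hypothetical names for illustration, not routines from the FASTA package.

```python
from collections import defaultdict


def build_ktup_index(seq, ktup):
    """Hash table: ktup word -> list of start positions in seq."""
    index = defaultdict(list)
    for j in range(len(seq) - ktup + 1):
        index[seq[j:j + ktup]].append(j)
    return index


def find_hot_spots(query, subject, ktup=2):
    """Return (i, j) pairs where query[i:i+ktup] == subject[j:j+ktup]."""
    index = build_ktup_index(subject, ktup)
    hot_spots = []
    for i in range(len(query) - ktup + 1):
        # .get() does not insert missing keys into the defaultdict
        for j in index.get(query[i:i + ktup], []):
            hot_spots.append((i, j))
    return hot_spots


print(find_hot_spots("HEAGAWGHEE", "PAWHEAE"))
```

Each lookup costs roughly constant time, so the work grows with the sequence lengths plus the number of hits rather than with their product.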

  4. FASTA
     • Find the 10 best “diagonal runs”
       • Group hot-spots by the (i - j) diagonal they lie on
         • The main diagonal is numbered 0
         • Positive diagonals lie above the main diagonal; negative ones lie below
       • Diagonal run = set of consecutive (not necessarily contiguous) hot-spots on one diagonal, penalized by the size of the intervening mismatches
       • Save the top 10 diagonal runs
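One way to sketch the diagonal-run step is a maximum-scoring run per diagonal (a Kadane-style scan). This is an illustration under stated assumptions, not FASTA's actual scoring: each newly covered hot-spot residue earns `match`, each intervening mismatched residue costs `mismatch`, and both values are hypothetical. Diagonals are numbered i - j as on the slide, and each run is reported as `(score, diagonal, start_i, end_i)` with `end_i` exclusive.

```python
from collections import defaultdict


def top_diagonal_runs(hot_spots, ktup=2, match=5, mismatch=-3, n_best=10):
    """Best-scoring run of hot-spots on each diagonal, top n_best overall.

    hot_spots: (i, j) start positions of matching ktups.
    match / mismatch: illustrative per-residue scores (assumed values).
    """
    by_diag = defaultdict(list)
    for i, j in hot_spots:
        by_diag[i - j].append(i)          # diagonal numbered i - j

    runs = []
    for diag, starts in by_diag.items():
        starts.sort()
        best = None
        score = run_start = run_end = None
        for i in starts:
            fresh = match * ktup          # this hot-spot scored on its own
            if score is not None:
                if i < run_end:           # overlapping hot-spot: count only new residues
                    extended = score + match * (i + ktup - run_end)
                else:                     # penalize the intervening mismatches
                    extended = score + mismatch * (i - run_end) + fresh
            if score is None or extended < fresh:
                score, run_start = fresh, i   # starting a new run scores better
            else:
                score = extended
            run_end = i + ktup
            if best is None or score > best[0]:
                best = (score, diag, run_start, run_end)
        runs.append(best)

    runs.sort(reverse=True)               # highest score first
    return runs[:n_best]


print(top_diagonal_runs([(0, 0), (1, 1), (5, 5), (20, 20), (3, 10)]))
```

In the usage line, the hot-spots at 0, 1 and 5 on diagonal 0 chain into one run despite a 2-residue mismatch, while the distant hot-spot at 20 is too far away to be worth joining.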

  5. FASTA
     • Find init1
       • init1 = best contiguous subsequence among the top 10 diagonal runs, rescored with an amino acid substitution (AAS) matrix (default BLOSUM50)
     • Define a local search space around init1
       • Include (32 / ktup) diagonals (+/-) around init1 in the search space
       • For ktup = 2, 16 diagonals around init1
     • Perform Smith-Waterman PSA in the reduced (banded) space
     • Report the resulting alignment; its score is opt
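Restricting Smith-Waterman to a band of diagonals can be sketched as follows. Assumptions: a simple match/mismatch score and a linear gap penalty stand in for BLOSUM50 and FASTA's affine gaps, `width` counts diagonals on each side of the centre diagonal `diag` (numbered i - j), and `banded_smith_waterman` is a hypothetical name. Cells outside the band are never filled and stay 0, which for local alignment simply means "no alignment passes through here".

```python
def banded_smith_waterman(s, t, diag, width, match=5, mismatch=-4, gap=-8):
    """Best local alignment score using only cells with |(i - j) - diag| <= width.

    Scores are illustrative (not BLOSUM50); the gap penalty is linear.
    """
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # DP matrix, 1-based cells
    best = 0
    for i in range(1, n + 1):
        lo = max(1, i - diag - width)           # leftmost column inside the band
        hi = min(m, i - diag + width)           # rightmost column inside the band
        for j in range(lo, hi + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                          H[i - 1][j] + gap,       # gap in t
                          H[i][j - 1] + gap)       # gap in s
            best = max(best, H[i][j])
    return best


print(banded_smith_waterman("HEAGAWGHEE", "HEAGAWGHEE", diag=0, width=2))
```

Only about (2·width + 1)·n cells are filled instead of n·m, which is where the speed of this final step comes from.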

  6. BLAST (Basic Local Alignment Search Tool)
     • Builds on ideas from FASTA, incorporating new elements
     • For every word in the query, generate a set of similar words
       • Use the AAS matrix to score the query word against all possible words of the same size
       • Include in the set all words scoring at least the cut-off (threshold)
       • Example: for the word DED and a threshold of 0, the set includes DED, DDD, EEE, EDE, etc.
     • For every query word, generate hot-spots from its set of similar words
     • Then extend and merge hits along the same diagonal (à la FASTA) to form High-scoring Segment Pairs (HSPs)
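The word-set ("neighborhood") idea can be sketched with a toy three-letter alphabet. The scores below are the BLOSUM62 entries for D, E and K (BLOSUM62 is BLASTP's default matrix); restricting to three letters is an assumption that keeps the example small. `neighborhood` and `blast_hot_spots` are hypothetical names, and the lookup table maps each similar word back to its query positions so the subject is scanned once.

```python
from collections import defaultdict
from itertools import product

# BLOSUM62 scores for a toy alphabet {D, E, K}; each symmetric pair stored once.
BLOSUM62_DEK = {
    ("D", "D"): 6, ("D", "E"): 2, ("D", "K"): -1,
    ("E", "E"): 5, ("E", "K"): 1, ("K", "K"): 5,
}


def pair_score(a, b):
    """Substitution score in either order (KeyError outside the toy alphabet)."""
    return BLOSUM62_DEK[(a, b)] if (a, b) in BLOSUM62_DEK else BLOSUM62_DEK[(b, a)]


def word_score(w1, w2):
    """Ungapped score of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))


def neighborhood(word, threshold, alphabet="DEK"):
    """All same-length words scoring at least `threshold` against `word`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return {c for c in candidates if word_score(word, c) >= threshold}


def blast_hot_spots(query, subject, w=3, threshold=11, alphabet="DEK"):
    """(i, j) pairs where a neighbor of query[i:i+w] occurs at subject[j:j+w]."""
    lookup = defaultdict(list)            # similar word -> query positions
    for i in range(len(query) - w + 1):
        for nw in neighborhood(query[i:i + w], threshold, alphabet):
            lookup[nw].append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)   # single pass over the subject
            for i in lookup.get(subject[j:j + w], [])]


print(sorted(neighborhood("DED", 11)))
print(blast_hot_spots("DED", "KDKDEEK"))
```

With the slide's threshold of 0, every word over {D, E, K} except KKK qualifies; a cut-off of 11 keeps only DDD, DED, DEE, DKD and EED. The subject in the last line never contains DED itself, so an exact-match word search would miss both hits that this inexact search finds.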

  7. FASTA versus BLAST
     • Word matching is exact in FASTA but inexact (AAS-based) in BLAST
     • Larger word size in BLAST
     • FASTA is more sensitive (Why?) but slower (Why?)
     • BLAST handles “low-complexity” regions inline
       • The programs DUST (DNA) and/or SEG (protein) are used to filter sequences
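To show what "filtering low-complexity regions" means, here is a crude stand-in: slide a window along the sequence and mask any window whose Shannon entropy falls below a cut-off. This is only in the spirit of SEG and DUST, which use their own complexity measures and parameters. The window size and entropy cut-off are illustrative assumptions, and X is used as the mask letter, as is conventional for protein sequences.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits per residue) of the window's residue composition."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq, window=12, min_entropy=2.2, mask_char="X"):
    """Replace every residue covered by a low-entropy window with mask_char.

    window and min_entropy are illustrative values, not SEG's or DUST's defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


print(mask_low_complexity("Q" * 20))                 # homopolymer run: fully masked
print(mask_low_complexity("ACDEFGHIKLMNPQRSTVWY"))   # diverse sequence: unchanged
```

Masked residues can no longer seed word hits, which keeps repetitive segments from flooding a database search with spurious matches.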

  8. Variations on BLAST-based searching
     • Mapping the query to different alphabets
       • Protein versus DNA
       • DNA versus protein (multiple reading frames)
     • PSI-BLAST: Position-Specific Iterated BLAST
       • Use the query to find hits
       • Assemble the hits into an on-the-fly position-specific scoring matrix (PSSM)
       • Search again with the PSSM; repeat until no new hits are found (convergence)
     • RPS-BLAST: Reverse Position-Specific BLAST
       • The query is the search space
       • A database of PSSMs is searched for matches
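The PSSM at the heart of PSI-BLAST and RPS-BLAST can be sketched as per-column log-odds scores built from aligned hits. This is a deliberately simplified version under stated assumptions: hits are already aligned and ungapped, background frequencies are uniform, and a flat pseudocount replaces PSI-BLAST's sequence weighting and matrix-based pseudocounts. `build_pssm`, `score_segment` and `best_hit` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits, pseudocount=1.0):
    """One {residue: log2-odds score} dict per column of equal-length hits."""
    background = 1.0 / len(AMINO_ACIDS)        # assumed uniform background
    pssm = []
    for col in range(len(aligned_hits[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned_hits:
            if seq[col] in counts:             # ignore gaps / unknown letters
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_segment(pssm, segment):
    """Sum the column-specific scores of an ungapped segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """(score, start) of the best-scoring window of `sequence`."""
    w = len(pssm)
    return max((score_segment(pssm, sequence[j:j + w]), j)
               for j in range(len(sequence) - w + 1))


pssm = build_pssm(["DEDK", "DEEK", "DDDK"])
print(best_hit(pssm, "AAADEDKAAA"))
```

In a PSI-BLAST-style loop, the windows found by `best_hit` would be added to the hits and the PSSM rebuilt until no new hits appear; an RPS-BLAST-style search flips the roles and scans one query against a whole collection of such PSSMs.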
