Heuristic PSA


# Heuristic PSA - PowerPoint PPT Presentation



## PowerPoint Slideshow about 'Heuristic PSA' - jamese


Heuristic PSA
• “Words” to describe dot-matrix analysis
• Approaches
• FASTA
• BLAST
• Searching databases for sequence similarities
• PSA
• Alternative strategies
• Iterative searching
• Reverse searching

Lecture 7 CS566

“Words” for Dot-matrix analysis
• Useful ideas from DM Alignment
• Diagonal represents local match
• Broken diagonal = local match interrupted by mismatches
• Displaced (offset) diagonals = matches separated by gaps (insertions/deletions)
• Advantage of using word-based alignment
• Faster algorithm
• Word-list comparison faster than sequence comparison
• Hash tables used for rapid lookup of matching words
• “Devil is in the details”
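The word-based idea on this slide can be sketched in Python. This is a minimal sketch, assuming exact k-letter word matches and 0-based positions; the helpers `naive_word_dotplot`, `hashed_word_dotplot`, and `group_by_diagonal` are hypothetical names. Both dot-plot functions return the same hits, but the second indexes one sequence's words in a hash table instead of comparing every pair of words. Hits that share a diagonal (constant i - j) mark one ungapped local match.

```python
from collections import defaultdict

Hit = tuple[int, int]  # (position in a, position in b), 0-based


def naive_word_dotplot(a: str, b: str, k: int) -> set[Hit]:
    """Compare every k-letter word of a with every k-letter word of b."""
    hits: set[Hit] = set()
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            if a[i:i + k] == b[j:j + k]:
                hits.add((i, j))
    return hits


def hashed_word_dotplot(a: str, b: str, k: int) -> set[Hit]:
    """Same hits, but each word of b is looked up in a hash table of a's words."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    hits: set[Hit] = set()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            hits.add((i, j))
    return hits


def group_by_diagonal(hits: set[Hit]) -> dict[int, list[Hit]]:
    """Hits on the same diagonal (same i - j) belong to one ungapped local match."""
    diagonals: dict[int, list[Hit]] = defaultdict(list)
    for i, j in sorted(hits):
        diagonals[i - j].append((i, j))
    return dict(diagonals)


if __name__ == "__main__":
    a, b = "HEAGAWGHEE", "PAWHEAE"
    hits = hashed_word_dotplot(a, b, k=2)
    assert hits == naive_word_dotplot(a, b, k=2)
    print(group_by_diagonal(hits)[-3])  # [(0, 3), (1, 4)]: "HE" then "EA", the shared "HEA"
```

The naive version does one word comparison per pair of positions. The hashed version does one table lookup per word of `b`, which is why word-list comparison is faster than sequence comparison.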


FASTA (Fast-All)
• Motivation: Needed rapid PSA method to search databases for matches to query sequence (1:n comparisons)
• ktup (k-tuple or word) based alignment
• Create hash tables for sequences
• Find matching ktups (“hot-spots”/short diagonals) in pair of sequences
• ktup size = 2 for protein (6 for DNA)
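FASTA's first step can be sketched as follows. This is a minimal sketch, assuming 0-based (query position, subject position) pairs; `build_ktup_table`, `hot_spots`, and `scan_database` are hypothetical names, not FASTA's own code. The query's ktup table is built once and reused for every database sequence, which is what keeps the 1:n search cheap.

```python
from collections import defaultdict

# Default word sizes quoted on the slide: 2 for protein, 6 for DNA.
DEFAULT_KTUP = {"protein": 2, "dna": 6}


def build_ktup_table(query: str, ktup: int) -> dict[str, list[int]]:
    """Hash each ktup-word of the query to the 0-based positions where it starts."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    return dict(table)


def hot_spots(table: dict[str, list[int]], subject: str, ktup: int) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) pairs where a subject ktup-word also occurs in the query."""
    spots = []
    for j in range(len(subject) - ktup + 1):
        for i in table.get(subject[j:j + ktup], []):
            spots.append((i, j))
    return spots


def scan_database(query: str, database: dict[str, str],
                  molecule: str = "protein") -> dict[str, list[tuple[int, int]]]:
    """1:n search: build the query table once, then reuse it for every entry."""
    ktup = DEFAULT_KTUP[molecule]
    table = build_ktup_table(query, ktup)
    return {name: hot_spots(table, seq, ktup) for name, seq in database.items()}


if __name__ == "__main__":
    db = {"seq1": "PAWHEAE", "seq2": "MKLV"}
    print(scan_database("HEAGAWGHEE", db))
```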


FASTA
• Find 10 best “diagonal-runs”
• Group hot-spots by the (i-j) diagonal they lie in
• Main diagonal numbered 0;
• Positive diagonals lie above main diagonal, negative lie below
• Diagonal-run = set of consecutive (not necessarily contiguous) hot-spots, penalized by size of intervening mismatch
• Save top 10 diagonal runs
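The diagonal-run step can be sketched under simplifying assumptions. Hot-spots are (i, j) pairs with i indexing the query. Each hot-spot earns a fixed hypothetical reward (`hit_score`), and the mismatched stretch between consecutive hot-spots on a diagonal costs a hypothetical per-residue penalty (`mismatch_penalty`). FASTA's actual scoring differs. Each diagonal keeps its best-scoring run, and the top 10 runs overall are returned.

```python
import heapq
from collections import defaultdict


def diagonal_runs(spots: list[tuple[int, int]], ktup: int = 2, hit_score: int = 5,
                  mismatch_penalty: int = 1, top: int = 10) -> list[tuple[int, int, int, int]]:
    """Return up to `top` runs as (score, diagonal, query_start, query_end).

    Hot-spots are grouped by diagonal i - j (i = query position). Along a diagonal,
    each hot-spot adds `hit_score`; the stretch between consecutive hot-spots costs
    `mismatch_penalty` per residue. A run restarts whenever extending it would score
    worse than starting fresh at the current hot-spot.
    """
    by_diagonal: dict[int, list[int]] = defaultdict(list)
    for i, j in spots:
        by_diagonal[i - j].append(i)

    runs = []
    for diagonal, starts in by_diagonal.items():
        starts.sort()
        best = None              # (score, first hot-spot start, last hot-spot start)
        score, first = 0, 0
        prev = None
        for s in starts:
            if prev is None:
                score, first = hit_score, s
            else:
                gap = max(0, s - (prev + ktup))   # residues between the two words
                extended = score - mismatch_penalty * gap + hit_score
                if extended >= hit_score:
                    score = extended
                else:
                    score, first = hit_score, s
            prev = s
            if best is None or score > best[0]:
                best = (score, first, s)
        runs.append((best[0], diagonal, best[1], best[2] + ktup))

    return heapq.nlargest(top, runs)


if __name__ == "__main__":
    print(diagonal_runs([(4, 1), (0, 3), (7, 3), (1, 4)]))
```

On the hot-spots from the earlier "HEAGAWGHEE" versus "PAWHEAE" example, diagonal -3 wins because its two overlapping hot-spots form one run.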


FASTA
• Find init1
• Init1 = best-scoring contiguous (ungapped) segment among the top 10 diagonal runs, rescored with an amino acid substitution (AAS) matrix (default BLOSUM50)
• Define local search space around init1
• Include ±(32 / ktup) diagonals around init1 in search space
• For ktup = 2, 16 diagonals around init1
• Perform Smith-Waterman PSA in reduced space
• Report resulting alignment; its score is reported as opt
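The init1 and banded optimization steps can be sketched as follows. This sketch assumes a hypothetical `toy_score` (+5 identity, -4 otherwise) standing in for BLOSUM50 and a linear gap penalty; FASTA itself uses separate gap-open and gap-extension penalties. `init1_on_diagonal` finds the best ungapped segment on one diagonal with a Kadane-style running sum. `banded_smith_waterman` fills only the cells within `width` diagonals of init1's diagonal and returns the best local score, the analogue of opt. Traceback is omitted.

```python
from typing import Callable, Optional

ScoreFn = Callable[[str, str], int]


def toy_score(x: str, y: str) -> int:
    """Hypothetical stand-in for BLOSUM50: +5 for identical residues, -4 otherwise."""
    return 5 if x == y else -4


def init1_on_diagonal(query: str, subject: str, diagonal: int,
                      score: ScoreFn = toy_score) -> tuple[int, Optional[tuple[int, int]]]:
    """Best ungapped segment on diagonal i - j = diagonal (Kadane-style running sum).

    Returns (score, (query_start, query_end)), or (0, None) if nothing scores > 0.
    """
    best, span = 0, None
    running, start = 0, 0
    i = max(diagonal, 0)
    j = i - diagonal
    while i < len(query) and j < len(subject):
        s = score(query[i], subject[j])
        if running <= 0:          # a non-positive prefix never helps: restart here
            running, start = s, i
        else:
            running += s
        if running > best:
            best, span = running, (start, i + 1)
        i += 1
        j += 1
    return best, span


def init1(query: str, subject: str, diagonals: list[int], score: ScoreFn = toy_score):
    """Best ungapped segment over the candidate diagonals, as (score, span, diagonal)."""
    return max((init1_on_diagonal(query, subject, d, score) + (d,) for d in diagonals),
               key=lambda result: result[0])


def banded_smith_waterman(query: str, subject: str, diagonal: int, width: int,
                          score: ScoreFn = toy_score, gap: int = 8) -> int:
    """Local alignment score restricted to cells with |(i - j) - diagonal| <= width.

    Cells outside the band stay 0; because local scores are floored at 0, this is
    equivalent to forbidding paths through them.
    """
    n, m = len(query), len(subject)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        lo = max(1, i - diagonal - width)
        hi = min(m, i - diagonal + width)
        for j in range(lo, hi + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(query[i - 1], subject[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    q, s, ktup = "HEAGAWGHEE", "PAWHEAE", 2
    best, span, diag = init1(q, s, [-3, 3, 4])
    opt = banded_smith_waterman(q, s, diag, width=32 // ktup)
    print(best, span, diag, opt)
```

The band width follows the slide's 32 / ktup rule. For sequences this short, the band already covers the whole matrix, so opt equals the unrestricted Smith-Waterman score.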


BLAST (Basic local alignment search tool)
• Built upon ideas derived from FASTA, with incorporation of new elements
• For every word in query, generate a set of similar ("neighborhood") words
• Use AAS for similarity score between query word and all possible words of same size
• Include all words exceeding cut-off in set
• Example: For word DED, and threshold 0, word set includes DED, DDD, EEE, EDE etc.
• For every query word, generate hot-spots based on set of similar words
• Then merge adjacent hits along the same diagonal (à la FASTA) and extend them to form High-scoring Segment Pairs (HSPs)
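BLAST's neighborhood-word step can be sketched as follows. The sketch assumes a hypothetical three-letter alphabet and toy substitution scores, chosen only so that the slide's DED example works; a real search would use a full matrix such as BLOSUM62. Words scoring at least `threshold` against a query word join its neighborhood. Subject words found by exact lookup in the neighborhood index become hot-spots. Extension of hits into HSPs is not shown.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy alphabet and symmetric substitution scores, for illustration only.
ALPHABET = "DEK"
TOY_SCORES = {
    ("D", "D"): 6, ("E", "E"): 5, ("K", "K"): 5,
    ("D", "E"): 2, ("D", "K"): -1, ("E", "K"): 1,
}


def pair_score(x: str, y: str) -> int:
    """Look up a symmetric pair score stored once per unordered pair."""
    return TOY_SCORES[(x, y)] if (x, y) in TOY_SCORES else TOY_SCORES[(y, x)]


def word_score(a: str, b: str) -> int:
    """Ungapped score of two equal-length words: sum of position-wise pair scores."""
    return sum(pair_score(x, y) for x, y in zip(a, b))


def neighborhood(word: str, threshold: int, alphabet: str = ALPHABET) -> set[str]:
    """All same-length words scoring at least `threshold` against `word`."""
    candidates = ("".join(letters) for letters in product(alphabet, repeat=len(word)))
    return {w for w in candidates if word_score(word, w) >= threshold}


def neighborhood_index(query: str, w: int, threshold: int) -> dict[str, list[int]]:
    """Map each neighborhood word to the query positions whose word it resembles."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        for similar in neighborhood(query[i:i + w], threshold):
            index[similar].append(i)
    return dict(index)


def word_hits(query: str, subject: str, w: int, threshold: int) -> list[tuple[int, int]]:
    """Hot-spots (query_pos, subject_pos): exact lookups of subject words in the index."""
    index = neighborhood_index(query, w, threshold)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


if __name__ == "__main__":
    print(sorted(neighborhood("DED", threshold=0)))
    print(word_hits("DED", "KEEEK", w=3, threshold=9))  # [(0, 1)]: EEE scores 9 vs DED
```

The expensive part, enumerating similar words, is done once per query word. Scanning the database then only needs exact hash lookups, as in FASTA.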


FASTA versus BLAST
• Word matching exact in FASTA but inexact (AAS-based) in BLAST
• Larger word size in BLAST
• FASTA more sensitive (Why?) but slower (Why?)
• BLAST handles “low-complexity” regions inline (filtering is part of the search)
• Programs SEG (protein) and DUST (DNA) used for filtering (masking) sequences
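This sketch does not reproduce SEG or DUST. As a simplified stand-in for the idea, it masks every residue covered by a fixed-length window whose Shannon entropy (in bits) falls below a cut-off. The window length, the `min_entropy` value, and the `X` mask character are hypothetical choices, and `mask_low_complexity` is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy of the residue composition of `window`, in bits."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Mask every residue covered by a window whose entropy is below `min_entropy`.

    Simplified illustration only: SEG and DUST use their own complexity measures.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            masked[start:start + window] = [mask_char] * window
    return "".join(masked)


if __name__ == "__main__":
    print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12))
```

Masked residues cannot seed word hits, so a poly-Q stretch like the one above no longer produces spurious matches against unrelated Q-rich sequences.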


Variations on BLAST-based searching
• Mapping query to different alphabets
• Protein query versus DNA database (e.g., tblastn)
• DNA query versus protein database (e.g., blastx; query translated in multiple reading frames)
• PSI-BLAST: Position-specific iterative BLAST
• Use query to find hits
• Assemble hits into an on-the-fly position-specific scoring matrix (PSSM)
• Search again with the PSSM; repeat until no new hits appear or an iteration limit is reached
• RPS-BLAST: Reverse position-specific BLAST
• Query sequence is the search space
• Database of PSSMs used to search for match
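The PSSM idea behind PSI-BLAST and RPS-BLAST can be sketched in simplified form. The sketch assumes the hits are already aligned, gap-free segments of equal length, with a uniform background frequency and a flat pseudocount; PSI-BLAST's own sequence weighting and pseudocount scheme is more elaborate. `build_pssm` turns hits into log-odds columns, and `best_window` slides a PSSM along a sequence. `rps_search` reverses the roles as on the slide: the query is the search space and each database entry is a PSSM. All three names are hypothetical. The PSI-BLAST iteration loop (search, rebuild the PSSM, search again) is not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits: list[str], pseudocount: float = 1.0,
               alphabet: str = AMINO_ACIDS) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from equal-length, gap-free aligned hit segments.

    Assumes a uniform background frequency and a flat pseudocount per residue.
    """
    background = 1.0 / len(alphabet)
    total = len(aligned_hits) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(len(aligned_hits[0])):
        counts = Counter(hit[pos] for hit in aligned_hits)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in alphabet})
    return pssm


def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Slide the PSSM along `sequence`; return (best score, start position)."""
    length = len(pssm)
    return max((sum(col[aa] for col, aa in zip(pssm, sequence[s:s + length])), s)
               for s in range(len(sequence) - length + 1))


def rps_search(query: str,
               pssm_db: dict[str, list[dict[str, float]]]) -> list[tuple[str, float, int]]:
    """Reverse search: score every PSSM in the database against the single query."""
    hits = [(name, *best_window(pssm, query))
            for name, pssm in pssm_db.items() if len(pssm) <= len(query)]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    profile = build_pssm(["HEAG", "HEAG", "HDAG"])
    print(best_window(profile, "PAWHEAGE"))
    print(rps_search("PAWHEAGE", {"profile": profile, "other": build_pssm(["WWWW"])}))
```

Residues seen often in a column get positive log-odds scores, and unseen residues get small negative ones. A PSSM therefore rewards matches to the conserved positions of the hit family rather than to a single query sequence.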
