
# Heuristic PSA



## PowerPoint Slideshow about ' Heuristic PSA' - jamese


• “Words” to describe dot-matrix analysis

• Approaches

• FASTA

• BLAST

• Searching databases for sequence similarities

• PSA

• Alternative strategies

• Iterative searching

• Reverse searching

Lecture 7 CS566

• Useful ideas from dot-matrix (DM) alignment

• Diagonal represents local match

• Broken diagonal = intervening mismatch

• Displaced diagonals = Matches with gaps

• Advantage of using word-based alignment

• Faster algorithm

• Word-list comparison faster than sequence comparison

• Hashes used for rapid comparison of words

• “Devil is in the details”
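The dot-matrix ideas above (a diagonal as a local match, a broken diagonal as an intervening mismatch) can be seen in a minimal sketch. The function name `dot_matrix` and the two example sequences are assumptions chosen for illustration and do not come from the lecture.

```python
def dot_matrix(seq_a, seq_b):
    """Return one string per residue of seq_a: '*' where it equals the
    residue of seq_b in that column, '.' otherwise."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


# Two sequences that differ only at position 3 (T versus C).
rows = dot_matrix("GATTACA", "GATCACA")
for row in rows:
    print(row)

# Reading along the main diagonal shows a run of matches broken by one mismatch.
diagonal = "".join(rows[i][i] for i in range(len(rows)))
print(diagonal)
```

The printed main diagonal has a single gap in an otherwise unbroken run of matches, which is the "broken diagonal" pattern described on the slide.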


FASTA

• Motivation: A rapid PSA method was needed to search databases for matches to a query sequence (1:n comparisons)

• ktup (k-tuple or word) based alignment

• Create hash tables for sequences

• Find matching ktups (“hot-spots”/short diagonals) in pair of sequences

• ktup size = 2 for protein (6 for DNA)
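The hashing step above can be sketched roughly as follows. This is an illustrative outline, not FASTA's actual implementation; the names `ktup_index` and `hot_spots` and the example sequences are hypothetical.

```python
from collections import defaultdict


def ktup_index(seq, ktup):
    """Hash table: each k-tuple (word) -> list of start positions in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def hot_spots(query, target, ktup=2):
    """Return (i, j) pairs where query[i:i+ktup] == target[j:j+ktup]."""
    index = ktup_index(target, ktup)
    spots = []
    for i in range(len(query) - ktup + 1):
        # One dictionary lookup replaces a scan over the whole target.
        for j in index.get(query[i:i + ktup], []):
            spots.append((i, j))
    return spots


spots = hot_spots("HEAGAWGHEE", "PAWHEAE", ktup=2)
print(spots)  # [(0, 3), (1, 4), (4, 1), (7, 3)]
```

Each query word costs one hash lookup, which is why word-list comparison is faster than comparing every residue pair.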


• Find 10 best “diagonal-runs”

• Group hot-spots by the (i-j) diagonal they lie in

• Main diagonal numbered 0

• Positive diagonals lie above main diagonal, negative lie below

• Diagonal-run = set of consecutive (not necessarily contiguous) hot-spots, penalized by size of intervening mismatch

• Save top 10 diagonal runs
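A minimal sketch of the diagonal-run step, assuming hot-spots are given as (i, j) start positions. The scoring values `hit_score` and `mismatch_penalty` are hypothetical placeholders, not FASTA's real parameters, and the function name `best_diagonal_runs` is an assumption.

```python
from collections import defaultdict


def best_diagonal_runs(spots, ktup=2, hit_score=4, mismatch_penalty=1, top=10):
    """Return up to `top` tuples (score, diagonal, start_i, end_i), best first."""
    by_diag = defaultdict(list)
    for i, j in spots:
        by_diag[i - j].append(i)  # group hot-spots by their (i - j) diagonal

    runs = []
    for diag, starts in by_diag.items():
        starts.sort()
        best = None
        run_score, run_start, prev = 0, None, None
        for i in starts:
            if prev is not None:
                # Unmatched positions between consecutive hot-spots are penalized.
                gap = max(0, i - (prev + ktup))
                run_score -= mismatch_penalty * gap
            if prev is None or run_score <= 0:
                run_score, run_start = 0, i  # extending does not pay: start a new run
            run_score += hit_score
            if best is None or run_score > best[0]:
                best = (run_score, diag, run_start, i + ktup - 1)
            prev = i
        runs.append(best)

    runs.sort(key=lambda r: r[0], reverse=True)
    return runs[:top]


runs = best_diagonal_runs([(0, 3), (1, 4), (4, 1), (7, 3)], ktup=2)
print(runs[0])  # (8, -3, 0, 2)
```

In this example the two adjacent hot-spots on diagonal -3 merge into one run, while the isolated hot-spots on diagonals 3 and 4 each form a weaker run.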


• Find init1

• Init1 = best contiguous subsequence from the top 10 diagonal runs, rescored with an amino acid substitution (AAS) matrix (default BLOSUM50)

• Define local search space around init1

• Include ±(32 / ktup) diagonals in search space

• For ktup = 2, 16 diagonals around init1

• Perform Smith-Waterman PSA in reduced space

• Report resulting alignment as opt
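The reduced search space can be sketched as a Smith-Waterman recurrence evaluated only inside a band of diagonals. This is a simplified illustration: it uses fixed match/mismatch/gap scores (an assumption) in place of a substitution matrix such as BLOSUM50, returns only the score, and the names `banded_smith_waterman`, `center_diag`, and `half_width` are hypothetical.

```python
def banded_smith_waterman(a, b, center_diag, half_width,
                          match=2, mismatch=-1, gap=-2):
    """Best local alignment score over cells with
    |(i - j) - center_diag| <= half_width."""
    best = 0
    prev = {}  # scores of the previous row, keyed by column j
    for i in range(1, len(a) + 1):
        cur = {}
        j_lo = max(1, i - center_diag - half_width)
        j_hi = min(len(b), i - center_diag + half_width)
        for j in range(j_lo, j_hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                         # start a new local alignment
                prev.get(j - 1, 0) + sub,  # extend along the diagonal
                prev.get(j, 0) + gap,      # gap in b
                cur.get(j - 1, 0) + gap,   # gap in a
            )
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


# "ACGT" occurs in "TTACGT" on diagonal i - j = 2.
on_band = banded_smith_waterman("TTACGT", "ACGT", center_diag=2, half_width=0)
off_band = banded_smith_waterman("TTACGT", "ACGT", center_diag=0, half_width=0)
print(on_band, off_band)  # 8 0
```

A band centered on the wrong diagonal misses the match entirely, which is why the band is placed around init1. With ktup = 2, the slide's figure of 16 diagonals would correspond to `half_width=16` under the reading that the band extends that far on each side.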


BLAST (Basic Local Alignment Search Tool)

• Built upon ideas derived from FASTA, with incorporation of new elements

• For every word in query, generate set of words

• Use AAS for similarity score between query word and all possible words of same size

• Include all words exceeding cut-off in set

• Example: For word DED, and threshold 0, word set includes DED, DDD, EEE, EDE etc.

• For every query word, generate hot-spots based on set of similar words

• Then merge contiguous words along same diagonal (à la FASTA) to form high-scoring segment pairs (HSPs)
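The word-set generation above can be sketched for the DED example. To keep the sketch short it uses a toy two-letter alphabet (D and E only) with the BLOSUM62 scores for those residues; the restricted alphabet, the threshold of 11, and the function name `neighborhood` are assumptions for illustration.

```python
from itertools import product

# BLOSUM62 scores for D and E only (toy alphabet, assumed for brevity).
SCORES = {("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2, ("E", "D"): 2}
ALPHABET = "DE"


def neighborhood(word, threshold):
    """All same-length words whose summed substitution score against
    `word` exceeds `threshold`, best first."""
    hits = []
    for cand in product(ALPHABET, repeat=len(word)):
        score = sum(SCORES[(w, c)] for w, c in zip(word, cand))
        if score > threshold:
            hits.append(("".join(cand), score))
    return sorted(hits, key=lambda h: -h[1])


print(neighborhood("DED", threshold=11))
# [('DED', 17), ('DDD', 14), ('DEE', 13), ('EED', 13)]
```

With a threshold of 0, all eight D/E words pass, matching the slide's example that DED, DDD, EEE, and EDE all belong to the set; raising the threshold prunes the set and trades sensitivity for speed.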


• Word matching exact in FASTA but inexact (AAS-based) in BLAST

• Larger word size in BLAST

• FASTA more sensitive (Why?) but slower (Why?)

• BLAST handles “low-complexity” regions inline

• Programs DUST (nucleotide) and/or SEG (protein) used for filtering sequences


• Mapping query to different alphabets

• Protein versus DNA

• DNA versus protein (Multiple reading frames)

• PSI-BLAST: Position-specific iterative BLAST

• Use query to find hits

• Assemble hits into an on-the-fly position-specific scoring matrix (PSSM), then repeat the search with the PSSM

• RPS-BLAST: Reverse position-specific BLAST

• Query is search space

• Database of PSSMs used to search for match
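The PSSM idea behind PSI-BLAST and RPS-BLAST can be sketched as follows. This is a simplified illustration only: it assumes ungapped hits of equal length, a uniform background, a fixed pseudocount, and a toy D/E alphabet, whereas PSI-BLAST itself uses more elaborate weighting. All function names are hypothetical.

```python
import math
from collections import Counter


def build_pssm(aligned_hits, alphabet, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)  # uniform background (assumption)
    n = len(aligned_hits)
    pssm = []
    for column in zip(*aligned_hits):
        counts = Counter(column)
        scores = {}
        for r in alphabet:
            freq = (counts[r] + pseudocount) / (n + pseudocount * len(alphabet))
            scores[r] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_window(pssm, window):
    """Sum the column-specific scores of the residues in `window`."""
    return sum(col[r] for col, r in zip(pssm, window))


def best_window(pssm, sequence):
    """Slide the PSSM along `sequence`; return (score, start) of the best window."""
    width = len(pssm)
    return max((score_window(pssm, sequence[s:s + width]), s)
               for s in range(len(sequence) - width + 1))


pssm = build_pssm(["DED", "DED", "DDD", "EED"], alphabet="DE")
score, start = best_window(pssm, "EEDEDDE")
print(start)  # 2
```

Building the matrix from hits mirrors the PSI-BLAST step, while sliding a stored matrix along a query sequence mirrors the RPS-BLAST direction, where the query is the search space.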
