
Heuristic Alignment Algorithms



Presentation Transcript


  1. Heuristic Alignment Algorithms Hongchao Li Jan. 27 2004

  2. Introduction • Heuristic Alignment Algorithms 1. BLAST (Basic Local Alignment Search Tool) 2. FASTA • Problem 1. Local alignment: finding the best alignment between subsequences of two sequences. 2. Problem model: finding high-scoring local alignments between a query sequence and a target database (see next slide).

  3. Introduction • The problem model of BLAST and FASTA • Heuristic idea: true match alignments are very likely to contain, somewhere within them, a short stretch of identities or very high-scoring matches.

  4. BLAST • Basic Idea BLAST first looks for such short stretches and uses them as ‘seeds’ from which to extend outward in search of a good longer alignment. Because the seed segments are kept short, the query sequence can be pre-processed into a table of all possible seeds with their corresponding start points.
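The pre-processing step described on this slide can be sketched as follows. This is a minimal illustration, not BLAST's actual data structure; the function name `build_seed_table` is hypothetical, and the query and word length are borrowed from the example on slide 6.

```python
from collections import defaultdict

def build_seed_table(query, w):
    """Map every length-w word of the query to the positions where it starts."""
    table = defaultdict(list)
    for start in range(len(query) - w + 1):
        table[query[start:start + w]].append(start)
    return dict(table)

# Query and word length taken from the slide 6 example.
print(build_seed_table("AGTACT", 4))
# {'AGTA': [0], 'GTAC': [1], 'TACT': [2]}
```

A hash table keyed by word gives constant-time lookup during the database scan, which is why keeping the seeds short matters: the table stays small enough to build once per query.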

  5. BLAST • Method 1. Make a list of all ‘neighborhood words’ of a fixed length w (by default 3 for protein sequences and 11 for nucleic acids) that would match the query sequence somewhere with a score higher than some threshold T. 2. Scan through the database. 3. Whenever a word in this set is found, start a ‘hit extension’ process that extends the possible match as an ungapped alignment in both directions, stopping at the maximum-scoring extension.

  6. BLAST • Example • Query sequence: AGTACT • Target database: TACTGAACTTGC • w = 4, identity score = 5, mismatch score = -4, threshold T = 11 • Solution: 1. Make a list of all neighborhood words
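The list of neighborhood words did not survive in this transcript, so the sketch below recomputes one under stated assumptions rather than reproducing the slide's table. It enumerates every 4-letter DNA word and keeps those whose score against some query word reaches the threshold. With these scores an exact match scores 20, a single mismatch scores 11, and two mismatches score 2, so whether single-mismatch words qualify depends on reading the threshold as "at least T" or "strictly higher than T". The `strict` flag is a hypothetical parameter that exposes both readings; the function names are likewise illustrative.

```python
from itertools import product

MATCH, MISMATCH = 5, -4  # identity and mismatch scores from the example

def word_score(a, b):
    """Ungapped score of two equal-length words."""
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))

def neighborhood_words(query, w, threshold, alphabet="ACGT", strict=False):
    """Map each qualifying word to the query positions it matches.

    strict=False keeps words scoring >= threshold (an assumption);
    strict=True keeps only words scoring > threshold.
    """
    query_words = [query[i:i + w] for i in range(len(query) - w + 1)]
    neighbors = {}
    for letters in product(alphabet, repeat=w):
        candidate = "".join(letters)
        for pos, q_word in enumerate(query_words):
            s = word_score(candidate, q_word)
            if (s > threshold) if strict else (s >= threshold):
                neighbors.setdefault(candidate, []).append(pos)
    return neighbors

inclusive = neighborhood_words("AGTACT", 4, 11)
exact_only = neighborhood_words("AGTACT", 4, 11, strict=True)
print(len(inclusive), len(exact_only))  # 39 3
```

Under the inclusive reading each of the three query words contributes itself plus its 12 single-mismatch variants; under the strict reading only the three exact words remain.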

  7. BLAST 2. Scan through the database 3. Hit extension
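The worked figures for these two steps are missing from the transcript, so here is a rough sketch of what scanning and hit extension could look like for the example on slide 6. Several things are assumptions: a word counts as a hit when it scores at least T against a query word, the extension simply walks to the end of both sequences and keeps the best-scoring prefix (production BLAST stops early once the score drops too far), and all function names are hypothetical.

```python
MATCH, MISMATCH = 5, -4  # scores from the slide 6 example

def pair_score(x, y):
    return MATCH if x == y else MISMATCH

def find_hits(query, db, w, threshold):
    """Return (query_start, db_start, score) for word pairs scoring >= threshold."""
    hits = []
    for d in range(len(db) - w + 1):
        for q in range(len(query) - w + 1):
            s = sum(pair_score(query[q + k], db[d + k]) for k in range(w))
            if s >= threshold:
                hits.append((q, d, s))
    return hits

def best_extension(query, db, i, j, step):
    """Walk the diagonal from (i, j) in direction step (+1 or -1).

    Returns (gain, length) of the highest-scoring ungapped extension.
    """
    best_gain, best_len, running, length = 0, 0, 0, 0
    while 0 <= i < len(query) and 0 <= j < len(db):
        running += pair_score(query[i], db[j])
        length += 1
        if running > best_gain:
            best_gain, best_len = running, length
        i += step
        j += step
    return best_gain, best_len

def extend_hit(query, db, q, d, w, seed_score):
    """Extend a seed in both directions; return (score, q_start, d_start, length)."""
    right_gain, right_len = best_extension(query, db, q + w, d + w, +1)
    left_gain, left_len = best_extension(query, db, q - 1, d - 1, -1)
    return (seed_score + left_gain + right_gain,
            q - left_len, d - left_len, w + left_len + right_len)

query, db = "AGTACT", "TACTGAACTTGC"
hits = find_hits(query, db, w=4, threshold=11)
segments = {extend_hit(query, db, q, d, 4, s) for q, d, s in hits}
for score, q_start, d_start, length in sorted(segments, reverse=True):
    print(score, query[q_start:q_start + length], db[d_start:d_start + length])
# 20 TACT TACT
# 16 GTACT GAACT
```

Two of the three hits lie on the same diagonal and extend to the same segment, which is why the set contains only two distinct results.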

  8. FASTA • Basic idea It uses a multi-step approach to find local high-scoring alignments, starting from exact short word matches, proceeding through maximal-scoring ungapped extensions, and finally identifying gapped alignments. • Method 1. Use a lookup table to locate all identically matching words of length ktup between the two sequences. Then look for diagonals with many mutually supporting word matches. This operation can be done, for example, by sorting the matches on the difference of indices (i - j).
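Step 1 can be sketched as below: a lookup table over the first sequence, a scan of the second, and a grouping of matches by the index difference i - j. The function names and the choice of ktup = 2 are illustrative assumptions, and the sequences are borrowed from the BLAST example only for convenience.

```python
from collections import defaultdict

def word_matches(seq_a, seq_b, ktup):
    """Return (i, j) pairs where seq_a[i:i+ktup] == seq_b[j:j+ktup]."""
    lookup = defaultdict(list)
    for i in range(len(seq_a) - ktup + 1):
        lookup[seq_a[i:i + ktup]].append(i)
    matches = []
    for j in range(len(seq_b) - ktup + 1):
        for i in lookup.get(seq_b[j:j + ktup], []):
            matches.append((i, j))
    return matches

def rank_diagonals(matches):
    """Group matches by diagonal (i - j) and rank diagonals by match count."""
    by_diagonal = defaultdict(list)
    for i, j in sorted(matches, key=lambda m: (m[0] - m[1], m[0])):
        by_diagonal[i - j].append((i, j))
    return sorted(by_diagonal.items(), key=lambda item: len(item[1]), reverse=True)

matches = word_matches("AGTACT", "TACTGAACTTGC", ktup=2)
for diagonal, supporting in rank_diagonals(matches):
    print(diagonal, supporting)
# 2 [(2, 0), (3, 1), (4, 2)]
# -3 [(3, 6), (4, 7)]
```

Matches that share a diagonal are consistent with a single ungapped alignment, so the diagonal with the most matches is the natural first candidate for the extension step.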

  9. FASTA 2. Extend the exact word matches in the best diagonals to find maximal-scoring ungapped regions (possibly joining several seed matches in the process). 3. Check whether any of these ungapped regions can be joined by a gapped region, allowing for gap costs. 4. Realign the highest-scoring candidate matches in a database search using the full dynamic programming algorithm, restricted to a sub-region of the dynamic programming matrix that forms a band around the candidate heuristic match.
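The banded realignment in step 4 can be illustrated with a local-alignment recurrence that only fills cells near a chosen diagonal. This is a sketch under assumptions, not FASTA's implementation: it uses a linear gap penalty of -6 (the slides do not give gap costs), reuses the match and mismatch scores from the BLAST example, returns only the best score, and the name `banded_local_score` is hypothetical.

```python
def banded_local_score(a, b, diagonal, half_width, match=5, mismatch=-4, gap=-6):
    """Best local alignment score using only cells with |(i - j) - diagonal| <= half_width.

    i indexes a and j indexes b. Cells outside the band are treated as 0,
    which is the same as letting an alignment start fresh at the band edge.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        lo = max(1, i - diagonal - half_width)
        hi = min(len(b), i - diagonal + half_width)
        for j in range(lo, hi + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best

print(banded_local_score("AGTACT", "TACTGAACTTGC", diagonal=2, half_width=1))   # 20
print(banded_local_score("AGTACT", "TACTGAACTTGC", diagonal=-3, half_width=1))  # 16
```

Restricting the fill to a band of width 2 * half_width + 1 reduces the work per candidate from the full matrix to roughly the sequence length times the band width, at the cost of missing alignments that stray outside the band.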

  10. FASTA • Example

  11. Thanks!
