Heuristic Alignment Algorithms

Heuristic Alignment Algorithms Hongchao Li Jan. 27 2004

Introduction • Heuristic Alignment Algorithms 1. BLAST (Basic Local Alignment Search Tool) 2. FASTA • Problem 1. Local Alignment: looking for the best alignment between subsequences of two sequences. 2. Problem Model: finding high scoring local alignments between a query sequence and a target database. (see next page)

Introduction The problem model of BLAST and FASTA • Heuristic Thought True match alignments are very likely to contain somewhere within them a short stretch of identities or very high scoring matches.

BLAST • Basic Idea We look initially for such short stretches and use them as ‘seeds’,from which to extend out in search of a good longer alignment. By keeping the seed segments short, it is possible to pre-process the query sequence to make a table of all possible seeds with their corresponding start points.

BLAST • Method 1. Make a list of all ‘neighborhood words’ of a fixed length w (by default 3 for protein sequences, and 11 for nucleic acids), that would match the query sequence somewhere with score higher than some threshold T. 2. Scan through the database 3. Whenever find a word in this set, start a ‘hit extension’ process to extend the possible match as an ungapped alignment in both directions, stopping at the maximum scoring extension .

BLAST • Example • Query Sequence: AGTACT • Target Database: TACTGAACTTGC • w =4, identity score =5, mismatch score= -4, Threshold T=11 Solution: 1. Make a list of all neighborhood words

BLAST 2. Scan through the database 3. Hit extension

FASTA • Basic idea It uses a multi-step approach to finding local high scoring alignments, starting from exact short word matches, through maximal scoring ungapped extensions, to finally identify gapped alignments. • Method 1. Use a lookup table to locate all identically matching words of length ktup between the two sequences. Then look for diagonals with many mutually supporting word matches. This operations, for example, can be done by sorting the matches on the difference of indices (i - j).

FASTA 2. Extend the exact word matches in the best diagonals to find maximal scoring ungapped regions (and in the process possibly joining together several seed matches). 3. Checks to see if any of these ungapped regions can be joined by a gapped region, allowing for gap costs. 4. The highest scoring candidate matches in a database search are realigned using the fully dynamic programming algorithm, but restricted to a sub-region of the dynamic programming matrix forming a band around the candidate heuristic match.

FASTA • Example

Thanks!

Heuristic Alignment Algorithms

Heuristic Alignment Algorithms

Presentation Transcript

Sequence alignment algorithms

Developing Pairwise Sequence Alignment Algorithms

Alignment and Algorithms

Space Efficient Alignment Algorithms

New Algorithms for SIMD Alignment

Algorithms for Alignment of Genomic Sequences

Time Hierarchies for Heuristic Algorithms

Practical algorithms in Sequence Alignment

Mesh Alignment Algorithms

Informed (Heuristic) Search Algorithms

Generalized Tree Alignment: The Deferred Path Heuristic

Developing Sequence Alignment Algorithms in C++

Heuristic alignment algorithms and cost matrices

1. BLAST (Basic Local Alignment Search Tool) Heuristic

Heuristic alignment algorithms; Cost matrices

Algorithms for Pairwise Sequence Alignment

Practical algorithms in Sequence Alignment

Optimizing a Chess Heuristic Using Evolutionary Algorithms

Developing Pairwise Sequence Alignment Algorithms

Developing Pairwise Sequence Alignment Algorithms

Developing Pairwise Sequence Alignment Algorithms

Class 4: Sequence Alignment II Gaps, Heuristic Search