Understanding Pairwise Local Alignment and Database Search in Bioinformatics
Explore the concepts of local alignment and homology search in bioinformatics. Learn about dynamic programming, dot plots, remote homology, and statistical probability calculations in sequence analysis.
Lecture 6. Pairwise Local Alignment and Database Search Csc 487/687 Computing for bioinformatics
Homology Search • Given a sequence q, does there exist a sequence d in a database D such that q and d are homologous? • Could perform global pairwise alignment between q and each sequence in D, but • Maybe only a segment of q is highly (beyond random) similar to a segment of a database sequence • Remote homology – only a motif is conserved • Sequence/domain rearrangements – sequences not globally homologous, but share a domain • Local alignment (alignment of a segment of q with a segment of d) is desirable
Homology Search – Task • Present all sequences in D that have segments homologous to segments in q • Avoid presenting sequences in D that are not homologous • For each local alignment – calculate the probability that the alignment is "random" (not caused by an evolutionary relationship)
Definitions • Segment – contiguous subsequence (substring) of q or d • Segment pair – pair of segments, one from q and one from d (need not be of the same length) • Local alignment – alignment of a segment pair
Dot Plot – Visualising Similarity • For sequences q (length m) and d (length n), construct an m × n matrix • Make a dot in cell (i,j) if q_i = d_j • The matrix can be filtered • E.g., use a window of length K – make a dot in (i,j) only if at least C% of the characters are similar between the K-windows around (i,j)
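The dot plot rule and its window filter can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular tool: the function name dot_plot and its parameters k and c are hypothetical, "similar" is taken to mean "identical", and each window starts at (i,j) rather than being centred on it.

```python
def dot_plot(q, d, k=1, c=100.0):
    """Return the set of 0-based cells (i, j) that receive a dot.

    k -- window length K (k=1 gives the unfiltered rule q[i] == d[j])
    c -- minimum percentage of identical positions within the window

    Assumption: the window starts at (i, j) and runs along the diagonal;
    the slide only says the windows are "around" (i, j).
    """
    dots = set()
    for i in range(len(q) - k + 1):
        for j in range(len(d) - k + 1):
            # Count identical characters in the two length-k windows.
            matches = sum(1 for t in range(k) if q[i + t] == d[j + t])
            if 100.0 * matches / k >= c:
                dots.add((i, j))
    return dots


def render(q, d, dots):
    """Draw the matrix as text: one row per character of q."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(d)))
        for i in range(len(q))
    )


if __name__ == "__main__":
    q, d = "ACGTACGT", "ACGTTCGT"
    print(render(q, d, dot_plot(q, d, k=3, c=100.0)))
```

A larger k or c removes isolated dots and leaves the diagonal runs that mark similar segments; repeats show up as additional parallel diagonals.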
Dot Plots are Easy to Interpret • Can identify, for instance, repeats • Example: • Human HPRT gene (genomic sequence) • Dot if 8 identical bases • http://www.ansorge-group.embl.de/geneskipper/dotplot.htm
Dynamic Programming for Local Alignment (Smith & Waterman 1981) • Assumptions • Scoring matrix has "negative expectation" • Gaps should decrease the alignment score (as before) • Consequences • A subalignment with negative score coming first (prefix) or last (suffix) can be removed to improve the alignment score • Gaps should not be included unless the alignments on either side score high enough to make up for the gap penalty • [Figure: an alignment with its prefix and suffix marked]
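Recurrence relation • Each cell (i,j) of the H matrix takes the highest score among four cases • Empty alignment – score 0 • Alignment of q1..i-1 and d1..j-1 extended with the pair (qi, dj) • Alignment of q1..i-1 and d1..j extended with qi against a gap • Alignment of q1..i and d1..j-1 extended with a gap against dj • The empty-alignment case effectively allows for removal of negatively contributing prefixes.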
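Written out as a formula, the recurrence could look as follows. The symbols are assumptions, since the slide's own notation did not survive in the transcript: s(a,b) stands for the scoring-matrix entry for characters a and b, and g > 0 is a linear per-position gap penalty.

```latex
H_{i,j} = \max
\begin{cases}
0                            & \text{empty alignment} \\
H_{i-1,j-1} + s(q_i, d_j)    & \text{$q_i$ aligned with $d_j$} \\
H_{i-1,j} - g                & \text{$q_i$ aligned with a gap} \\
H_{i,j-1} - g                & \text{$d_j$ aligned with a gap}
\end{cases}
```

The only difference from the global-alignment recurrence is the 0 case: whenever every extension would score below zero, the alignment restarts from the empty alignment at that cell.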
Initialization – Removing Initial Gaps • Initial gaps – in either sequence – should be ignored
The Best Local Alignment • Should ignore negatively contributing suffixes of alignments • Score of best local alignment – highest value in dynamic programming matrix • Alignment found by tracing back from maximum value until cell with value 0 (zero) has been reached
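Calculating Best Local Alignment • [Figure: the H matrix. The initialization (0) is used to fill the first row and the first column; the recurrence relation is used to fill the rest row by row. The highest value is the score of the best alignment, and the best alignment is traced back from that cell.]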
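The whole procedure (zero initialization, row-by-row fill, maximum cell, traceback to a zero) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference implementation: the function name smith_waterman is hypothetical, a simple match/mismatch score stands in for a scoring matrix, and the gap penalty is linear. The default values 2, -1 and 2 are illustrative; for uniformly distributed DNA they give a negative expected score per aligned pair (0.25 · 2 + 0.75 · (-1) = -0.25).

```python
def smith_waterman(q, d, match=2, mismatch=-1, gap=2):
    """Return (score, aligned_q, aligned_d) for one best local alignment.

    Assumptions: match/mismatch scores replace a full scoring matrix,
    and 'gap' is a positive linear penalty per gap position.
    """
    m, n = len(q), len(d)
    # First row and first column stay 0: initial gaps are ignored.
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the rest of the matrix row by row.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(
                0,                        # empty alignment
                H[i - 1][j - 1] + s,      # q[i-1] aligned with d[j-1]
                H[i - 1][j] - gap,        # q[i-1] aligned with a gap
                H[i][j - 1] - gap,        # d[j-1] aligned with a gap
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum until a cell with value 0 is reached.
    i, j = best_pos
    aligned_q, aligned_d = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if q[i - 1] == d[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned_q.append(q[i - 1])
            aligned_d.append(d[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            aligned_q.append(q[i - 1])
            aligned_d.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_d.append(d[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_d))


if __name__ == "__main__":
    print(smith_waterman("GGACGTCC", "TTACGTAA"))
```

Starting the traceback at the maximum discards any negatively contributing suffix, and stopping at the first zero discards any negatively contributing prefix. When several cells share the maximum, this sketch reports only the first one found.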
Time Complexity • Sequences of lengths n and m: O(nm), since each cell of the matrix is filled in constant time • Two sequences of length l: O(l^2)