Practical algorithms in Sequence Alignment. Sushmita Roy BMI/CS 576 www.biostat.wisc.edu/bmi576/ [email protected] Sep 16 th , 2014. Key concepts. Assessing the significance of an alignment Extreme value distribution gives an analytical form to compute the significance of a score
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
query sequence: QLNFSAGW
word length w = 2 (default for protein usually w = 3)
word score threshold T = 9
Step 1: determine all words of length w in query sequence
QL LN NF FS SA AG GW
Step 2: Determine all words that score at least T when compared to a word in the query sequence
words with T≥9
Aminoacid substitution matrix
potentially in database
Query sequence: Q L N F S A
DB sequence: R L N Y S W
Score: 1 4 5 3 4 -3
Altshul et al 1997
+ Hits wit T>=13 (15 hits)
.Hits with T>=11 (22 hits)
Only these two are considered as they satisfy the two-hit method
Figure from Altshul et al 1997
Enter the query sequence
Choose the database
The sequence corresponds to the human HBB (hemoglobin) gene.
But we will select the mouse DB
Use Megablast (large word size)
Assesses significance of a score.
Related to P-value, but gives the
expected number of alignments of
this score value or higher.