Pairwise Sequence Alignment Part 2

Outline • Global alignments (continuation) • Local versus global • BLAST algorithms • Evaluating significance of alignments

Global Alignment (cont.)

Needleman-Wunsch Alignment • Global alignment between sequences • Compare entire sequence against another • Create scoring table • Sequence A across top, B down left • Cell at column i and row j contains the score of best alignment between the first i elements of A and the first j elements of B • Global alignment score is bottom right cell

Example alignments (each pair shows the top row over the bottom row):

ACGCTG
------

-----
CATGT

ACG
-C-

ACGC
---C

ACGC
-C--

ACG
-CA

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-
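The scoring table described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, since the slides do not state a scoring scheme.

```python
def needleman_wunsch_table(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the global-alignment table: a runs across the top, b down the left.

    table[j][i] holds the best score for aligning the first i elements
    of a with the first j elements of b. Scoring values are assumed.
    """
    rows, cols = len(b) + 1, len(a) + 1
    table = [[0] * cols for _ in range(rows)]
    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, cols):
        table[0][i] = i * gap
    for j in range(1, rows):
        table[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(
                table[j - 1][i - 1] + pair,  # align a[i-1] with b[j-1]
                table[j - 1][i] + gap,       # gap in a
                table[j][i - 1] + gap,       # gap in b
            )
    return table


table = needleman_wunsch_table("ACGCTG", "CATGT")
global_score = table[-1][-1]  # bottom-right cell
```

The example uses the two sequences from the alignments above. Recovering the alignment itself requires a traceback from the bottom-right cell, which is omitted here.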

Global Alignment versus Local Alignment

Global Alignment
ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT

Local Alignment
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

Global vs. Local alignment

Global alignment:
DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN

Local alignment:
DOROTHY
DOROTHY

HODGKIN
HODGKIN

Local Alignment • Best score for aligning part of sequences • Often beats global alignment score • Similar algorithm: Smith-Waterman • Table cells never score below zero
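The "never below zero" rule is the only change needed to turn the global table into a local one. The sketch below is an illustration under assumed scoring values (match +1, mismatch -1, gap -1); the function name is hypothetical and only the score, not the aligned region, is returned.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local-alignment score between a and b.

    Same recurrence as the global table, except every cell is clamped
    at zero, and the answer is the largest cell anywhere in the table.
    """
    rows, cols = len(b) + 1, len(a) + 1
    table = [[0] * cols for _ in range(rows)]  # borders stay at zero
    best = 0
    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(
                0,                           # start a fresh local alignment
                table[j - 1][i - 1] + pair,  # extend along the diagonal
                table[j - 1][i] + gap,       # gap in a
                table[j][i - 1] + gap,       # gap in b
            )
            best = max(best, table[j][i])
    return best


score = smith_waterman_score("GGGACGTCCC", "TTTACGTAAA")
```

Because a local alignment may cover the whole of both sequences or nothing at all, its best score under the same scoring scheme is never lower than the global score and never negative.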

TAA TAA TACTA TAATA

Problems with DP for sequence alignments • The complexity is very high • Given a score, how do we evaluate the significance of the alignment?

Complexity • Complexity is determined by the size of the table • Aligning a sequence of length m against one of length n requires calculating m × n cells • Time of calculation: let's say we calculate 10^8 cells per second on a one-processor PC • Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells: 0.64 seconds • Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells: about 1,000 seconds, roughly 15 minutes

Complexity for large databases • Let's say a database contains 3 × 10^10 base pairs • Searching an mRNA against the database will require ~2.5 × 10^14 cells: 2.5 × 10^6 seconds, about 1 month! • We need an efficient algorithm to cut down on alignment time
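The arithmetic on these two slides can be reproduced with a short calculation. The sketch assumes the same rate of 10^8 cells per second and an 8,000 bp mRNA as the query; the function and constant names are made up for the example.

```python
CELLS_PER_SECOND = 10**8   # rate assumed on the slide
MRNA_LENGTH = 8_000        # query length in bp


def alignment_seconds(m, n, cells_per_second=CELLS_PER_SECOND):
    """Time to fill an m x n dynamic-programming table."""
    return (m * n) / cells_per_second


mrna_vs_mrna = alignment_seconds(MRNA_LENGTH, MRNA_LENGTH)
mrna_vs_chromosome = alignment_seconds(MRNA_LENGTH, 10**7)
mrna_vs_database = alignment_seconds(MRNA_LENGTH, 3 * 10**10)

print(f"mRNA vs mRNA:       {mrna_vs_mrna:.2f} s")
print(f"mRNA vs chromosome: {mrna_vs_chromosome / 60:.1f} min")
print(f"mRNA vs database:   {mrna_vs_database / 86_400:.1f} days")
```

The exact products are 8 × 10^10 and 2.4 × 10^14 cells; the slides round these up to ~10^11 and ~2.5 × 10^14, which is why the slide figures are slightly larger than the computed ones.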

BLAST • Basic Local Alignment Search Tool • A set of tools developed at NCBI (BlastN, BlastP, ...) • BLAST benefits • Search speed • Ease of use • Statistical rigor

BLAST • A good alignment contains subsequences of absolute identity: • First, identify very short (almost) exact matches. • Next, the best short hits from the 1st step are extended to longer regions of similarity. • Finally, the best hits are optimized using the Smith-Waterman algorithm.

BLAST Algorithm (1) Split the query sequence into words of length W (default W = 11) (2) Compare the word list to the database and identify exact matches (3) For each word match, extend the alignment in both directions (4) Score the alignments using dynamic programming (5) Evaluate the statistical significance
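The word-matching and extension steps can be sketched as follows. This is a simplified illustration of the seed-and-extend idea, not the actual BLAST implementation: the function names, the scoring values and the `drop` cutoff are assumptions, the extension is ungapped, and the dynamic-programming and statistics steps are left out. A small word length is used so the toy example produces hits.

```python
from collections import defaultdict


def index_words(subject, w):
    """Map every word of length w in the subject to its start positions."""
    index = defaultdict(list)
    for pos in range(len(subject) - w + 1):
        index[subject[pos:pos + w]].append(pos)
    return index


def extend_hit(query, subject, q_pos, s_pos, w, match=1, mismatch=-1, drop=3):
    """Extend an exact word match without gaps, first right, then left.

    Extension in a direction stops once the running score has fallen
    `drop` below the best score seen so far (an assumed cutoff).
    Returns (best score, query start, query end) of the best extension.
    """
    score = best = w * match
    q_start, q_end = q_pos, q_pos + w
    i, j = q_pos + w, s_pos + w
    while i < len(query) and j < len(subject) and best - score < drop:
        score += match if query[i] == subject[j] else mismatch
        i, j = i + 1, j + 1
        if score > best:
            best, q_end = score, i
    score = best  # continue leftward from the best right-hand extension
    i, j = q_pos - 1, s_pos - 1
    while i >= 0 and j >= 0 and best - score < drop:
        score += match if query[i] == subject[j] else mismatch
        if score > best:
            best, q_start = score, i
        i, j = i - 1, j - 1
    return best, q_start, q_end


def seed_and_extend(query, subject, w=4):
    """Find exact word matches and extend each one; best hits come first."""
    index = index_words(subject, w)
    hits = set()
    for q_pos in range(len(query) - w + 1):
        for s_pos in index.get(query[q_pos:q_pos + w], []):
            score, q_start, q_end = extend_hit(query, subject, q_pos, s_pos, w)
            s_start = s_pos - (q_pos - q_start)
            hits.add((score, q_start, q_end, s_start))
    return sorted(hits, reverse=True)


hits = seed_and_extend("ACGTACGT", "TTTTACGTACGTTTTT", w=4)
```

Each hit is a tuple of score, query start, query end (exclusive) and subject start. Indexing the database words once and looking up query words in that index is what avoids filling a full table for every database sequence.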

Database Searches (figure: score distributions of random and related sequences) • In a pairwise comparison search, each database search normally yields two groups of scores: genuinely related and unrelated sequences, with some overlap between them. • A good search method should separate the two score groups as completely as possible.

E-value • The number of hits (with at least the same similarity score) one can "expect" to see just by chance when searching the given string in a database of a particular size. • The higher the E-value, the less significant the similarity • “sequences with E-value of less than 0.01 are almost always found to be homologous” • The lower bound is 0 (the best hits have E-values closest to 0)

Expectation Values • Increase linearly with the length of the query sequence • Decrease exponentially with the score of the alignment • Increase linearly with the length of the database
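These three dependencies match the usual Karlin-Altschul form, E = K · m · n · e^(-λS), where m is the query length, n the database length and S the alignment score. The slides do not give this formula, so treating it as the one behind the slide is an assumption. The values of K and λ below are placeholders picked for the example; real values depend on the scoring system.

```python
import math


def expected_hits(score, query_len, db_len, k=0.1, lam=0.3):
    """E-value under the assumed form E = K * m * n * exp(-lambda * S).

    k and lam are illustrative placeholders, not values from the slides.
    """
    return k * query_len * db_len * math.exp(-lam * score)


base = expected_hits(score=50, query_len=200, db_len=10**6)
longer_query = expected_hits(score=50, query_len=400, db_len=10**6)
bigger_db = expected_hits(score=50, query_len=200, db_len=2 * 10**6)
higher_score = expected_hits(score=60, query_len=200, db_len=10**6)
```

Doubling either the query length or the database length doubles the E-value, while raising the score by 10 multiplies it by e^(-10λ), so a modest gain in score outweighs a large growth in database size.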
