Pairwise Sequence Alignment Part 2

PowerPoint slideshow about 'Pairwise Sequence Alignment Part 2' - darius


Presentation Transcript
Outline
  • Global alignments (continued)
  • Local versus Global
  • BLAST algorithms
  • Evaluating significance of alignments
Needleman-Wunsch Alignment
  • Global alignment between sequences
    • Compare an entire sequence against another
  • Create a scoring table
    • Sequence A across the top, B down the left
  • The cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B
    • The global alignment score is in the bottom-right cell
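The table fill described above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the scoring scheme (match +1, mismatch -1, gap -1) is an assumption because the slides do not state one, and the names `needleman_wunsch`, `pair_score`, `MATCH`, `MISMATCH` and `GAP` are hypothetical. Rows index B and columns index A, as on the slide, and a traceback from the bottom-right cell recovers one optimal alignment.

```python
from typing import List, Tuple

# Hypothetical scoring scheme: the slides do not specify one.
MATCH, MISMATCH, GAP = 1, -1, -1


def pair_score(x: str, y: str) -> int:
    """Score for placing letter x above letter y."""
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[int, str, str]:
    """Global alignment of a (across the top) against b (down the left side)."""
    n, m = len(a), len(b)
    # table[j][i] = best score for the first i letters of a vs the first j letters of b
    table: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        table[0][i] = i * GAP  # first row: a prefix aligned to gaps only
    for j in range(1, m + 1):
        table[j][0] = j * GAP  # first column: b prefix aligned to gaps only
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            table[j][i] = max(
                table[j - 1][i - 1] + pair_score(a[i - 1], b[j - 1]),  # letter over letter
                table[j][i - 1] + GAP,  # a[i-1] over a gap
                table[j - 1][i] + GAP,  # gap over b[j-1]
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top: List[str] = []
    left: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[j][i] == table[j - 1][i - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            left.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[j][i] == table[j][i - 1] + GAP:
            top.append(a[i - 1])
            left.append("-")
            i -= 1
        else:
            top.append("-")
            left.append(b[j - 1])
            j -= 1
    return table[m][n], "".join(reversed(top)), "".join(reversed(left))


score, top, left = needleman_wunsch("ACGCTG", "CATGT")
print(score)  # -1 under the assumed scoring
print(top)
print(left)
```

When several alignments tie, the order of the checks in the traceback (diagonal first, then a gap in B, then a gap in A) decides which optimal alignment is returned.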
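Worked example (slides 6-19; the scoring-table figures are not recoverable): A = ACGCTG across the top, B = CATGT down the left. Each pair below is the alignment behind one cell, where cell (i, j) aligns the first i letters of A with the first j letters of B.

Cell (1, 0):
A
-

Cell (6, 0), end of the first row:
ACGCTG
------

Cell (0, 5), end of the first column:
-----
CATGT

Cell (1, 1):
A
C

Cell (2, 1):
AC
-C

Cell (3, 1):
ACG
-C-

Cell (4, 1), two alternatives:
ACGC
---C

ACGC
-C--

Cell (3, 2):
ACG
-CA

Cell (6, 5), bottom-right: full global alignments (slides 17-19):
ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-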

Global Alignment versus Local Alignment

Global Alignment

ATTGCAGTG-TCGAGCGTCAGGCT

ATTGCGTCGATCGCAC-GCACGCT

Local Alignment

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT

Global vs. Local alignment

Global alignment:

DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN

Local alignment:

DOROTHY
DOROTHY

HODGKIN
HODGKIN

Local Alignment
  • Best score for aligning parts of the sequences
    • Never below the global alignment score, and often higher
  • Similar algorithm: Smith-Waterman
    • Table cells never score below zero
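A minimal Python sketch of the Smith-Waterman variant described above, again under an assumed scoring scheme (match +1, mismatch -1, gap -1) that the slides do not give; the function and constant names are hypothetical. The two differences from the global version are the extra 0 in each cell's maximum and the traceback starting from the best cell anywhere in the table rather than the bottom-right corner.

```python
from typing import List, Tuple

# Hypothetical scoring scheme: the slides do not specify one.
MATCH, MISMATCH, GAP = 1, -1, -1


def smith_waterman(a: str, b: str) -> Tuple[int, str, str]:
    """Best local alignment of a (across the top) against b (down the left side)."""
    n, m = len(a), len(b)
    # First row and column stay 0: a local alignment may start anywhere.
    table: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            diag = table[j - 1][i - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # The extra 0 is what keeps cells from ever scoring below zero.
            table[j][i] = max(0, diag, table[j][i - 1] + GAP, table[j - 1][i] + GAP)
            if table[j][i] > best:  # keep the first cell with the highest score
                best, best_i, best_j = table[j][i], i, j

    # Trace back from the best cell until a zero cell is reached.
    top: List[str] = []
    left: List[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and table[j][i] > 0:
        diag = table[j - 1][i - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
        if table[j][i] == diag:
            top.append(a[i - 1])
            left.append(b[j - 1])
            i, j = i - 1, j - 1
        elif table[j][i] == table[j][i - 1] + GAP:
            top.append(a[i - 1])
            left.append("-")
            i -= 1
        else:
            top.append("-")
            left.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(left))


score, top, left = smith_waterman("DOROTHYHODGKIN", "DOROTHYCROWFOOTHODGKIN")
print(score, top, left)  # 7 DOROTHY DOROTHY
```

On the DOROTHY / HODGKIN example, the HODGKIN segment scores the same 7 under these values; this sketch reports DOROTHY because it keeps the first best-scoring cell it meets.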
Example (slide 23; the figure itself is not recoverable):

TAA
TAA

TACTA
TAATA

Problems with DP for sequence alignments

- The computational complexity is very high

- Given a score, how can we evaluate the significance of the alignment?

Complexity
  • Complexity is determined by the size of the table
    • Aligning a sequence of length m against one of length n requires calculating m × n cells, i.e. O(mn) time
  • Time of calculation

Let's say we calculate 10^8 cells per second on a single-processor PC

    • Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells → 0.64 seconds
    • Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells → 1,000 s ≈ 17 minutes
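A small Python helper that reproduces the arithmetic above. The name `dp_cost` is hypothetical, and the 10^8 cells-per-second rate is the slide's own assumption; the helper simply counts one DP cell per pair of positions, so the cost is m × n.

```python
from typing import Tuple

CELLS_PER_SECOND = 1e8  # throughput assumed on the slide (single-processor PC)


def dp_cost(m: int, n: int, cells_per_second: float = CELLS_PER_SECOND) -> Tuple[int, float]:
    """Return (cells, seconds) needed to fill an m-by-n dynamic-programming table."""
    cells = m * n  # one cell per pair of positions
    return cells, cells / cells_per_second


for label, m, n in [
    ("mRNA vs mRNA", 8_000, 8_000),
    ("mRNA vs chromosome", 8_000, 10**7),
]:
    cells, seconds = dp_cost(m, n)
    print(f"{label}: {cells:.1e} cells, {seconds:,.2f} s ({seconds / 60:.1f} min)")
```

With an 8,000-bp mRNA, the chromosome case comes to 8 × 10^10 cells and 800 s, the same order of magnitude as the slide's rounded ~10^11 cells and ~1,000 s.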
Complexity for large databases
  • Let's say a database contains 3 × 10^10 base pairs
    • Searching an mRNA against the database will require ~2.5 × 10^14 cells → 2.5 × 10^6 s ≈ 1 month!
  • We need a more efficient algorithm that cuts down the amount of full alignment computation
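The database estimate works out as follows, assuming the search uses an 8,000-bp mRNA as in the pairwise case (this slide does not state the mRNA length) and the same rate of 10^8 cells per second:

$$
8\times10^{3}\ \text{bp}\times 3\times10^{10}\ \text{bp} = 2.4\times10^{14}\approx 2.5\times10^{14}\ \text{cells},
\qquad
\frac{2.4\times10^{14}\ \text{cells}}{10^{8}\ \text{cells/s}} = 2.4\times10^{6}\ \text{s}\approx 28\ \text{days}.
$$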
BLAST
  • Basic Local Alignment Search Tool
  • A set of tools developed at NCBI (BLASTN, BLASTP, ...)
  • BLAST benefits
    • Search speed
    • Ease of use
    • Statistical rigor
BLAST
  • A good alignment usually contains short subsequences of exact identity, so BLAST works in stages:
    • First, identify very short (almost) exact matches.
    • Next, extend the best short hits from the first step to longer regions of similarity.
    • Finally, the best hits are optimized using the Smith-Waterman algorithm.
BLAST Algorithm

(1) Split the query sequence into words of length W (default W = 11).

(2) Compare the word list to the database and identify exact matches.

(3) For each word match, extend the alignment in both directions.

(4) Score the alignments using dynamic programming.

(5) Evaluate the statistical significance.
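A simplified Python sketch of steps (1)-(3) for nucleotide strings: exact word hits, followed by an ungapped extension in both directions that stops once the running score drops too far below its best (an X-drop rule). This is an illustration under stated assumptions, not NCBI's implementation: the match/mismatch scores and `X_DROP` value are hypothetical, the function names are made up for this sketch, and real BLAST adds refinements not shown here (such as neighborhood words for protein searches and gapped extension).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed parameters for this sketch (not BLAST's actual scoring):
W = 11         # word length, the slide's default
MATCH = 1      # hypothetical score for identical letters
MISMATCH = -1  # hypothetical score for different letters
X_DROP = 5     # hypothetical: stop once the running score falls this far below its best


def query_words(query: str, w: int = W) -> Dict[str, List[int]]:
    """Step (1): every overlapping word of length w in the query, with its start positions."""
    words: Dict[str, List[int]] = defaultdict(list)
    for q in range(len(query) - w + 1):
        words[query[q:q + w]].append(q)
    return words


def word_hits(query: str, subject: str, w: int = W) -> List[Tuple[int, int]]:
    """Step (2): (query_pos, subject_pos) for every exact word match in the subject."""
    words = query_words(query, w)
    hits: List[Tuple[int, int]] = []
    for s in range(len(subject) - w + 1):
        for q in words.get(subject[s:s + w], []):
            hits.append((q, s))
    return hits


def extend_hit(query: str, subject: str, q: int, s: int,
               w: int = W, x_drop: int = X_DROP) -> Tuple[int, int, int]:
    """Step (3): ungapped extension of one word hit in both directions.

    Returns (score, query_start, query_end) of the extended segment.
    """
    seed_score = w * MATCH  # the word itself matches exactly

    # Extend to the right of the word, remembering the best-scoring stopping point.
    best_right, right_len, running, k = 0, 0, 0, 0
    while q + w + k < len(query) and s + w + k < len(subject):
        running += MATCH if query[q + w + k] == subject[s + w + k] else MISMATCH
        k += 1
        if running > best_right:
            best_right, right_len = running, k
        elif best_right - running > x_drop:
            break

    # Extend to the left of the word in the same way.
    best_left, left_len, running, k = 0, 0, 0, 1
    while q - k >= 0 and s - k >= 0:
        running += MATCH if query[q - k] == subject[s - k] else MISMATCH
        if running > best_left:
            best_left, left_len = running, k
        elif best_left - running > x_drop:
            break
        k += 1

    return seed_score + best_left + best_right, q - left_len, q + w + right_len


query, subject = "TTGACCATG", "CCTTGACCATGAA"
for q, s in word_hits(query, subject, w=5):  # short words so the toy example has hits
    print((q, s), extend_hit(query, subject, q, s, w=5))
```

In this toy run the query occurs intact inside the subject, so it produces five overlapping word hits, and each one extends to the same segment covering the whole query with score 9. Real BLAST avoids this duplicated work, which the sketch does not attempt.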

Database Searches
  • Using pairwise comparison, each database search normally yields two groups of scores, from genuinely related and from unrelated sequences, with some overlap between them.
  • A good search method should separate the two score groups as completely as possible.

(Figure: score distributions labeled "Random" and "Related".)
E-value
  • The number of hits with a similarity score at least as high that one can "expect" to see just by chance when searching the given sequence against a database of a particular size.
  • The higher the E-value, the less significant the similarity
    • “sequences with E-value of less than 0.01 are almost always found to be homologous”
  • The lower bound is 0: the most significant hits have E-values closest to 0
Expectation Values
  • Increase linearly with the length of the query sequence
  • Decrease exponentially with the score of the alignment
  • Increase linearly with the length of the database
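These three trends match the Karlin-Altschul expression on which BLAST's statistics are based, E = K · m · n · e^(-λS), where m is the query length, n the database length, S the alignment score, and K and λ are constants that depend on the scoring system. A minimal Python sketch follows. The values of `K` and `LAMBDA` are placeholders, not real BLAST parameters, and the name `e_value` is hypothetical.

```python
import math

# Placeholder statistical parameters: real values depend on the scoring matrix and gap costs.
K = 0.1
LAMBDA = 0.3


def e_value(score: float, query_len: int, db_len: int,
            k: float = K, lam: float = LAMBDA) -> float:
    """Expected number of chance hits scoring at least `score` (Karlin-Altschul form)."""
    return k * query_len * db_len * math.exp(-lam * score)


base = e_value(50, 300, 10**9)
print(e_value(50, 600, 10**9) / base)      # about 2: doubling the query length doubles E
print(e_value(50, 300, 2 * 10**9) / base)  # about 2: doubling the database length doubles E
print(e_value(60, 300, 10**9) / base)      # about exp(-3): 10 more score units shrink E exponentially
```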
