Pairwise Sequence Alignment




Pairwise sequence alignment part 2

Pairwise Sequence Alignment

Part 2


Outline

  • Global alignments (continued)

  • Local versus Global

  • BLAST algorithms

  • Evaluating significance of alignments


Global Alignment (continued)


Needleman-Wunsch Alignment

  • Global alignment between sequences

    • Compare entire sequence against another

  • Create scoring table

    • Sequence A across top, B down left

  • The cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B

    • The global alignment score is in the bottom-right cell
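The table-filling procedure described above can be sketched in Python. This is a minimal illustration, not the scoring scheme used in the slides: the match, mismatch, and gap values (+1, -1, -1) and the function name `needleman_wunsch_table` are assumptions chosen for the example.

```python
def needleman_wunsch_table(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the global alignment score table for sequences a and b.

    a runs across the top (columns), b runs down the left (rows).
    table[j][i] is the best score for aligning the first i elements
    of a with the first j elements of b.
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(b) + 1, len(a) + 1
    table = [[0] * cols for _ in range(rows)]

    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, cols):
        table[0][i] = i * gap
    for j in range(1, rows):
        table[j][0] = j * gap

    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(
                table[j - 1][i - 1] + pair,  # align a[i-1] with b[j-1]
                table[j - 1][i] + gap,       # gap in a
                table[j][i - 1] + gap,       # gap in b
            )
    return table


table = needleman_wunsch_table("ACGCTG", "CATGT")
print(table[-1][-1])  # global alignment score: the bottom-right cell
```

Recovering the alignment itself, and not only its score, would additionally require a traceback from the bottom-right cell, which this sketch omits.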


A
-

ACGCTG
------

-----
CATGT

A
C

AC
-C

ACG
-C-

ACGC
---C

ACGC
-C--

ACG
-CA

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-


Global Alignment versus Local Alignment

Global Alignment

ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT

Local Alignment

CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT


Global vs. Local alignment

Global alignment:

DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN

Local alignment:

DOROTHY
DOROTHY

HODGKIN
HODGKIN


Local Alignment

  • Best score for aligning part of sequences

    • Often beats global alignment score

  • Similar algorithm: Smith-Waterman

    • Table cells never score below zero
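A minimal sketch of the local variant follows, assuming the same illustrative scoring values as a simple global scheme (+1 match, -1 mismatch, -1 gap); the function name `smith_waterman_score` is hypothetical. The only changes from the global table are that every cell is floored at zero and the answer is the highest cell anywhere in the table, not the bottom-right one.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score between a and b.

    Scoring values are illustrative assumptions. Only two rows of
    the table are kept, since only the score is reported here.
    """
    best = 0
    prev = [0] * (len(a) + 1)  # first row: all zeros
    for j in range(1, len(b) + 1):
        curr = [0] * (len(a) + 1)  # first column: zero
        for i in range(1, len(a) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[i] = max(
                0,                   # never score below zero
                prev[i - 1] + pair,  # align a[i-1] with b[j-1]
                prev[i] + gap,       # gap in a
                curr[i - 1] + gap,   # gap in b
            )
            best = max(best, curr[i])
        prev = curr
    return best


print(smith_waterman_score("TACTA", "TAATA"))
```

Because a cell can restart at zero, a poorly matching prefix does not drag down a well-matching region later in the sequences.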


TAA
TAA

TACTA
TAATA


Problems with DP for sequence alignments

  • The complexity is very high

  • Given a score, how do we evaluate the significance of the alignment?


Complexity

  • Complexity is determined by size of table

    • Aligning a sequence of length m against one of length n requires calculating m × n cells

  • Time of calculation

    Let's say we calculate 10^8 cells per second on a single-processor PC

    • Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells: 0.64 seconds

    • Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells: 1,000 seconds ≈ 17 minutes


Complexity for large databases

  • Let’s say a database contains 3 × 10^10 base pairs

    • Searching an mRNA against the database will require ~2.5 × 10^14 cells: 2.5 × 10^6 seconds ≈ 1 month!

  • We need a more efficient algorithm to cut down on alignment time
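The running-time estimates above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch: the rate of 10^8 cells per second comes from the slides, while the 8,000 bp mRNA length used for the chromosome and database cases is an assumption carried over from the first example, and the function name `dp_seconds` is hypothetical.

```python
CELLS_PER_SECOND = 1e8  # rate assumed in the slides for a single-processor PC


def dp_seconds(m, n, cells_per_second=CELLS_PER_SECOND):
    """Estimated time to fill an m x n dynamic programming table."""
    return m * n / cells_per_second


MRNA = 8_000            # bp; assumed mRNA length
CHROMOSOME = 10**7      # bp
DATABASE = 3 * 10**10   # bp

print("mRNA vs mRNA:      ", dp_seconds(MRNA, MRNA), "seconds")
print("mRNA vs chromosome:", dp_seconds(MRNA, CHROMOSOME) / 60, "minutes")
print("mRNA vs database:  ", dp_seconds(MRNA, DATABASE) / 86_400, "days")
```

With exactly 8,000 bp the chromosome case is 8 × 10^10 cells, which the slides round up to about 10^11; the order of magnitude is what matters for the argument.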


BLAST

  • Basic Local Alignment Search Tool

  • A set of tools developed at NCBI (BLASTN, BLASTP, ...)

  • BLAST benefits

    • Search speed

    • Ease of use

    • Statistical rigor


BLAST

  • A good alignment contains subsequences of absolute identity:

    • First, identify very short (almost) exact matches.

    • Next, the best short hits from the first step are extended to longer regions of similarity.

    • Finally, the best hits are optimized using the Smith-Waterman algorithm.


BLAST Algorithm

(1) Break the query sequence into words of length W (default W = 11)

(2) Compare the word list to the database and identify exact matches

(3) For each word match, extend the alignment in both directions

(4) Score the alignments using dynamic programming

(5) Evaluate the statistical significance
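The word-matching and extension steps can be illustrated with a toy Python sketch. It is not the actual BLAST implementation: the function names, the +1/-1 scoring, and the drop-off rule (stop once the running score falls `drop` below the best seen) are all assumptions made for illustration, and only ungapped extension is shown.

```python
from collections import defaultdict


def find_seeds(query, subject, w=11):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = defaultdict(list)
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    seeds = []
    for s in range(len(subject) - w + 1):
        for q in index.get(subject[s:s + w], []):
            seeds.append((q, s))
    return seeds


def _extend(query, subject, qi, si, step, match, mismatch, drop):
    """Walk in one direction; return (best score gain, positions kept)."""
    gain = best_gain = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        gain += match if query[qi] == subject[si] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= drop:
            break  # score has fallen too far below the best seen
        qi += step
        si += step
    return best_gain, best_len


def extend_seed(query, subject, q, s, w, match=1, mismatch=-1, drop=3):
    """Extend a seed without gaps in both directions.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    right_gain, right_len = _extend(query, subject, q + w, s + w, 1, match, mismatch, drop)
    left_gain, left_len = _extend(query, subject, q - 1, s - 1, -1, match, mismatch, drop)
    score = w * match + left_gain + right_gain
    return score, q - left_len, q + w + right_len


query, subject = "GGACGTACGTTT", "CCACGTACGTAA"
for q, s in find_seeds(query, subject, w=4):
    print((q, s), extend_seed(query, subject, q, s, w=4))
```

Indexing the query words once and scanning the subject avoids filling a full table for regions that share no exact word, which is where the speed-up over plain dynamic programming comes from.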


Database Searches

[Figure labels: Random, Related]

  • Using the pairwise comparison, each database search normally yields 2 groups of scores: genuinely related and unrelated sequences, with some overlap between them.

  • A good search method should separate the two score groups as completely as possible.


E-value

  • The number of hits with at least the same similarity score that one can "expect" to see just by chance when searching the given string in a database of a particular size.

  • A higher E-value means lower similarity

    • “sequences with E-value of less than 0.01 are almost always found to be homologous”

  • The lower bound is normally 0 (we want to find the best)


Expectation Values

Increases linearly with length of query sequence

Decreases exponentially with score of alignment

Increases linearly with length of database
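These three relationships are consistent with the Karlin-Altschul form E = K · m · n · e^(-λS), where m is the query length, n the database length, and S the alignment score. The sketch below assumes that form; the values of K and λ depend on the scoring system, and the defaults used here are placeholders for illustration only, as is the function name `expected_hits`.

```python
import math


def expected_hits(score, query_len, db_len, k=0.1, lam=0.3):
    """E-value under the assumed form E = K * m * n * exp(-lambda * S).

    k and lam are placeholder values, not parameters of any
    particular scoring matrix.
    """
    return k * query_len * db_len * math.exp(-lam * score)


base = expected_hits(score=50, query_len=100, db_len=10**6)
print(expected_hits(50, 200, 10**6) / base)      # query twice as long
print(expected_hits(50, 100, 2 * 10**6) / base)  # database twice as large
print(expected_hits(60, 100, 10**6) / base)      # score 10 higher
```

Doubling either length doubles the expected number of chance hits, while each additional score unit multiplies it by the constant factor e^(-λ).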

