Pairwise Sequence Alignment



WHAT?

  • Given any two sequences (DNA or protein)

    Seq 1:

    CATATTGCAGTGGTCCCGCGTCAGGCT

    Seq 2:

    TAAATTGCGTGGTCGCACTGCACGCT

    we want to know to what extent they are similar, for example by aligning them:

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT


WHY?

  • Discover new function

  • Study evolution

  • Find crucial features within a sequence

  • Identify causes of disease



Discover function

  • Similar sequences are likely to have the same or a related function



Study evolution

If two sequences from different organisms are similar, they may share a common ancestor



Find crucial features

  • Regions that are strongly conserved across different sequences can indicate functional importance

Conservation of IGFALS (insulin-like growth factor binding protein, acid labile subunit) between human and mouse.

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT



Identify cause of disease

  • Comparing sequences between individuals can reveal changes that are associated with disease



Sickle Cell Anemia

  • Caused by a single base substitution (A replaced by T, so the codon GAG becomes GTG) in the β-globin gene, which puts valine instead of glutamic acid at position 6 of the hemoglobin β chain

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/



What makes sequences different?


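Sequence Modifications

  • Three types of changes

    • Substitution (point mutation)

    • Insertion

    • Deletion

Figure: starting from TCAGT, a substitution gives TCCGT, an insertion gives TCGAGT and a deletion gives TCGT. Insertions and deletions are together called indels; they can arise from replication slippage.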



How do we quantitate similarity?



Scoring Similarity

  • Assume independent mutation model

    • Each site considered separately

  • Score at each site

    • Positive if the same

    • Negative if different

  • Sum to make final score

    • Can be positive or negative

    • Significance depends on sequence length



Substitutions Only (not including indels)

  • Sequences compared base-by-base

  • Count the number of matches and mismatches

  • Matches score +2, mismatches score -1

TTCGTCGTAGTCGGCTCGACCTG
GTACGTCTAGCGAGCGTGATCCT

9 matches: +18

14 mismatches: -14

Total score: +4 (a weak match)
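The substitution-only count on this slide can be sketched in Python as follows. This is a minimal illustration, assuming equal-length sequences and the slide's +2/-1 scheme, applied to the slide's two 23-base sequences; `ungapped_score` is a hypothetical helper name, not part of any library.

```python
def ungapped_score(a: str, b: str, match: int = 2, mismatch: int = -1) -> tuple[int, int, int]:
    """Compare two equal-length sequences base by base.

    Returns (matches, mismatches, total score) under a simple
    match/mismatch scheme with no gaps allowed.
    """
    if len(a) != len(b):
        raise ValueError("an ungapped comparison needs sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    mismatches = len(a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


print(ungapped_score("TTCGTCGTAGTCGGCTCGACCTG", "GTACGTCTAGCGAGCGTGATCCT"))
# prints (9, 14, 4): 9 x (+2) + 14 x (-1) = +4
```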


Including Indels

  • Create an ‘alignment’

    • Count matches within alignment

    • Required if sequences are of different lengths

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-

17 matches: +34

2 mismatches: -2

8 indels: -8

Total score: +24 (a strong match)
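A gapped version of the same count, as a sketch: it scores an existing alignment column by column, assuming `-` marks a gap and each indel costs -1 (the value implied by "8 indels, -8"). `alignment_score` is a hypothetical name chosen for illustration.

```python
def alignment_score(top: str, bottom: str, match: int = 2, mismatch: int = -1,
                    indel: int = -1) -> tuple[int, int, int, int]:
    """Score a gapped alignment column by column ('-' marks a gap).

    Returns (matches, mismatches, indels, total score).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = indels = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            indels += 1          # a letter aligned against a gap
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + indels * indel
    return matches, mismatches, indels, total


print(alignment_score("TT-CGTCGTAGTCG-GC-TCGACC-TG", "GTACGTC-TAG-CGAGCGT-GATCCT-"))
# prints (17, 2, 8, 24)
```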


Choosing an Alignment

  • Many different alignments are possible

    • Should consider all possible alignments

    • Take the best score found

    • There may be more than one best alignment

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
Score: +24

-TTCGT-CGTAGTC-GGCTCG-ACCTG
GTAC-GTCTA-GCGAGCGT-GATCC-T
Score: 0
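"Consider all possible alignments" can be taken literally for very short sequences. The sketch below enumerates every global alignment recursively and keeps all of the best-scoring ones, assuming match +2 and mismatch or indel -1. The helper names (`all_alignments`, `column_score`, `best_alignments`) are hypothetical, and the approach is exponential, so it is only usable on toy inputs.

```python
def all_alignments(a: str, b: str) -> list[tuple[str, str]]:
    """Every global alignment of a and b as (top, bottom) rows; exponential in length."""
    if not a and not b:
        return [("", "")]
    result = []
    if a and b:   # last column: last letter of a with last letter of b
        result += [(t + a[-1], u + b[-1]) for t, u in all_alignments(a[:-1], b[:-1])]
    if a:         # last column: last letter of a against a gap
        result += [(t + a[-1], u + "-") for t, u in all_alignments(a[:-1], b)]
    if b:         # last column: a gap against the last letter of b
        result += [(t + "-", u + b[-1]) for t, u in all_alignments(a, b[:-1])]
    return result


def column_score(top: str, bottom: str, match: int = 2, mismatch: int = -1,
                 indel: int = -1) -> int:
    """Sum of per-column scores; '-' marks a gap."""
    return sum(indel if "-" in (x, y) else match if x == y else mismatch
               for x, y in zip(top, bottom))


def best_alignments(a: str, b: str) -> tuple[int, list[tuple[str, str]]]:
    """Try every alignment and keep all of those with the top score."""
    candidates = all_alignments(a, b)
    top_score = max(column_score(t, u) for t, u in candidates)
    return top_score, [(t, u) for t, u in candidates if column_score(t, u) == top_score]


score, winners = best_alignments("ACG", "AG")
print(score, winners)   # prints 3 [('ACG', 'A-G')]
```

Even two sequences of six and five letters already have 3653 candidate alignments to score.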


Why is it hard?

Alignment without gaps requires an algorithm that performs a number of comparisons roughly proportional to the square of the average sequence length.

If we include gaps, the number of possible alignments, and so the number of comparisons needed to check them all, becomes astronomical.
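To put a number on "astronomical": if each alignment column pairs two letters or one letter with a gap, the number of distinct global alignments of sequences of lengths m and n satisfies the recurrence in the docstring below (these counts are the Delannoy numbers). A minimal sketch; `count_alignments` is a hypothetical name.

```python
def count_alignments(m: int, n: int) -> int:
    """Number of distinct global alignments of sequences of lengths m and n.

    D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1), with D(i, 0) = D(0, j) = 1.
    """
    row = [1] * (n + 1)                 # D(0, j) = 1 for every j
    for _ in range(m):
        new = [1]                       # D(i, 0) = 1
        for j in range(1, n + 1):
            # gap in b + gap in a + letters paired
            new.append(row[j] + new[j - 1] + row[j - 1])
        row = new
    return row[n]


for length in (1, 2, 3, 10):
    print(length, count_alignments(length, length))
# prints 1 3, 2 13, 3 63, 10 8097453
```

For two 100-letter sequences the count has more than 70 digits, whereas a dynamic-programming table for the same pair has only about 100 × 100 cells.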



Algorithms for pairwise alignments

  • Dot Plots – Gibbs and McIntyre

  • Dynamic Programming :

    Local alignment: Smith-Waterman

    Global alignment: Needleman-Wunsch



Dot Plots

  • Early method

  • Sequences at top and left

  • Dots indicate matched bases

  • Diagonal series show matched regions

TAGTCG

TAG-CG
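A text-mode dot plot is easy to sketch. Pairing TAGTCG (along the top) with TAGCG (down the left) is an assumption read from the slide's TAGTCG / TAG-CG example; `dot_plot` is a hypothetical helper.

```python
def dot_plot(top: str, left: str) -> list[str]:
    """One text row per letter of `left`; '*' where it equals the letter of `top` above."""
    return ["".join("*" if t == b else "." for t in top) for b in left]


top, left = "TAGTCG", "TAGCG"
print("  " + top)
for letter, row in zip(left, dot_plot(top, left)):
    print(letter + " " + row)
```

The dots on the main diagonal spell TAG; the diagonal run for CG continues one column further right, and that one-column shift is the gap in TAG-CG.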



Dynamic Programming

  • A method for reducing a complex problem to a set of smaller sub-problems of the same form

  • The best solution to one sub-problem is independent of the best solution to the other sub-problems




What does it mean?

If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z
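The same principle, written top-down: the best score for two prefixes depends only on the best scores of three smaller prefix pairs, so each sub-problem is solved once and cached with Python's standard `functools.lru_cache`. A minimal sketch, assuming match +2 and mismatch or gap -1 (the scheme used in the example that follows); `best_score` is a hypothetical name.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 2, -1, -1


def best_score(a: str, b: str) -> int:
    """Best global alignment score of a and b, computed by memoized recursion."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Score of the best alignment of the prefixes a[:i] and b[:j].
        if i == 0:
            return j * GAP            # b[:j] against gaps only
        if j == 0:
            return i * GAP            # a[:i] against gaps only
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        return max(best(i - 1, j - 1) + pair,   # last letters aligned together
                   best(i - 1, j) + GAP,        # a[i-1] against a gap
                   best(i, j - 1) + GAP)        # b[j-1] against a gap

    return best(len(a), len(b))


print(best_score("ACGCTG", "CATGT"))   # prints 2
```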


Example

Sequences: A = ACGCTG, B = CATGT

Figure: scoring grid with A = ACGCTG across the top (columns 1 to 6) and B = CATGT down the left side (rows 1 to 5); Z labels the bottom-right cell.


Example

Sequences: A = ACGCTG, B = CATGT

Match: +2, Other (mismatch or gap): -1

Score of best alignment between AC and CATG: -1

…between ACG and CATG: 2

…between AC and CATGT: -2

Calculate score between ACG and CATGT: ?


Needleman-Wunsch Example

Each cell can be reached by one of three moves:

  • Align the next letter in the sequences (diagonal move)

  • Insertion in the first sequence

  • Insertion in the second sequence


Needleman-Wunsch Example

Sequences: A = ACGCTG, B = CATGT

Filling the cell for ACG against CATGT:

  • Diagonal: -1 from before, plus -1 for the mismatch of G against T = -2

  • From above: 2 from before, plus -1 for a gap (–) against T = 1

  • From the left: -2 from before, plus -1 for G against a gap (–) = -3

  • The cell gets the highest score of -2, 1 and -3, which is 1
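The single-cell update can be written as a small function. A sketch assuming the example's scores (match +2, mismatch or gap -1) and the layout with A across the top and B down the left; `fill_cell` and its argument names are hypothetical.

```python
MATCH, MISMATCH, GAP = 2, -1, -1


def fill_cell(diag: int, up: int, left: int, a_letter: str,
              b_letter: str) -> tuple[int, tuple[int, int, int]]:
    """Return (cell score, (diagonal, from-above, from-left candidates))."""
    from_diag = diag + (MATCH if a_letter == b_letter else MISMATCH)  # align the two letters
    from_up = up + GAP        # b_letter against a gap
    from_left = left + GAP    # a_letter against a gap
    candidates = (from_diag, from_up, from_left)
    return max(candidates), candidates


# ACG vs CATGT: neighbours -1 (diagonal), 2 (above), -2 (left); letters G and T
print(fill_cell(-1, 2, -2, "G", "T"))   # prints (1, (-2, 1, -3))
```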




Needleman-Wunsch Example (continued)

Best alignments for individual cells of the table (a prefix of A against a prefix of B):

A
-

ACGCTG
------

-----
CATGT

A
C

AC
-C

ACG
-C-

ACGC
---C

ACGC
-C--

ACG
-CA

Optimal global alignments of ACGCTG and CATGT (score +2 each):

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-
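These alignments are read back from the filled table by retracing which neighbour produced each cell's score. A sketch, assuming the example's scores and following every tied move, so that all co-optimal alignments are returned; `nw_table` and `optimal_alignments` are hypothetical names.

```python
def nw_table(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> list[list[int]]:
    """F[j][i]: best score aligning a[:i] with b[:j] (A across the top, B down the left)."""
    F = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        F[0][i] = i * gap
    for j in range(1, len(b) + 1):
        F[j][0] = j * gap
        for i in range(1, len(a) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            F[j][i] = max(F[j - 1][i - 1] + pair, F[j - 1][i] + gap, F[j][i - 1] + gap)
    return F


def optimal_alignments(a: str, b: str, match: int = 2, mismatch: int = -1,
                       gap: int = -1) -> tuple[int, list[tuple[str, str]]]:
    """Return (best score, every co-optimal alignment as (top, bottom) rows)."""
    F = nw_table(a, b, match, mismatch, gap)
    found = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        if i == 0 and j == 0:
            found.append((top, bottom))   # built right to left, so already in order
            return
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if F[j][i] == F[j - 1][i - 1] + pair:      # came diagonally
                walk(i - 1, j - 1, a[i - 1] + top, b[j - 1] + bottom)
        if j > 0 and F[j][i] == F[j - 1][i] + gap:     # came from above
            walk(i, j - 1, "-" + top, b[j - 1] + bottom)
        if i > 0 and F[j][i] == F[j][i - 1] + gap:     # came from the left
            walk(i - 1, j, a[i - 1] + top, "-" + bottom)

    walk(len(a), len(b), "", "")
    return F[len(b)][len(a)], found


score, alignments = optimal_alignments("ACGCTG", "CATGT")
print("best score:", score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For ACGCTG against CATGT this reports a best score of 2, and the three alignments shown above are among the results.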



Summary

Needleman-Wunsch Alignment

  • Global alignment between sequences

    • Compare one entire sequence against another

  • Create scoring table

    • Sequence A across the top, B down the left

  • Cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B

    • Global alignment score is in the bottom-right cell
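The table described in this summary can be sketched directly, with A across the top and B down the left so that `F[j][i]` holds the score for the first i letters of A and the first j letters of B. Scores are the example's (match +2, mismatch or gap -1); `needleman_wunsch_table` is a hypothetical name.

```python
def needleman_wunsch_table(a: str, b: str, match: int = 2, mismatch: int = -1,
                           gap: int = -1) -> list[list[int]]:
    """F[j][i] = best score for the first i letters of a against the first j letters of b."""
    F = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        F[0][i] = i * gap                 # a prefix of a against gaps only
    for j in range(1, len(b) + 1):
        F[j][0] = j * gap                 # a prefix of b against gaps only
        for i in range(1, len(a) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            F[j][i] = max(F[j - 1][i - 1] + pair,  # align the next letters
                          F[j - 1][i] + gap,       # b[j-1] against a gap
                          F[j][i - 1] + gap)       # a[i-1] against a gap
    return F


a, b = "ACGCTG", "CATGT"
F = needleman_wunsch_table(a, b)
print("    " + "".join(f"{c:>3}" for c in "-" + a))     # A across the top
for label, row in zip("-" + b, F):                        # B down the left
    print(f"{label:>3} " + "".join(f"{v:>3}" for v in row))
print("global score:", F[-1][-1])                        # bottom-right cell
```

The bottom-right cell is 2, and the cells for AC/CATG, ACG/CATG and AC/CATGT hold -1, 2 and -2, matching the worked example.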


Local Alignment: Smith-Waterman

  • Best score for aligning parts of the sequences

    • Never lower than the global alignment score, and often higher

Global Alignment

ATTGCAGTG-TCGAGCGTCAGGCT

ATTGCGTCGATCGCAC-GCACGCT

Local Alignment

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT
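Compared with global alignment, the local version floors every cell at 0 (including the first row and column), takes the best score from anywhere in the table, and stops the traceback at the first 0. A minimal sketch with a linear gap cost, assuming match +2 and mismatch or gap -1; `smith_waterman` is a hypothetical name, ties are broken by preferring the diagonal move, and the test pair TTTACGT / ACGTAAA is invented for illustration.

```python
MATCH, MISMATCH, GAP = 2, -1, -1


def smith_waterman(a: str, b: str) -> tuple[int, str, str]:
    """Best local alignment of a and b: (score, aligned part of a, aligned part of b)."""
    H = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]  # first row/column stay 0
    best, best_j, best_i = 0, 0, 0
    for j in range(1, len(b) + 1):
        for i in range(1, len(a) + 1):
            pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            H[j][i] = max(0,                        # start a fresh local alignment
                          H[j - 1][i - 1] + pair,   # align the two letters
                          H[j - 1][i] + GAP,        # b[j-1] against a gap
                          H[j][i - 1] + GAP)        # a[i-1] against a gap
            if H[j][i] > best:
                best, best_j, best_i = H[j][i], j, i

    # Trace back from the best cell until a cell scoring 0 is reached.
    top, bottom = [], []
    j, i = best_j, best_i
    while H[j][i] > 0:
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if H[j][i] == H[j - 1][i - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[j][i] == H[j - 1][i] + GAP:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        else:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGT", "ACGTAAA"))   # prints (8, 'ACGT', 'ACGT')
```

The local score of 8 comes from ACGT alone; a global alignment of the same pair would also have to pay for the unmatched TTT and AAA.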


Global vs. Local alignment

Global alignment:

DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN

Local alignment (matching regions):

DOROTHY
DOROTHY

HODGKIN
HODGKIN


Global vs. Local alignment

Alignment of two genomic sequences

>Human DNA

CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

>Mouse DNA

CATGCGTCTGACgctttttgctagcgatatcggactATCGATATA



Global vs. Local alignment

Alignment of two genomic sequences

Global Alignment

Human:CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

Mouse:CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA

      ****** *****         *     ***      *  ****** ***

Local Alignment

H:CATGCGACTGAC
Mouse:CATGCGTCTGAC

H:ATCGATCATA
Mouse:ATCGAT-ATA
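The row of asterisks marks identical columns. A sketch that rebuilds such a match line from two aligned rows, assuming upper/lower case only highlights a region and does not change the base, and that gap columns never count as identical; `match_line` is a hypothetical name.

```python
def match_line(top: str, bottom: str) -> str:
    """'*' under each column where both rows carry the same base, else a space."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "*" if x != "-" and x.upper() == y.upper() else " "
        for x, y in zip(top, bottom)
    )


human = "CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA"
mouse = "CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA"
line = match_line(human, mouse)
print("Human:" + human)
print("Mouse:" + mouse)
print("      " + line)
print(line.count("*"), "identical columns out of", len(line))   # 25 out of 49
```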



Global vs. Local alignment

Alignment of genomic DNA and mRNA

>Human DNA

CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

>Human mRNA

CATGCGACTGACATCGATCATA



Global vs. Local alignment

Alignment of genomic DNA and mRNA

Global Alignment

DNA: CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

mRNA:CATGCGACTGAC---------------------------ATCGATCATA

     ************                           **********

Local Alignment

DNA: CATGCGACTGAC
mRNA:CATGCGACTGAC

DNA: ATCGATCATA
mRNA:ATCGATCATA

