Pairwise Sequence Alignment







Pairwise Sequence Alignment


WHAT?



  • Given any two sequences (DNA or protein)

    Seq 1:

    CATATTGCAGTGGTCCCGCGTCAGGCT

    Seq 2:

    TAAATTGCGTGGTCGCACTGCACGCT

    We want to know to what extent they are similar.

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT


WHY?


  • Discover new function

  • Study evolution

  • Find crucial features within a sequence

  • Identify cause of diseases


Discover function

  • Sequences that are similar probably have the same function


Study evolution

If two sequences from different organisms are similar, they may have a common ancestor


Find crucial features

  • Regions in the sequences that are strongly conserved between different sequences can indicate their functional importance

Conservation of IGFALS (insulin-like growth factor-binding protein, acid-labile subunit) between human and mouse.

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT


Identify cause of disease

  • Comparison of sequences between individuals can detect changes that are related to diseases


Sickle Cell Anemia

  • Caused by a single substitution of an A by a T, which places valine instead of glutamic acid in hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/


What makes sequences different?


Sequence Modifications

  • Three types of changes

    • Substitution (point mutation)

    • Insertion

    • Deletion

  • Indel (replication slippage)

Original sequence: TCAGT

Substitution: TCCGT

Insertion: TCGAGT

Deletion: TCGT


How do we quantitate similarity?


Scoring Similarity

  • Assume independent mutation model

    • Each site considered separately

  • Score at each site

    • Positive if the same

    • Negative if different

  • Sum to make final score

    • Can be positive or negative

    • Significance depends on sequence length
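The site-by-site scoring described above can be sketched in a few lines of Python. This is only an illustration: the function name `score_ungapped` is hypothetical, and the default values (match +2, mismatch -1) are borrowed from the substitutions-only example later in the presentation.

```python
def score_ungapped(seq1: str, seq2: str, match: int = 2, mismatch: int = -1) -> int:
    """Sum a per-site score over two equal-length sequences (no indels)."""
    if len(seq1) != len(seq2):
        raise ValueError("site-by-site scoring needs sequences of equal length")
    # Each site is scored on its own (independent mutation model), then summed.
    return sum(match if a == b else mismatch for a, b in zip(seq1, seq2))


# One substitution in five sites: 4 matches and 1 mismatch.
print(score_ungapped("TCAGT", "TCCGT"))  # 4 * 2 - 1 = 7
```

Because the total grows with the number of sites, the same score can mean a strong match for short sequences and a weak one for long sequences.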

GTAGTC

CTAGCG


Substitutions Only (not including indels)

  • Sequences compared base-by-base

  • Count the number of matches and mismatches

  • Matches score +2, Mismatches score -1

TTCGTCGTAGTCGGCTCGACCTG

GTACGTCTAGCGAGCGTGATCCT

9 matches: +18

14 mismatches: -14

Total score +4

A weak match


Including Indels

  • Create an ‘alignment’

    • Count matches within alignment

    • Required if the sequences have different lengths

TT-CGTCGTAGTCG-GC-TCGACC-TG

GTACGTC-TAG-CGAGCGT-GATCCT-

17 matches: +34

2 mismatches: -2

8 indels: -8

Total score +24

A strong match


TT-CGTCGTAGTCG-GC-TCGACC-TG

GTACGTC-TAG-CGAGCGT-GATCCT-

Score: +24

-TTCGT-CGTAGTC-GGCTCG-ACCTG

GTAC-GTCTA-GCGAGCGT-GATCC-T

Score: 0

Choosing an Alignment

  • Many different alignments are possible

    • Should consider all possible alignments

    • Take the best score found

    • There may be more than one best alignment
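As a rough sketch of "score every candidate and keep the best", the Python below scores gapped alignments and returns all top-scoring ones. The names `score_alignment` and `best_alignments` are hypothetical, and the indel score of -1 is an assumption read off the "8 indels, -8" tally. The two candidates are the alignments shown above, written as separate rows. The sketch only ranks the candidates it is given; it does not enumerate all possible alignments.

```python
def score_alignment(row1, row2, match=2, mismatch=-1, indel=-1):
    """Score one gapped alignment given as two rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a column cannot pair two gaps")
        if a == "-" or b == "-":
            total += indel      # insertion or deletion
        elif a == b:
            total += match
        else:
            total += mismatch   # substitution
    return total


def best_alignments(candidates, **scoring):
    """Return the top score and every candidate that reaches it."""
    scored = [(score_alignment(r1, r2, **scoring), (r1, r2)) for r1, r2 in candidates]
    top = max(score for score, _ in scored)
    return top, [pair for score, pair in scored if score == top]


candidates = [
    ("TT-CGTCGTAGTCG-GC-TCGACC-TG", "GTACGTC-TAG-CGAGCGT-GATCCT-"),
    ("-TTCGT-CGTAGTC-GGCTCG-ACCTG", "GTAC-GTCTA-GCGAGCGT-GATCC-T"),
]
top, winners = best_alignments(candidates)
print(top, winners)
```

Returning a list rather than a single pair reflects the last bullet: several alignments can share the best score.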


Why is it hard?

Alignment without gaps requires a number of comparisons roughly proportional to the square of the average sequence length.

If we include gaps, the number of possible alignments becomes astronomical.
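To get a feel for how fast the number of gapped alignments grows, the sketch below counts them. It rests on one assumption about what counts as a distinct alignment: every column is either a pair of letters, a gap in the first sequence, or a gap in the second, and alignments that differ only in the order of adjacent gaps are counted separately. The function name `count_alignments` is hypothetical.

```python
def count_alignments(m: int, n: int) -> int:
    """Count gapped alignments of two sequences with lengths m and n."""
    # counts[i][j]: number of ways to align a prefix of length i with a prefix of length j.
    # With an empty prefix on either side there is exactly one way (all gaps).
    counts = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column pairs two letters
                + counts[i - 1][j]    # last column is a gap in the second sequence
                + counts[i][j - 1]    # last column is a gap in the first sequence
            )
    return counts[m][n]


for length in (1, 2, 3, 10, 23):
    print(length, count_alignments(length, length))
```

Under this counting rule the total roughly multiplies by a constant factor with every added letter, so scoring each alignment one by one is not practical even for short sequences.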


Algorithms for pairwise alignments

  • Dot Plots – Gibbs and McIntyre

  • Dynamic Programming:

    Local alignment: Smith-Waterman

    Global alignment: Needleman-Wunsch


Dot Plots

  • Early method

  • Sequences at top and left

  • Dots indicate matched bases

  • Diagonal series show matched regions
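A dot plot as described in the bullets above can be sketched as a small text grid. The helper names `dot_plot` and `render` are hypothetical, and using `*` for a dot and `.` for an empty cell is an arbitrary choice. The two sequences are the ungapped forms of the pair shown on this slide.

```python
def dot_plot(top: str, left: str) -> list:
    """One string per letter of `left`; '*' marks positions where the bases match."""
    return ["".join("*" if a == b else "." for a in top) for b in left]


def render(top: str, left: str) -> str:
    """Lay the plot out with `top` across the top and `left` down the left side."""
    lines = ["  " + " ".join(top)]
    for letter, row in zip(left, dot_plot(top, left)):
        lines.append(letter + " " + " ".join(row))
    return "\n".join(lines)


print(render("TAGTCG", "TAGCG"))
```

Runs of `*` along a diagonal correspond to matched regions. A shift from one diagonal to a neighbouring one suggests an indel.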

TAGTCG

TAG-CG


Dynamic Programming

  • A method for reducing a complex problem to a set of identical sub-problems

  • The best solution to one sub-problem is independent from the best solution to the other sub-problem



What does it mean?

If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z


Example

Sequences: A = ACGCTG, B = CATGT

Match: +2, Other: -1

[Scoring table: A = ACGCTG across the top (columns 1 to 6), B = CATGT down the left (rows 1 to 5)]

Score of best alignment between AC and CATG: -1

… between ACG and CATG: 2

… between AC and CATGT: -2

Calculate score between ACG and CATGT: ?


Needleman-Wunsch Example

Align the next letter in the sequences: position 3 of A (G) against position 5 of B (T)

Insertion in the first sequence: a gap (-) against position 5 of B (T)

Insertion in the second sequence: position 3 of A (G) against a gap (-)

-1 from before plus -1 for mismatch of G against T: -2

2 from before plus -1 for mismatch of - against T: 1

-2 from before plus -1 for mismatch of G against -: -3

Cell gets the highest score of -2, 1, -3: 1
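The arithmetic for this single cell can be written as a small function. It is a sketch with hypothetical names (`cell_score`, `diagonal`, `above`, `left`). The three neighbours are assumed to be the cells for AC/CATG (diagonal), ACG/CATG (above) and AC/CATGT (left), with A across the top and B down the left.

```python
def cell_score(diagonal, above, left, a, b, match=2, other=-1):
    """Best score for one table cell from its three already-filled neighbours."""
    pair = match if a == b else other
    candidates = (
        diagonal + pair,  # align letter a against letter b
        above + other,    # gap in the first sequence against letter b
        left + other,     # letter a against a gap in the second sequence
    )
    return max(candidates)


# ACG against CATGT: the three candidates are -2, 1 and -3.
print(cell_score(diagonal=-1, above=2, left=-2, a="G", b="T"))  # 1
```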

Needleman-Wunsch Example

Sequences: A = ACGCTG, B = CATGT


A

-


ACGCTG

------


-----

CATGT


A

C


AC

-C


ACG

-C-


ACGC

---C

ACGC

-C--


ACG

-CA


ACGCTG-

-C-ATGT


ACGCTG-

-CA-TGT


-ACGCTG

CATG-T-


Summary

Needleman-Wunsch Alignment

  • Global alignment between sequences

    • Compare entire sequence against another

  • Create scoring table

    • Sequence A across top, B down left

  • Cell at column i and row j contains the score of best alignment between the first i elements of A and the first j elements of B

    • Global alignment score is bottom right cell
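The summary above can be turned into a compact sketch of Needleman-Wunsch. It assumes the example's scoring (match +2, everything else -1, including gaps) and uses hypothetical names. When several alignments share the best score, the traceback returns just one of them.

```python
def needleman_wunsch(a, b, match=2, other=-1):
    """Return (table, aligned_a, aligned_b) for a global alignment of a and b."""
    cols, rows = len(a), len(b)
    # table[j][i]: best score for the first i letters of a and the first j letters of b.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, cols + 1):
        table[0][i] = i * other  # first i letters of a against gaps only
    for j in range(1, rows + 1):
        table[j][0] = j * other  # first j letters of b against gaps only
    for j in range(1, rows + 1):
        for i in range(1, cols + 1):
            pair = match if a[i - 1] == b[j - 1] else other
            table[j][i] = max(
                table[j - 1][i - 1] + pair,  # align a[i-1] with b[j-1]
                table[j - 1][i] + other,     # gap in a
                table[j][i - 1] + other,     # gap in b
            )
    # Trace back from the bottom-right cell to recover one best alignment.
    top, bottom = [], []
    i, j = cols, rows
    while i > 0 or j > 0:
        pair = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else other
        if i > 0 and j > 0 and table[j][i] == table[j - 1][i - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and table[j][i] == table[j][i - 1] + other:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return table, "".join(reversed(top)), "".join(reversed(bottom))


table, aligned_a, aligned_b = needleman_wunsch("ACGCTG", "CATGT")
print(table[-1][-1])  # global alignment score (bottom-right cell)
print(aligned_a)
print(aligned_b)
```

For A = ACGCTG and B = CATGT the bottom-right cell holds 2, the same score as the alignment ACGCTG- over -C-ATGT (3 matches, 1 mismatch, 3 gaps).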


Local Alignment: Smith-Waterman

  • Best score for aligning part of the sequences

    • Often beats global alignment score
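A minimal sketch of the Smith-Waterman score follows, using the same assumed scoring as the global example (match +2, everything else -1). The only changes from the global table are that no cell may drop below 0 and that the answer is the largest cell anywhere, not the bottom-right one. The function name is hypothetical, and the traceback is left out to keep the sketch short.

```python
def smith_waterman_score(a, b, match=2, other=-1):
    """Best local alignment score between any part of a and any part of b."""
    best = 0
    previous = [0] * (len(a) + 1)  # table row above the current one
    for letter_b in b:
        current = [0]
        for i, letter_a in enumerate(a, start=1):
            pair = match if letter_a == letter_b else other
            cell = max(
                0,                        # start a fresh local alignment here
                previous[i - 1] + pair,   # extend by aligning the two letters
                previous[i] + other,      # extend with a gap in a
                current[i - 1] + other,   # extend with a gap in b
            )
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


print(smith_waterman_score("ACGCTG", "CATGT"))
```

For A = ACGCTG and B = CATGT the best local piece is C-TG against CATG, which scores 5. That is higher than the global score of 2 for the same pair, because the poorly matching ends are simply left out.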

Global Alignment

ATTGCAGTG-TCGAGCGTCAGGCT

ATTGCGTCGATCGCAC-GCACGCT

Local Alignment

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT


Global vs. Local alignment

Global alignment:

DOROTHY--------HODGKIN

DOROTHYCROWFOOTHODGKIN

Local alignment:

DOROTHY

DOROTHY

HODGKIN

HODGKIN


Global vs. Local alignment

Alignment of two genomic sequences

>Human DNA

CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

>Mouse DNA

CATGCGTCTGACgctttttgctagcgatatcggactATCGATATA


Global vs. Local alignment

Alignment of two genomic sequences

Global Alignment

H:CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
Mouse:CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA
      ****** *****         *     ***      *  ****** ***

Local Alignment

H:CATGCGACTGAC
Mouse:CATGCGTCTGAC

H:ATCGATCATA
Mouse:ATCGAT-ATA


Global vs. Local alignment

Alignment of genomic DNA and mRNA

>Human DNA

CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

>Human mRNA

CATGCGACTGACATCGATCATA


Global vs. Local alignment

Alignment of genomic DNA and mRNA

Global Alignment

DNA: CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
mRNA:CATGCGACTGAC---------------------------ATCGATCATA
     ************                           **********

Local Alignment

DNA: CATGCGACTGAC
mRNA:CATGCGACTGAC

DNA: ATCGATCATA
mRNA:ATCGATCATA

