Pairwise Sequence Alignment


PowerPoint Slideshow about 'Pairwise Sequence Alignment' - malise


Presentation Transcript
WHAT?
  • Given any two sequences (DNA or protein)

Seq 1:

CATATTGCAGTGGTCCCGCGTCAGGCT

Seq 2:

TAAATTGCGTGGTCGCACTGCACGCT

We want to know to what extent they are similar.

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT

WHY?
  • Discover new function
  • Study evolution
  • Find crucial features within a sequence
  • Identify cause of diseases
Discover function
  • Sequences that are similar probably have the same function
Study evolution

If two sequences from different organisms are similar, they may have a common ancestor

Find crucial features
  • Regions in the sequences that are strongly conserved between different sequences can indicate their functional importance

Conservation of IGFALS (insulin-like growth factor-binding protein, acid-labile subunit) between human and mouse.

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT

Identify cause of disease
  • Comparison of sequences between individuals can detect changes that are related to diseases
Sickle Cell Anemia
  • Caused by a single substitution of an A with a T, so that valine is incorporated instead of glutamic acid in hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

Sequence Modifications
  • Three types of changes
    • Substitution (point mutation)
    • Insertion
    • Deletion
  • Insertions and deletions together: Indel (replication slippage)

Original: TCAGT

Substitution: TCCGT

Insertion: TCGAGT

Deletion: TCGT

Scoring Similarity
  • Assume independent mutation model
    • Each site considered separately
  • Score at each site
    • Positive if the same
    • Negative if different
  • Sum to make final score
    • Can be positive or negative
    • Significance depends on sequence length

GTAGTC
CTAGCG

Substitutions Only (not including indels)
  • Sequences compared base-by-base
  • Count the number of matches and mismatches
  • Matches score +2, Mismatches score -1

TTCGTCGTAGTCGGCTCGACCTG
GTACGTCTAGCGAGCGTGATCCT

9 matches +18

14 mismatches -14

Total score +4

A weak match
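The base-by-base count above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `score_ungapped` and its parameter names are illustrative and not part of the slides, and the two 23-base sequences are the ones shown on this slide.

```python
def score_ungapped(seq_a, seq_b, match=2, mismatch=-1):
    """Compare two equal-length sequences position by position (no indels)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    # Count positions where both sequences carry the same base.
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    mismatches = len(seq_a) - matches
    total = matches * match + mismatches * mismatch
    return matches, mismatches, total


if __name__ == "__main__":
    a = "TTCGTCGTAGTCGGCTCGACCTG"
    b = "GTACGTCTAGCGAGCGTGATCCT"
    print(score_ungapped(a, b))  # (9, 14, 4)
```

The returned tuple is (matches, mismatches, total score), so the slide's "9 matches, 14 mismatches, total +4" can be checked directly.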

Including Indels
  • Create an ‘alignment’
    • Count matches within alignment
    • Required if sequences have different lengths

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-

17 matches +34

2 mismatches -2

8 indels -8

Total score +24

A strong match

Choosing an Alignment
  • Many different alignments are possible
    • Should consider all possible
    • Take the best score found
    • There may be more than one best alignment

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
Score: +24

-TTCGT-CGTAGTC-GGCTCG-ACCTG
GTAC-GTCTA-GCGAGCGT-GATCC-T
Score: 0
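To compare candidate alignments, each one can be scored column by column and the highest score kept. The sketch below assumes the scheme used on these slides (match +2, mismatch -1, indel -1); `score_alignment` is a hypothetical helper name, and a column with a gap in either row is counted as one indel.

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, indel=-1, gap="-"):
    """Score one gapped alignment given as two rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            total += indel      # a letter facing a gap
        elif x == y:
            total += match      # identical letters
        else:
            total += mismatch   # different letters
    return total


if __name__ == "__main__":
    candidates = [
        ("TT-CGTCGTAGTCG-GC-TCGACC-TG", "GTACGTC-TAG-CGAGCGT-GATCCT-"),
        ("-TTCGT-CGTAGTC-GGCTCG-ACCTG", "GTAC-GTCTA-GCGAGCGT-GATCC-T"),
    ]
    for rows in candidates:
        print(score_alignment(*rows))  # 24, then 0
    best = max(candidates, key=lambda rows: score_alignment(*rows))
    print(best[0])
    print(best[1])
```

Only two hand-made candidates are compared here; finding the best alignment among all possible ones needs the dynamic programming approach described later.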
Why is it hard?

Alignment (without gaps) requires an algorithm that performs a number of comparisons roughly proportional to the square of the average sequence length.

If we include gaps, the number of comparisons becomes astronomical.
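One way to get a feel for "astronomical" is to count how many gapped alignments exist for two sequences of given lengths. The sketch below is an illustration under a stated assumption, not something taken from the slides: every alignment is built column by column from three moves (letter against letter, letter against gap, gap against letter), and alignments that differ only in the order of adjacent gaps are counted separately. The name `count_alignments` is hypothetical.

```python
def count_alignments(m, n):
    """Number of column-by-column alignments of sequences of length m and n."""
    # table[i][j] = number of ways to align the first i letters with the first j.
    # With one sequence empty there is exactly one way: all gaps.
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (
                table[i - 1][j - 1]  # last column pairs two letters
                + table[i - 1][j]    # last column: letter against gap
                + table[i][j - 1]    # last column: gap against letter
            )
    return table[m][n]


if __name__ == "__main__":
    for length in (1, 2, 3, 10, 23):
        print(length, count_alignments(length, length))
```

For lengths 1, 2 and 3 the counts are 3, 13 and 63; for the 23-base sequences used earlier the count is far too large to enumerate one by one.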

Algorithms for pairwise alignments
  • Dot Plots – Gibbs and McIntyre
  • Dynamic Programming:
    • Local alignment: Smith-Waterman
    • Global alignment: Needleman-Wunsch

Dot Plots
  • Early method
  • Sequences at top and left
  • Dots indicate matched bases
  • Diagonal series show matched regions

TAGTCG

TAG-CG
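A dot plot for the two short sequences above (TAGTCG across the top, TAGCG down the left) can be drawn as text. This is a minimal sketch: `dot_plot` and `render` are illustrative names, and the choice of `*` for a matched base and `.` for an empty cell is an assumption.

```python
def dot_plot(top, left):
    """One string per letter of `left`; '*' marks a base equal to the one in `top`."""
    return ["".join("*" if a == b else "." for b in top) for a in left]


def render(top, left):
    """Add the two sequences as column and row labels around the plot."""
    lines = ["  " + top]
    for letter, row in zip(left, dot_plot(top, left)):
        lines.append(letter + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("TAGTCG", "TAGCG"))
```

The diagonal run of three `*` at the top left corresponds to the shared TAG, and the shorter run shifted one column to the right corresponds to CG after the gap.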

Dynamic Programming
  • A method for reducing a complex problem to a set of identical sub-problems
  • The best solution to one sub-problem is independent of the best solution to the other sub-problem
What does it mean?

If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z

Example

Sequences: A = ACGCTG, B = CATGT

         A  C  G  C  T  G
         1  2  3  4  5  6
  C  1
  A  2
  T  3
  G  4
  T  5                  Z

Example

Sequences: A = ACGCTG, B = CATGT

Match: +2, Other: -1

  • Score of best alignment between AC and CATG: -1
  • …between ACG and CATG: 2
  • …between AC and CATGT: -2
  • Calculate score between ACG and CATGT: ?

Needleman-Wunsch Example

Three ways to extend an alignment:
  • Align the next letter in the sequences: 3 / 5
  • Insertion in the first sequence: - / 5
  • Insertion in the second sequence: 3 / -

Needleman-Wunsch Example

Sequences: A = ACGCTG, B = CATGT

  • -1 from before plus -1 for mismatch of G against T: -2
  • 2 from before plus -1 for mismatch of - against T: 1
  • -2 from before plus -1 for mismatch of G against -: -3

Cell gets highest score of -2, 1, -3: 1
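The single-cell calculation above can be written as a small function. This is a sketch under the slide's scoring (match +2, anything else -1); `cell_score` and its argument names are hypothetical, with `diagonal`, `above` and `left` standing for the three neighbouring cells when A runs across the top and B down the left.

```python
def cell_score(diagonal, above, left, letter_a, letter_b, match=2, other=-1):
    """Best score for one cell of the table, plus the three candidate scores."""
    pair = match if letter_a == letter_b else other
    candidates = (
        diagonal + pair,  # align letter_a against letter_b
        above + other,    # align a gap against letter_b
        left + other,     # align letter_a against a gap
    )
    return max(candidates), candidates


if __name__ == "__main__":
    # Cell for ACG versus CATGT: the last letters are G and T.
    best, candidates = cell_score(diagonal=-1, above=2, left=-2,
                                  letter_a="G", letter_b="T")
    print(candidates)  # (-2, 1, -3)
    print(best)        # 1
```

Filling the whole table amounts to applying this step to every cell, row by row, once the first row and column have been initialised.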


Alignments corresponding to cells of the scoring table, built up step by step:

A
-

ACGCTG
------

-----
CATGT

A
C

AC
-C

ACG
-C-

ACGC
---C

ACGC
-C--

ACG
-CA

ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-

Summary: Needleman-Wunsch Alignment
  • Global alignment between sequences
    • Compare entire sequence against another
  • Create scoring table
    • Sequence A across top, B down left
  • Cell at column i and row j contains the score of best alignment between the first i elements of A and the first j elements of B
    • Global alignment score is bottom right cell
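The summary above can be turned into a short program. The following is a minimal sketch, not the presenter's implementation: it assumes the example scoring (match +2, every mismatch or indel -1), uses the hypothetical name `needleman_wunsch`, and breaks ties during traceback in the order diagonal, then gap in B, then gap in A, so it returns only one of the possibly several best alignments.

```python
def needleman_wunsch(a, b, match=2, other=-1, gap="-"):
    """Global alignment score and one best alignment of sequences a and b."""
    cols, rows = len(a), len(b)
    # table[j][i]: best score for the first i letters of a and first j of b.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, cols + 1):
        table[0][i] = i * other  # letters of a against gaps only
    for j in range(1, rows + 1):
        table[j][0] = j * other  # letters of b against gaps only
    for j in range(1, rows + 1):
        for i in range(1, cols + 1):
            pair = match if a[i - 1] == b[j - 1] else other
            table[j][i] = max(table[j - 1][i - 1] + pair,
                              table[j - 1][i] + other,
                              table[j][i - 1] + other)
    # Traceback from the bottom right cell to the top left cell.
    row_a, row_b = [], []
    i, j = cols, rows
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else other
            if table[j][i] == table[j - 1][i - 1] + pair:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and table[j][i] == table[j][i - 1] + other:
            row_a.append(a[i - 1])  # letter of a against a gap
            row_b.append(gap)
            i -= 1
        else:
            row_a.append(gap)       # gap against a letter of b
            row_b.append(b[j - 1])
            j -= 1
    return table[rows][cols], "".join(reversed(row_a)), "".join(reversed(row_b))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("ACGCTG", "CATGT")
    print(score)
    print(top)
    print(bottom)
```

For A = ACGCTG and B = CATGT the bottom right cell holds 2. Other alignments with the same score exist, so a different tie-breaking order would return a different but equally good alignment.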
Local Alignment: Smith-Waterman
  • Best score for aligning part of sequences
    • Often beats global alignment score

Global Alignment

ATTGCAGTG-TCGAGCGTCAGGCT

ATTGCGTCGATCGCAC-GCACGCT

Local Alignment

CATATTGCAGTGGTCCCGCGTCAGGCT

TAAATTGCGT-GGTCGCACTGCACGCT
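A local alignment score can be computed with almost the same table as the global one; the main change is that a cell never drops below zero, so an alignment may start and end anywhere. The sketch below assumes the earlier example scoring (match +2, everything else -1) and a hypothetical function name `smith_waterman_score`; it returns only the best score, not the aligned region.

```python
def smith_waterman_score(a, b, match=2, other=-1):
    """Best local alignment score between any part of a and any part of b."""
    best = 0
    previous = [0] * (len(a) + 1)  # row above the one being filled
    for letter_b in b:
        current = [0]
        for i, letter_a in enumerate(a, start=1):
            pair = match if letter_a == letter_b else other
            cell = max(0,                        # start a fresh local alignment
                       previous[i - 1] + pair,   # extend along the diagonal
                       previous[i] + other,      # gap against letter_b
                       current[i - 1] + other)   # letter_a against a gap
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGCTG", "CATGT"))
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Because the score is taken from the best cell anywhere in the table and is never negative, it is at least as high as the global score for the same pair, which is why local alignment often beats the global alignment score.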

Global vs. Local alignment

Global alignment:

DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN

Local alignment:

DOROTHY
DOROTHY

HODGKIN
HODGKIN

Global vs. Local alignment

Alignment of two Genomic sequences

>Human DNA

CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

>Mouse DNA

CATGCGTCTGACgctttttgctagcgatatcggactATCGATATA

Global vs. Local alignment

Alignment of two Genomic sequences

Global Alignment

H:CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
Mouse:CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA
      ****** *****         *     ***      *  ****** ***

Local Alignment

H:CATGCGACTGAC
Mouse:CATGCGTCTGAC

H:ATCGATCATA
Mouse:ATCGAT-ATA

Global vs. Local alignment

Alignment of Genomic DNA and mRNA

>Human DNA

CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA

>Human mRNA

CATGCGACTGACATCGATCATA

Global vs. Local alignment

Alignment of Genomic DNA and mRNA

Global Alignment

DNA: CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
mRNA:CATGCGACTGAC---------------------------ATCGATCATA
     ************                           **********

Local Alignment

DNA: CATGCGACTGAC
mRNA:CATGCGACTGAC

DNA: ATCGATCATA
mRNA:ATCGATCATA
