Sequence comparisons
PowerPoint Slideshow about 'Sequence comparisons' - titus



  • April 9, 2002

  • Review homework

  • Learning objectives: review amino acids; understand the difference between identity, similarity and homology; understand the difference between global alignment and local alignment.

  • Workshop: perform a sliding-window comparison of two sequences

  • Homework #3 due on Thursday



Review of amino acid characteristics

http://info.bio.cmu.edu/Courses/BiochemMols/AAViewer/AAVFrameset.htm

http://info.bio.cmu.edu/Courses/BiochemMols/BCMolecules.html


Purpose of finding differences and similarities between amino acid sequences

  • Infer structural information

  • Infer functional information

  • Infer evolutionary relationships


Evolutionary basis of sequence alignment

  1. Similarity: a quantity that describes how alike two amino acid sequences are, counting both identical residues and conservative substitutions.

  2. Identity: a quantity that describes how alike two sequences are in the strictest terms, i.e. the proportion of aligned positions occupied by exactly the same residue.

  3. Homology: a conclusion drawn from data suggesting that two genes share a common evolutionary history.
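A minimal sketch of how identity and similarity can be computed for two sequences that have already been aligned (equal length, '-' for gaps). The amino acid groups that decide what counts as "similar" are an assumption for illustration, a rough grouping by side-chain chemistry; real tools derive similarity from a scoring matrix. The function names are hypothetical.

```python
# Hypothetical grouping of amino acids by side-chain chemistry (an assumption for
# illustration only; real tools use a scoring matrix to decide similarity).
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # positively charged
    set("DE"),    # negatively charged
    set("STNQ"),  # polar, uncharged
    set("AG"),    # small
]


def similar(x: str, y: str) -> bool:
    """Return True if x and y are identical or fall in the same assumed group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(1 for x, y in columns if x == y)
    alike = sum(1 for x, y in columns if similar(x, y))
    n = len(columns)
    return 100 * identical / n, 100 * alike / n


print(identity_and_similarity("MKLV-D", "MRIVAE"))  # → (40.0, 100.0)
```

In the example, only M/M and V/V are identical, but K/R, L/I and D/E are conservative substitutions under the assumed groups, so similarity is always at least as high as identity.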


Evolutionary basis of sequence alignment (cont. 1)

1. Example: The next slide shows a pairwise alignment of two homologous proteins, mouse trypsin and crayfish trypsin. The sequences share 41% identity.

2. Underlined residues are identical. Asterisks and a diamond mark the residues that participate in catalysis. Five gaps are inserted to optimize the alignment.


Evolutionary basis of sequence alignment (cont. 2)

Why are there regions of identity?

1) Conserved function: the residues participate in the reaction.

2) Structural: for example, conserved cysteine residues that form a disulfide linkage.

3) Historical: residues that are conserved solely because both genes descend from a common ancestor gene.


Evolutionary basis of sequence alignment (cont. 3)

Note: two proteins can share a high degree of similarity yet have different functions. For example, human gamma-crystallin is a lens protein that has no known enzymatic activity, yet it shares a high percentage of identity with E. coli quinone oxidoreductase. These proteins likely had a common ancestor, but their functions diverged.

This is analogous to a railroad car converted into a diner: the structure is inherited, but the function has changed.


Modular nature of proteins

  • The previous alignment was global: it spanned the full length of both sequences. However, many proteins do not display global patterns of similarity. Instead, they possess local regions of similarity.

  • Proteins can be thought of as assemblies of modular domains. In some cases, this modularity is thought to result from a process known as exon shuffling.
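To make the global/local distinction concrete, here is a minimal sketch that scores both kinds of alignment by dynamic programming. The scoring scheme (+1 match, -1 mismatch, -2 per gap) and the function name `alignment_score` are assumptions for illustration; the two modes follow the standard Needleman-Wunsch (global) and Smith-Waterman (local) recurrences, reduced here to returning the best score only.

```python
def alignment_score(a: str, b: str, local: bool,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global alignment; every residue of both sequences is
    either aligned or placed opposite a gap.
    local=True: local alignment; cell scores are floored at 0 and the best
    cell anywhere in the table is returned.
    """
    # First row: in a global alignment, a prefix of b aligned to nothing costs
    # one gap per residue; a local alignment may start anywhere, so it is 0.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            cur[j] = score
        prev = cur
    return best if local else prev[-1]


# A shared region (GGGG) embedded in otherwise unrelated flanking sequence.
x, y = "AAAGGGGAAA", "CCGGGGCC"
print("global:", alignment_score(x, y, local=False))  # → global: -4
print("local: ", alignment_score(x, y, local=True))   # → local:  4
```

Because the two sequences share only a short region, the global score is dragged down by the unrelated flanks, while the local score reflects the shared region alone.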


Modular nature of proteins (cont. 1)

[Figure: exon shuffling. Gene A (Exon 1a, Exon 2a) undergoes duplication of Exon 2a; Gene B likewise carries two copies of Exon 2b. After an exchange with Gene B, Gene A consists of Exon 1a, Exon 2a and Exon 3 (Exon 2b from Gene B), and Gene B consists of Exon 1b, Exon 2b and Exon 3 (Exon 2a from Gene A).]

Dot plots (window = 1)

[Figure: dot plot comparing A T G C C T A G with itself; * marks every cell where the two bases match.]

Note that about 25% of the table will be filled by random chance: with four equally frequent bases, there is a 1 in 4 chance of a match at each position.
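A minimal sketch of a window-1 dot plot, comparing ATGCCTAG against itself as in the figure. The function names are hypothetical. Because each of the four bases occurs exactly twice in this sequence, 4 × (2 × 2) = 16 of the 64 cells are filled, which here matches the 25% expected by chance.

```python
def dot_plot(seq1: str, seq2: str) -> list[list[bool]]:
    """Window-1 dot plot: cell (i, j) is True when seq1[i] == seq2[j]."""
    return [[c1 == c2 for c2 in seq2] for c1 in seq1]


def show(seq1: str, seq2: str, grid: list[list[bool]]) -> None:
    """Print the grid with seq2 across the top and seq1 down the side."""
    print("  " + " ".join(seq2))
    for c1, row in zip(seq1, grid):
        print(c1 + " " + " ".join("*" if hit else "." for hit in row))


seq = "ATGCCTAG"
grid = dot_plot(seq, seq)
show(seq, seq, grid)
dots = sum(hit for row in grid for hit in row)
cells = len(seq) ** 2
print(f"{dots} of {cells} cells filled ({100 * dots / cells:.0f}%)")  # → 16 of 64 cells filled (25%)
```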


Dot plots with window = 2

[Figure: dot plot comparing A T G C C T A G with itself using a window of 2; braces group consecutive pairs of bases, and * marks each pair that matches.]

Window = 2: the larger the window, the more noise can be filtered out.

What is the percent chance that you will get a match randomly?

(1/4) × (1/4) = 1/16, and 1/16 × 100 = 6.25%
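A minimal sketch of the windowed comparison, assuming the strict rule suggested by the figure: a dot is drawn only when every base in the window matches (tools such as Dotter instead sum a score over the window). The function name is hypothetical. With window = 2, the self-comparison of ATGCCTAG keeps only the 7 diagonal dots, because all seven overlapping dinucleotides in the sequence are distinct, and the expected rate of random matches drops from 1/4 to (1/4)^2 = 1/16.

```python
def windowed_dot_plot(seq1: str, seq2: str, window: int) -> list[list[bool]]:
    """Cell (i, j) is True when seq1[i:i+window] == seq2[j:j+window].

    Each cell summarizes `window` consecutive positions; a dot is drawn only
    when every base in the window matches.
    """
    if window < 1 or window > min(len(seq1), len(seq2)):
        raise ValueError("window must be between 1 and the shorter sequence length")
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    return [[seq1[i:i + window] == seq2[j:j + window] for j in range(cols)]
            for i in range(rows)]


seq = "ATGCCTAG"
for w in (1, 2):
    grid = windowed_dot_plot(seq, seq, w)
    dots = sum(hit for row in grid for hit in row)
    cells = len(grid) * len(grid[0])
    # Expected percentage of random matches for DNA with equal base frequencies.
    expected = 100 * (1 / 4) ** w
    print(f"window={w}: {dots}/{cells} cells filled; random expectation {expected}%")
```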


Similarity

It is easy to score whether one amino acid is identical to another (the score is 1 if identical and 0 if not). However, it is not easy to give a score to amino acids that are somewhat similar.

[Figure: structures of isoleucine and leucine, each drawn with its CO2- and +NH3 groups.]

Should they get a 0 (non-identical), a 1 (identical), or something in between?


Identity matrix

   A  C  I  L
A  1
C  0  1
I  0  0  1
L  0  0  0  1

Simplest type of scoring matrix
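A minimal sketch of scoring an ungapped alignment with the identity matrix for the four residues shown (A, C, I, L). The function name is hypothetical, and so is the "graded" variant: its 0.5 for I/L is a made-up value that only illustrates the "something in between" option; matrices such as PAM derive such values from observed substitutions.

```python
# Identity matrix for the residues on the slide: 1 on the diagonal, 0 elsewhere.
RESIDUES = "ACIL"
IDENTITY = {(x, y): 1 if x == y else 0 for x in RESIDUES for y in RESIDUES}


def ungapped_score(a: str, b: str, matrix: dict[tuple[str, str], float]) -> float:
    """Sum matrix scores over aligned positions of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(matrix[x, y] for x, y in zip(a, b))


print(ungapped_score("ACIL", "ACLL", IDENTITY))  # → 3 (I vs L scores 0)

# Hypothetical graded variant giving the similar pair I/L partial credit.
GRADED = dict(IDENTITY)
GRADED["I", "L"] = GRADED["L", "I"] = 0.5
print(ungapped_score("ACIL", "ACLL", GRADED))  # → 3.5
```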


The Point-Accepted-Mutation (PAM) model of evolution and the PAM scoring matrix

The PAM model assumes that each amino acid (AA) mutates independently of the others, with a probability that depends only on the AA.

Since there are 20 AA, the transition probabilities are described by a 20 × 20 mutation matrix, denoted by M.

A standard M defines a 1-PAM change.

Point Accepted Mutation (PAM) distance: a 1-PAM unit changes 1% of the amino acids on average:

$\sum_{i=1}^{20} f_i (1 - M_{ii}) = 0.01$

where $f_i$ is the frequency of AA i and $M_{ii}$ is the probability that AA i remains unchanged. One PAM is a unit of evolutionary divergence in which 1% of the amino acids have been changed.
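A minimal sketch of the 1-PAM condition, that on average 1% of amino acids change, i.e. $\sum_i f_i (1 - M_{ii}) = 0.01$. The uniform matrix and the equal amino acid frequencies are assumptions chosen so the arithmetic is easy to check; they are not the real Dayhoff PAM1 matrix. The toy matrix is symmetric, so the row/column convention for $M_{ij}$ does not matter here.

```python
N = 20  # number of amino acids


def toy_one_pam_matrix() -> list[list[float]]:
    """Hypothetical 1-PAM matrix: every amino acid stays the same with
    probability 0.99 and changes to each of the other 19 with equal probability.
    Real PAM matrices are estimated from observed substitutions, not uniform."""
    stay, change = 0.99, 0.01 / (N - 1)
    return [[stay if i == j else change for j in range(N)] for i in range(N)]


def expected_fraction_changed(m: list[list[float]], f: list[float]) -> float:
    """Sum over i of f_i * (1 - M_ii): the expected fraction of residues that change."""
    return sum(f[i] * (1 - m[i][i]) for i in range(len(f)))


M = toy_one_pam_matrix()
f = [1 / N] * N  # assumed uniform amino acid frequencies
print(round(expected_fraction_changed(M, f), 10))  # → 0.01
```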


The Point-Accepted-Mutation (PAM) model of evolution and the PAM scoring matrix (cont. 1)

A 2-PAM unit is equivalent to two successive 1-PAM units of evolution ($M^2$).

A k-PAM unit is equivalent to k successive 1-PAM units of evolution ($M^k$).

Example 1:

CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV
|||||||||||||| |||||||||||||||||||||||||||||||||||
CNGTTDQVDKIVKIRNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV

Length = 50

1 mismatch (L/R at position 15), so 2% of positions differ

PAM distance = 2
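A k-PAM matrix is the k-th matrix power of the 1-PAM matrix. The sketch below builds $M^k$ from a uniform toy 1-PAM matrix (an assumption, not real PAM data) and estimates the distance by picking the k whose expected identity is closest to the observed identity. That closest-match rule is a simplification chosen for illustration, not the method of any particular program; the function names are hypothetical. It recovers the slide's answer for Example 1.

```python
N = 20  # number of amino acids


def toy_one_pam_matrix() -> list[list[float]]:
    """Hypothetical uniform 1-PAM matrix: stay with probability 0.99,
    change to each of the other 19 amino acids with probability 0.01/19."""
    stay, change = 0.99, 0.01 / (N - 1)
    return [[stay if i == j else change for j in range(N)] for i in range(N)]


def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product, so M^k can be built without external libraries."""
    return [[sum(a[i][t] * b[t][j] for t in range(N)) for j in range(N)]
            for i in range(N)]


def identity_curve(max_k: int) -> list[float]:
    """Entry k-1 is the expected fraction of unchanged positions after k PAM
    units, i.e. the average diagonal of M^k (uniform frequencies assumed)."""
    m1 = toy_one_pam_matrix()
    mk = m1
    curve = []
    for _ in range(max_k):
        curve.append(sum(mk[i][i] for i in range(N)) / N)
        mk = mat_mul(mk, m1)  # advance from M^k to M^(k+1)
    return curve


def estimate_pam_distance(identical: int, length: int, max_k: int = 50) -> int:
    """Return the k whose expected identity is closest to the observed identity."""
    observed = identical / length
    curve = identity_curve(max_k)
    return min(range(1, max_k + 1), key=lambda k: abs(curve[k - 1] - observed))


# Example 1 from the slide: length 50, 1 mismatch, so 49 identical positions.
print(estimate_pam_distance(49, 50))  # → 2
```

For small distances the PAM distance is close to the observed percent difference (here 2% of positions differ); the two diverge at larger distances because several substitutions can hit the same position.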


Two proteins that are similar in certain regions

Tissue plasminogen activator (PLAT)

Coagulation factor 12 (F12)


The Dotter program

  • The program consists of three components:

    • A sliding window

    • A table that gives a score for each amino acid match

    • A graph that converts the score into a dot of a certain density: the higher the score, the denser the dot.
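A minimal sketch of the three components on short protein strings. The scoring values, the four-level shading, and the function names are hypothetical choices for illustration; Dotter itself uses a substitution matrix and renders a greyscale image, and its exact scoring and shading rules are not reproduced here.

```python
def pair_score(x: str, y: str) -> int:
    """Hypothetical scoring table: identical pairs score 2, the similar pair
    I/L scores 1, and every other pair scores -1 (not a published matrix)."""
    if x == y:
        return 2
    if {x, y} == {"I", "L"}:
        return 1
    return -1


SHADES = " .:#"  # light to dark: higher window score -> denser dot


def dotter_like(seq1: str, seq2: str, window: int) -> list[str]:
    """Score each diagonal window of length `window` and map the score to a shade."""
    lo, hi = -window, 2 * window  # minimum and maximum possible window scores
    rows = []
    for i in range(len(seq1) - window + 1):
        line = ""
        for j in range(len(seq2) - window + 1):
            # Components 1 and 2: slide a window along the diagonal starting at
            # (i, j) and sum the table scores of the aligned residue pairs.
            score = sum(pair_score(seq1[i + k], seq2[j + k]) for k in range(window))
            # Component 3: convert the score into a dot density.
            level = (score - lo) * (len(SHADES) - 1) // (hi - lo)
            line += SHADES[level]
        rows.append(line)
    return rows


for row in dotter_like("ACILKAC", "ACLLKAC", window=3):
    print(repr(row))
```

On the diagonal, windows that contain the I/L pair get partial credit and a lighter shade (':'), while fully identical windows such as K A C print as '#'.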


Sequence comparisons

[Figure: dot plot comparing F12 with PLAT. A single region on F12 is similar to two regions on PLAT; the regions of similarity are labeled.]