Evolution and scoring rules

Evolution and Scoring Rules - PowerPoint PPT Presentation



PowerPoint Slideshow about 'Evolution and Scoring Rules' - chinue


Presentation Transcript
Evolution and Scoring Rules

  • Example Score =

    5 x (# matches) + (-4) x (# mismatches) + (-7) x (total length of all gaps)

  • Example Score =

    5 x (# matches) + (-4) x (# mismatches) + (-5) x (# gap openings) + (-2) x (total length of all gaps)
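
The two example rules can be applied to a concrete gapped alignment. The sketch below is only an illustration: the function name `alignment_scores`, the `-` gap character, and the sample sequences are assumptions, not part of the slides. It counts matches, mismatches, gap openings, and total gap length, then evaluates both formulas.

```python
def alignment_scores(row1, row2, gap="-"):
    """Score a gapped pairwise alignment under the two example rules.

    Returns (score_rule_1, score_rule_2), where rule 1 charges -7 per gap
    position and rule 2 charges -5 per gap opening plus -2 per gap position.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gap_length = gap_openings = 0
    previous = None  # state of the previous column: "pair", "gap1" or "gap2"
    for x, y in zip(row1, row2):
        if x == gap and y == gap:
            raise ValueError("a column cannot consist of two gaps")
        if x == gap or y == gap:
            state = "gap1" if x == gap else "gap2"
            gap_length += 1
            if state != previous:  # a gap starts here rather than continuing
                gap_openings += 1
        else:
            state = "pair"
            if x == y:
                matches += 1
            else:
                mismatches += 1
        previous = state
    rule_1 = 5 * matches - 4 * mismatches - 7 * gap_length
    rule_2 = 5 * matches - 4 * mismatches - 5 * gap_openings - 2 * gap_length
    return rule_1, rule_2


print(alignment_scores("ACGT--AC", "ACCTGGAC"))  # (7, 12)
```

This alignment has 5 matches, 1 mismatch, and one gap of length 2, so the first rule gives 25 - 4 - 14 = 7 and the second gives 25 - 4 - 5 - 4 = 12.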



Scoring Rules vs. Scoring Matrices

  • Nucleotide vs. Amino Acid Sequence

  • The choice of a scoring rule can strongly influence the outcome of sequence analysis

  • Scoring matrices implicitly represent a particular theory of evolution

  • Elements of the matrices specify the similarity of one residue to another


Translation - Protein Synthesis: Every 3 nucleotides (a codon) are translated into one amino acid

  • Replication: DNA (A T G C) -> DNA

  • Transcription: DNA (A T G C) -> RNA (A U G C), 1:1

  • Translation: RNA (A U G C) -> Protein (20 amino acids), 3:1
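
The 1:1 and 3:1 mappings can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: `CODON_TABLE` is only a small excerpt of the standard genetic code (not all 64 codons), the input is taken to be the coding strand read 5' to 3', and the function names are made up for the example.

```python
# Excerpt of the standard genetic code (RNA codons -> one-letter amino acids).
# Deliberately incomplete: only the codons needed for the example are listed.
CODON_TABLE = {
    "AUG": "M",                                      # methionine (start)
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "AAA": "K", "AAG": "K",                          # lysine
    "UGG": "W",                                      # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(dna):
    """DNA coding strand -> mRNA: one nucleotide per nucleotide (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """mRNA -> protein: one amino acid per three nucleotides, until a stop codon."""
    protein = []
    for start in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3]]  # KeyError if not in the excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGTTTGGCAAATGGTAA")
print(rna)             # AUGUUUGGCAAAUGGUAA
print(translate(rna))  # MFGKW
```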



Translation - Protein Synthesis

RNA -> Protein

5’ -> 3’ : N-term -> C-term


Log Likelihoods used as Scoring Matrices

  • PAM - % Accepted Mutations: 1500 changes in 71 groups w/ > 85% similarity

  • BLOSUM – Blocks Substitution Matrix: 2000 “blocks” from 500 families



Likelihood Ratio for Aligning a Single Pair of Residues

  • Numerator (above): the probability that two residues are aligned by evolutionary descent

  • Denominator (below): the probability that they are aligned by chance

  • Pi and Pj are the frequencies of residues i and j in all protein sequences (their abundance)
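
The ratio these bullets describe can be written out as follows. The symbol q_ij for the probability that residues i and j are aligned by descent is an assumption introduced here, since the slide's own notation for it did not survive; P_i and P_j are the abundances named above, and the score is the logarithm of the ratio.

```latex
R_{ij} = \frac{q_{ij}}{P_i \, P_j},
\qquad
S_{ij} = \log R_{ij}
```

A positive S_ij means the pair is seen aligned more often than chance would predict; a negative S_ij means less often.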





BLOSUM matrices for Protein

  • S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919

  • Training Data: ~2000 conserved blocks from BLOCKS database. Ungapped, aligned protein segments. Each block represents a conserved region of a protein family


Constructing BLOSUM Matrices of Specific Similarities

  • Sets of sequences have widely varying similarity. Sequences whose similarity exceeds a threshold are clustered.

  • If the clustering threshold is 62%, the final matrix is BLOSUM62
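
The clustering step can be sketched as below. This is a simplified illustration, not the published procedure: it assumes similarity means percent identity over ungapped, equal-length block segments, that a sequence joins a cluster when it reaches the threshold against any member (single linkage), and the names `percent_identity` and `cluster_block` are hypothetical.

```python
def percent_identity(a, b):
    """Percent of positions at which two equal-length, ungapped segments agree."""
    if len(a) != len(b):
        raise ValueError("block segments must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_block(sequences, threshold=62.0):
    """Group sequences so that each one is >= threshold identical to at least
    one other member of its cluster (single linkage)."""
    clusters = []
    for seq in sequences:
        merged, untouched = [seq], []
        for cluster in clusters:
            if any(percent_identity(seq, member) >= threshold for member in cluster):
                merged.extend(cluster)  # seq links this cluster to the merged one
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]
    return clusters


block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHWWV", "WYWYWYWYWY"]
for cluster in cluster_block(block, threshold=62.0):
    print(cluster)
```

With a 62% threshold the first three segments fall into one cluster (they are 70% to 90% identical to one another) and the fourth stays alone.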



Constructing a BLOSUM matrix: 1. Counting mutations


Constructing a BLOSUM matrix: 2. Tallying mutation frequencies


Constructing a BLOSUM matrix: 3. Matrix of mutation probs.



5. Obtaining a BLOSUM matrix



1.2.3. Mutation Frequency Table



5. Obtaining BLOSUM62 Matrix
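
The slides for these steps carried figures rather than text, so the following is a hedged sketch of how counting, tallying, and the final log-odds step are commonly put together. It assumes scores in half-bit units (2 x log2 of observed over expected pair frequency), skips the weighting of clustered sequences, and uses a made-up two-letter alphabet; the function name `blosum_like` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blosum_like(blocks):
    """Log-odds scores from ungapped blocks (each block: equal-length sequences)."""
    # 1. Count every residue pair within each column of each block.
    pair_counts = Counter()
    for block in blocks:
        for column in zip(*block):
            for x, y in combinations(column, 2):
                pair_counts[tuple(sorted((x, y)))] += 1

    # 2. Turn the counts into observed pair frequencies q.
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}

    # 3. Residue frequencies p implied by the pair frequencies.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2

    # 5. Score = 2 * log2(observed / expected), rounded to an integer.
    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * log2(freq / expected))
    return scores


print(blosum_like([["AAB", "AAB", "AAB", "ABB"]]))
# {('A', 'A'): 1, ('A', 'B'): -3, ('B', 'B'): 2}
```

In this toy block the conserved pairs A/A and B/B score positive and the rare substitution A/B scores negative, which is the qualitative pattern a real BLOSUM matrix shows.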


PAM Matrices (Point Accepted Mutations)

Mutations accepted by natural selection


PAM Matrices

  • Accepted Point Mutation

  • Atlas of Protein Sequence and Structure,

    Suppl 3, 1978, M.O. Dayhoff.

    ed. National Biomedical Research Foundation, 1

  • Based on evolutionary principles



PAM: Phylogenetic Tree


PAM: Accepted Point Mutation


Mutability


Total Mutation Rate

is the total mutation rate of all amino acids


Normalize Total Mutation Rate


Mutation Probability Matrix Normalized Such That the Total Mutation Rate is 1%
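
A minimal sketch of this normalization follows. It assumes a row convention, M[i][j] being the probability that residue i is replaced by residue j, and defines the total mutation rate as the abundance-weighted probability that a residue changes; the function names and the two-residue numbers are invented for illustration and are not Dayhoff's data.

```python
def total_mutation_rate(freqs, M):
    """Abundance-weighted probability that a residue changes at all."""
    return sum(f * (1.0 - M[i][i]) for i, f in enumerate(freqs))


def normalize_to_one_percent(freqs, M, target=0.01):
    """Rescale the off-diagonal entries so the total mutation rate equals target."""
    scale = target / total_mutation_rate(freqs, M)
    n = len(M)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = scale * M[i][j]
        # The diagonal absorbs the rest so that each row still sums to 1.
        out[i][i] = 1.0 - sum(out[i])
    return out


freqs = [0.6, 0.4]                # hypothetical residue abundances
raw = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical unnormalized matrix
pam1 = normalize_to_one_percent(freqs, raw)
print(total_mutation_rate(freqs, raw))   # about 0.14 before scaling
print(total_mutation_rate(freqs, pam1))  # about 0.01 after scaling
```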



PAM1 mutation prob. matr. -- PAM2 Mutation Probability Matrix?

-- PAM2: mutations that happen over twice the evolutionary period of PAM1


PAM Matrix: Assumptions


In two PAM1 periods:

  • {AR} = {AA and AR} or

    {AN and NR} or

    {AD and DR} or

    … or

    {AV and VR}



PAM-k Mutation Prob. Matrix


PAM-1 log likelihood matrix


PAM-k log likelihood matrix
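
A log likelihood matrix can be derived from a mutation probability matrix by dividing each entry by the abundance of the target residue and taking a scaled logarithm. The sketch below assumes the common convention of 10 x log10 rounded to an integer and the row convention M[i][j] for residue i replaced by j; the function name `log_odds` and the numbers are illustrative only.

```python
from math import log10


def log_odds(Mk, freqs):
    """Scores 10 * log10(Mk[i][j] / freqs[j]), rounded to the nearest integer."""
    n = len(Mk)
    return [[round(10 * log10(Mk[i][j] / freqs[j])) for j in range(n)]
            for i in range(n)]


freqs = [0.6, 0.4]                   # hypothetical residue abundances
Mk = [[0.90, 0.10], [0.15, 0.85]]    # hypothetical PAM-k probabilities
print(log_odds(Mk, freqs))           # [[2, -6], [-6, 3]]
```

The example matrix was chosen so that freqs[i] * Mk[i][j] equals freqs[j] * Mk[j][i], which makes the resulting score matrix symmetric.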


PAM-250


  • Approximate sequence similarity for which each matrix is suited: PAM60: 60%, PAM80: 50%, PAM120: 40%

  • The PAM-250 matrix gives better-scoring alignments than lower-numbered PAM matrices for proteins of 14-27% similarity


Sources of Error in PAM


Comparing Scoring Matrix

PAM

  • Based on extrapolation of a small evol. period

  • Track evolutionary origins

  • Homologous seq.s during evolution

BLOSUM

  • Based on a range of evol. periods

  • Conserved blocks

  • Find conserved domains


Choice of Scoring Matrix


Global Alignment with Affine Gaps

Complex Dynamic Programming


Problem w/ Independent Gap Penalties

  • The occurrence of x consecutive deletions/insertions is more likely than the occurrence of x isolated mutations

  • We should penalize a gap of length x less than x times the penalty for a single gap


Affine Gap Penalty

  • w2 is the penalty for each gap position

  • w1 is the extra penalty for the 1st gap position (opening the gap)
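
Read this way, a gap of length x costs w1 + x * w2. The sketch below compares that with an independent (linear) penalty, borrowing the magnitudes 7, 5, and 2 from the example scores at the start of the deck; the function names are assumptions made for the example.

```python
def linear_gap_penalty(x, w):
    """Independent penalties: every gap position costs w."""
    return x * w


def affine_gap_penalty(x, w1, w2):
    """Affine penalty: w2 per gap position plus w1 once for opening the gap."""
    return 0 if x == 0 else w1 + x * w2


# One gap of length 4 versus four isolated gaps of length 1.
print(linear_gap_penalty(4, 7))             # 28
print(4 * linear_gap_penalty(1, 7))         # 28: the linear rule cannot tell them apart
print(affine_gap_penalty(4, 5, 2))          # 13
print(4 * affine_gap_penalty(1, 5, 2))      # 28: isolated gaps cost more
```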


Scoring Rule not Additive!

  • We need to know if the current gap is a new gap or the continuation of an existing gap

  • Use three Dynamic Programming matrices to keep track of the previous step


S1 is the vertical sequence

S2 is the horizontal sequence

  • (From Diagonal) a(i,j): current position is a match

  • (From Left) b(i,j): current position is a gap in S1

  • (From Above) c(i,j): current position is a gap in S2

    Filling the next element in each matrix depends on the previous step, which is stored in the three matrices.
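
The three-matrix scheme can be sketched as follows. This is one possible realization under stated assumptions, not the slides' own recurrences (those were figures): scores are maximized, matches score 5 and mismatches -4, a gap of length x costs w1 + x * w2 with w1 = 5 and w2 = 2, and a gap in one sequence may directly follow a gap in the other. Only the optimal score is returned; traceback is omitted.

```python
NEG_INF = float("-inf")


def affine_global_score(s1, s2, match=5, mismatch=-4, w1=5, w2=2):
    """Best global alignment score of s1 (vertical) and s2 (horizontal)."""
    n, m = len(s1), len(s2)
    # a: position (i, j) is a match/mismatch; b: gap in s1 (move from the left);
    # c: gap in s2 (move from above).
    a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    c = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    a[0][0] = 0
    for j in range(1, m + 1):
        b[0][j] = -(w1 + j * w2)  # leading gap in s1
    for i in range(1, n + 1):
        c[i][0] = -(w1 + i * w2)  # leading gap in s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            a[i][j] = sub + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            b[i][j] = max(a[i][j - 1] - w1 - w2,   # new gap in s1
                          b[i][j - 1] - w2,        # continued gap in s1
                          c[i][j - 1] - w1 - w2)   # gap in s1 following a gap in s2
            c[i][j] = max(a[i - 1][j] - w1 - w2,   # new gap in s2
                          c[i - 1][j] - w2,        # continued gap in s2
                          b[i - 1][j] - w1 - w2)   # gap in s2 following a gap in s1
    return max(a[n][m], b[n][m], c[n][m])


print(affine_global_score("ACGTAC", "ACCTGGAC"))  # 12
```

The score 12 corresponds to 5 matches, 1 mismatch, and a single gap of length 2 (25 - 4 - 5 - 4), which is why the previous state must be tracked: extending that gap costs 2, while opening a second one would cost 7.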


Last step (labels on the recurrence terms):

  • a match, a gap in S2, or a gap in S1

  • new gap in S2, a continued gap in S2, or a gap in S2 following a gap in S1


Decisions in Seq. Alignment

  • Local or global alignment?

  • Which program to use

  • Type of scoring matrix

  • Value of gap penalty


Aij*10


PAM-k log-likelihood matrix

