Scoring matrices


PowerPoint Slideshow about 'Scoring Matrices' - Jimmy


Different Scoring Rules Lead to Different Alignments

  • Example (linear gap penalty): Score =

    5 x (# matches) + (-4) x (# mismatches)

    + (-7) x (total length of all gaps)

  • Example (affine gap penalty): Score =

    5 x (# matches) + (-4) x (# mismatches)

    + (-5) x (# gap openings) + (-2) x (total length of all gaps)
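
A minimal sketch of the two example rules, applied to two candidate alignments of the same pair of sequences. The function names and the toy sequences are illustrative assumptions, not part of the slides; '-' marks a gap, and no column is assumed to have a gap in both rows.

```python
def alignment_counts(a: str, b: str):
    """Count matches, mismatches, gap openings and total gap length
    for two aligned strings of equal length ('-' marks a gap)."""
    assert len(a) == len(b)
    matches = mismatches = openings = gap_len = 0
    prev_gap_a = prev_gap_b = False
    for x, y in zip(a, b):
        gap_a, gap_b = x == "-", y == "-"
        if gap_a or gap_b:
            gap_len += 1
            # A gap opens when the previous column had no gap in the same row.
            if (gap_a and not prev_gap_a) or (gap_b and not prev_gap_b):
                openings += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
        prev_gap_a, prev_gap_b = gap_a, gap_b
    return matches, mismatches, openings, gap_len


def linear_score(a: str, b: str) -> int:
    """First example rule: every gap position costs 7."""
    m, mm, _, g = alignment_counts(a, b)
    return 5 * m - 4 * mm - 7 * g


def affine_score(a: str, b: str) -> int:
    """Second example rule: 5 per gap opening plus 2 per gap position."""
    m, mm, o, g = alignment_counts(a, b)
    return 5 * m - 4 * mm - 5 * o - 2 * g


# Two alignments of ACGTTA against ACTA (hypothetical toy sequences).
one_long_gap = ("ACGTTA", "AC--TA")
two_short_gaps = ("ACGTTA", "AC-T-A")

print(linear_score(*one_long_gap), linear_score(*two_short_gaps))  # 6 6
print(affine_score(*one_long_gap), affine_score(*two_short_gaps))  # 11 6
```

Under the first rule the two alignments tie; under the second rule the alignment with a single longer gap scores higher, so the choice of rule changes which alignment is reported as best.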

Scoring Rules/Matrices

  • Why are they important?

    • The choice of a scoring rule can strongly influence the outcome of sequence analysis

  • What do they mean?

    • Scoring matrices implicitly represent a particular theory of evolution

    • Elements of the matrices specify the similarity of one residue to another

The Sij in a Scoring Matrix (as log likelihood ratio)

Likelihood Ratio for Aligning a Single Pair of Residues

  • Numerator: the probability that the two residues are aligned because of evolutionary descent

  • Denominator: the probability that they are aligned by chance

  • Pi, Pj are the frequencies (abundances) of residues i and j across all sequences
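
The slide's equation did not survive extraction. A minimal sketch of the usual log likelihood ratio, assuming the numerator is an observed pair frequency (called q_ij here, a name not taken from the slides) and the chance probability is the product Pi x Pj:

```python
import math


def log_odds(q_ij: float, p_i: float, p_j: float, base: float = 2.0) -> float:
    """Log likelihood ratio for aligning residues i and j.

    q_ij : probability that i and j are aligned by evolutionary descent
           (assumed to be an observed pair frequency)
    p_i, p_j : background frequencies of residues i and j
    """
    return math.log(q_ij / (p_i * p_j), base)


# Hypothetical numbers: a pair seen twice as often as chance predicts
# gets a positive score; a pair seen half as often gets a negative one.
print(log_odds(0.02, 0.1, 0.1))   # close to +1
print(log_odds(0.005, 0.1, 0.1))  # close to -1
```

A positive score means the pair is aligned more often than chance predicts, a negative score less often, and zero means the pairing is uninformative.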


Two classes of widely used protein scoring matrices

  • PAM = % Accepted Mutations: 1500 changes in 71 groups with > 85% similarity

  • BLOSUM = Blocks Substitution Matrix: 2000 “blocks” from 500 families


Constructing BLOSUM Matrices

Blocks Substitution Matrices

BLOSUM Matrices of Specific Similarities

  • Sequences whose similarity exceeds a threshold are clustered.

  • If the clustering threshold is 62%, the final matrix is BLOSUM62
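
A minimal sketch of threshold clustering on ungapped, equal-length segments. It assumes similarity means percent identity and that a segment joins a cluster when it reaches the threshold against any member (single linkage); the slides do not spell out either detail, and the toy segments are invented.

```python
def percent_identity(s: str, t: str) -> float:
    """Percentage of positions at which two equal-length segments agree."""
    same = sum(1 for x, y in zip(s, t) if x == y)
    return 100.0 * same / len(s)


def cluster(segments, threshold=62.0):
    """Group segments by single linkage at the given identity threshold."""
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if percent_identity(segments[i], segments[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, seg in enumerate(segments):
        groups.setdefault(find(i), []).append(seg)
    return list(groups.values())


# Hypothetical block: three closely related segments and one outlier.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHMMV", "WWWWWYYYYY"]
for group in cluster(block, threshold=62.0):
    print(group)
```

Raising the threshold splits the block into more, smaller clusters; lowering it merges more segments together.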

Constructing a BLOSUM Matrix: 1. Counting Mutations

3. Matrix of Mutation Probabilities

5. Obtaining a BLOSUM Matrix

1, 2, 3. Mutation Frequency Table

5. Obtaining the BLOSUM62 Matrix
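
The worked tables on these slides were images and did not survive. The following is a rough sketch of the counting and log-odds steps as described in the Henikoff and Henikoff paper, not a reconstruction of the slides: residue pairs are counted column by column in an ungapped block, converted to pair frequencies, and scored as 2 x log2(observed / expected), rounded. The toy block and function names are assumptions, clustering is omitted, and a block this small gives scores with no biological meaning.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(block):
    """Count unordered residue pairs in every column of an ungapped block."""
    counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            counts[tuple(sorted((x, y)))] += 1
    return counts


def log_odds_matrix(block):
    """Return rounded half-bit log-odds scores for the pairs seen in the block."""
    counts = pair_counts(block)
    total = sum(counts.values())
    # Observed frequency of each unordered pair.
    q = {pair: c / total for pair, c in counts.items()}
    # Background frequency of a residue: its identical pairs plus half
    # of every mixed pair it takes part in.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2
    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[x] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(freq / expected))
    return scores


# Hypothetical block of four aligned segments, three columns wide.
toy_block = ["AAC", "AAC", "ASC", "SSC"]
for pair, score in sorted(log_odds_matrix(toy_block).items()):
    print(pair, score)
```

Only pairs that actually occur in the block receive a score here; a real matrix is built from thousands of blocks so that every residue pair is observed.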

BLOSUM Matrices: Reference

  • S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919

  • Training Data: ~2000 conserved blocks from BLOCKS database. Ungapped, aligned protein segments. Each block represents a conserved region of a protein family

Break

  • Homework

PAM Matrices (Point Accepted Mutations)

Mutations accepted by natural selection

PAM: Phylogenetic Tree

PAM: Accepted Point Mutation

Mutability of Residue j

Total Mutation Rate

is the total mutation rate of all amino acids

Normalize Total Mutation Rate to 1%

This defines an evolutionary period: the period during which 1% of all amino acids are mutated (counting accepted mutations only)

Mutation Probability Matrix Normalized Such That the Total Mutation Rate Is 1%
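
The slide equations are missing, so the following is a sketch of the normalization as it is usually presented for Dayhoff-style matrices, under stated assumptions: A[i][j] is a symmetric table of accepted mutation counts, f[j] is the frequency of residue j, m[j] is its relative mutability, and the total mutation rate is taken to be the sum over j of f[j] x m[j] x scale. All numbers and names below are invented, and a three-letter alphabet stands in for the 20 amino acids.

```python
def pam1(A, f, m, rate=0.01):
    """Build a mutation probability matrix M, where M[i][j] is the
    probability that residue j is replaced by residue i in one period."""
    n = len(f)
    # Scale factor chosen so that the total mutation rate equals `rate`.
    scale = rate / sum(f[j] * m[j] for j in range(n))
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_total = sum(A[i][j] for i in range(n) if i != j)
        for i in range(n):
            if i == j:
                # Probability that residue j stays unchanged.
                M[i][j] = 1.0 - scale * m[j]
            else:
                # Share of j's mutations that go to residue i.
                M[i][j] = scale * m[j] * A[i][j] / column_total
    return M


# Hypothetical inputs for a three-residue alphabet.
A = [[0, 30, 10],
     [30, 0, 20],
     [10, 20, 0]]
f = [0.5, 0.3, 0.2]
m = [1.0, 1.5, 0.8]

M = pam1(A, f, m)
total_rate = sum(f[j] * (1.0 - M[j][j]) for j in range(3))
print(total_rate)  # approximately 0.01
```

Each column of M sums to 1, and the frequency-weighted probability of change across all residues comes out at 1%, which is what defines one PAM period.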

PAM1 Mutation Probability Matrix → PAM2 Mutation Probability Matrix?

  • PAM2: mutations that happen in twice the evolutionary period of PAM1

PAM Matrix: Assumptions

In Two PAM1 Periods:

  • {AR} = {AA and AR} or

    {AN and NR} or

    {AD and DR} or

    … or

    {AV and VR}

PAM-k Mutation Prob. Matrix

PAM-k Log-Likelihood Matrix
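
The formula on this slide was an image. A sketch of the conversion commonly given for Dayhoff matrices, assuming the score is 10 x log10 of the k-period mutation probability divided by the background frequency of the resulting residue, rounded to an integer. The two-residue table and frequencies below are invented; T[a][b] is the probability that a becomes b.

```python
import math


def pam_log_odds(T, f):
    """Convert a mutation probability table into rounded log-odds scores."""
    return {
        a: {b: round(10 * math.log10(T[a][b] / f[b])) for b in T[a]}
        for a in T
    }


# Hypothetical probabilities after many periods, and matching frequencies.
T = {
    "A": {"A": 0.6, "R": 0.4},
    "R": {"A": 0.2, "R": 0.8},
}
f = {"A": 1 / 3, "R": 2 / 3}

S = pam_log_odds(T, f)
print(S["A"]["A"], S["A"]["R"], S["R"]["A"], S["R"]["R"])  # 3 -2 -2 1
```

The toy inputs satisfy f[a] x T[a][b] = f[b] x T[b][a], which is why the resulting scores are symmetric.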

PAM-250

  • PAM60: 60% similarity, PAM80: 50% similarity

  • PAM120: 40% similarity

  • The PAM-250 matrix gives better-scoring alignments than lower-numbered PAM matrices for proteins of 14-27% similarity

PAM Matrices: Reference

  • M.O. Dayhoff, ed. (1978). Atlas of Protein Sequence and Structure, Suppl. 3. National Biomedical Research Foundation, 1

Choice of Scoring Matrix

Comparing Scoring Matrices

PAM

  • Based on extrapolation from a small evolutionary period

  • Tracks evolutionary origins

  • Homologous sequences during evolution

BLOSUM

  • Based on a range of evolutionary periods

  • Conserved blocks

  • Finds conserved domains

Sources of Error in PAM