- By **Jimmy**
## PowerPoint Slideshow about 'scoring matrices' - Jimmy

Presentation Transcript

### Constructing BLOSUM Matrices

### PAM Matrices (Point Accepted Mutations)

Different Scoring Rules Lead to Different Alignments

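- Example score: $\text{Score} = 5 \times (\#\,\text{matches}) + (-4) \times (\#\,\text{mismatches}) + (-7) \times (\text{total length of all gaps})$
- Example score: $\text{Score} = 5 \times (\#\,\text{matches}) + (-4) \times (\#\,\text{mismatches}) + (-5) \times (\#\,\text{gap openings}) + (-2) \times (\text{total length of all gaps})$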
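Here is a minimal Python sketch showing how the two rules can rank the same candidate alignments differently. The first rule is a linear gap penalty and the second is an affine one. Several choices here are illustrative assumptions, not from the slides:

- the function name `alignment_score`;
- the gap convention, where a run of consecutive `-` in the same row counts as one gap opening;
- the two example alignments.

```python
def alignment_score(top: str, bottom: str, match: int, mismatch: int,
                    gap_open: int, gap_per_residue: int) -> int:
    """Score a pairwise alignment given as two equal-length rows, '-' marking gaps.

    Each gap costs gap_open once plus gap_per_residue for every gapped column.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None  # which row ("top"/"bottom") had a gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            if gap_row != prev_gap_row:
                score += gap_open        # a new gap starts in this column
            score += gap_per_residue     # every gapped column adds to the total gap length
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


# Rule 1: -7 per gap position; rule 2: -5 per gap opening plus -2 per gap position
RULE_1 = dict(match=5, mismatch=-4, gap_open=0, gap_per_residue=-7)
RULE_2 = dict(match=5, mismatch=-4, gap_open=-5, gap_per_residue=-2)

# Two hypothetical alignments of the same sequences (ACCGTG vs ACT)
scattered = ("ACCGTG", "A-C-T-")   # 3 matches, three separate 1-residue gaps
contiguous = ("ACCGTG", "AC---T")  # 2 matches, 1 mismatch, one 3-residue gap

for name, (top, bottom) in [("scattered", scattered), ("contiguous", contiguous)]:
    print(name, alignment_score(top, bottom, **RULE_1), alignment_score(top, bottom, **RULE_2))
# scattered -6 -6
# contiguous -15 -5
```

Rule 1 prefers the scattered alignment (-6 vs -15). Rule 2 charges every gap opening separately, so it prefers the single contiguous gap (-5 vs -6).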

Scoring Rules/Matrices

- Why are they important?
  - The choice of a scoring rule can strongly influence the outcome of sequence analysis
- What do they mean?
  - Scoring matrices implicitly represent a particular theory of evolution
  - Elements of the matrices specify the similarity of one residue to another

The $S_{ij}$ in a Scoring Matrix (as a Log-Likelihood Ratio)

- The score of an alignment of two sequences is the log-likelihood ratio of the alignment under two models:
  - Common ancestry
  - By chance

Likelihood Ratio for Aligning a Single Pair of Residues

- Numerator: the probability that the two residues are aligned because of evolutionary descent
- Denominator: the probability that they are aligned by chance, $p_i p_j$
- $p_i$, $p_j$ are the frequencies (abundances) of residues $i$ and $j$ in all sequences
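Written out, the ratio for a single pair is the standard log-odds score below. Here $q_{ij}$ is our own label, not the slide's, for the probability that residues $i$ and $j$ are aligned because they descend from a common ancestor. $p_i$ and $p_j$ are the residue frequencies:

$$
S_{ij} = \log \frac{q_{ij}}{p_i \, p_j}
$$

A positive $S_{ij}$ means the pair occurs in related sequences more often than chance predicts. A negative score means it occurs less often.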

Likelihood Ratio of Aligning Two Sequences
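Assume aligned positions are independent. Then the likelihood ratio of a whole ungapped alignment is the product of the per-pair ratios. Its logarithm, the alignment score, is therefore the sum of the per-pair scores $S_{ij}$.

The sketch below checks this numerically on a hypothetical two-letter alphabet. The probabilities in `Q` and `P` are made up for illustration, and scores are in bits (log base 2).

```python
import math

# Hypothetical two-letter alphabet: pair probabilities under common ancestry (Q)
# and background residue frequencies (P); each row of Q sums to the matching P entry.
Q = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
P = {"A": 0.5, "B": 0.5}


def pair_score(i: str, j: str) -> float:
    """Log-odds score S_ij = log2(q_ij / (p_i * p_j)), in bits."""
    return math.log2(Q[(i, j)] / (P[i] * P[j]))


def ungapped_score(x: str, y: str) -> float:
    """Sum of per-position log-odds scores for an ungapped alignment of x and y."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(x, y))


def likelihood_ratio(x: str, y: str) -> float:
    """Product over positions of P(pair | common ancestry) / P(pair | chance)."""
    ratio = 1.0
    for a, b in zip(x, y):
        ratio *= Q[(a, b)] / (P[a] * P[b])
    return ratio


x, y = "AAB", "ABB"
print(ungapped_score(x, y), math.log2(likelihood_ratio(x, y)))  # equal up to float rounding
```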

Two classes of widely used protein scoring matrices

- PAM = Percent Accepted Mutations: 1,500 changes in 71 groups with > 85% similarity
- BLOSUM = Blocks Substitution Matrix: 2,000 “blocks” from 500 families

- PAM and BLOSUM matrices are both log-likelihood (log-odds) matrices
- More specifically:
  - With scores in half-bit units (score = 2 × log2 of the likelihood ratio, as in BLOSUM62), an alignment that scores 6 is $2^{6/2} = 8$ times as likely to arise from common ancestry as by chance.
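The score-to-odds conversion in the last bullet can be sketched as follows. The sketch assumes the matrix is scaled as `units_per_bit` × log2 of the likelihood ratio, with 2 meaning half bits. `units_per_bit` is a hypothetical parameter name, and other matrices use other scales.

```python
import math


def odds_from_score(score: float, units_per_bit: float = 2.0) -> float:
    """Invert score = units_per_bit * log2(ratio) to recover the likelihood ratio."""
    return 2 ** (score / units_per_bit)


def score_from_odds(ratio: float, units_per_bit: float = 2.0) -> float:
    """Forward direction: likelihood ratio -> scaled log-odds score."""
    return units_per_bit * math.log2(ratio)


print(odds_from_score(6))   # 8.0: eight times as likely as chance
print(odds_from_score(-2))  # 0.5: half as likely as chance
```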

Blocks Substitution Matrices

BLOSUM Matrices of Specific Similarities

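- Sequences whose percent identity is above a threshold are clustered, and each cluster is treated as a single sequence when substitutions are counted.
- If the clustering threshold is 62%, the final matrix is BLOSUM62.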
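A minimal sketch of the clustering step follows. It assumes single-linkage clustering by percent identity over the rows of one ungapped block: a row joins, and merges, every cluster containing a member at or above the threshold. The function names and the example block are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two equal-length block rows agree."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_block(rows: list[str], threshold: float) -> list[list[str]]:
    """Single-linkage clustering of block rows at a percent-identity threshold."""
    clusters: list[list[str]] = []
    for row in rows:
        # indices of all existing clusters that this row links to
        hits = [k for k, c in enumerate(clusters)
                if any(percent_identity(row, m) >= threshold for m in c)]
        merged = [row] + [m for k in hits for m in clusters[k]]
        clusters = [c for k, c in enumerate(clusters) if k not in hits] + [merged]
    return clusters


block = ["AVAAA", "AVAAA", "AVAAV", "CVACV"]  # hypothetical ungapped block
print(cluster_block(block, 62))   # BLOSUM62-style threshold
print(cluster_block(block, 100))  # only identical rows are merged
```

Raising the threshold produces more, smaller clusters. A BLOSUM matrix with a higher number is therefore built from data in which closely related sequences are merged less aggressively.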

A toy example of constructing a BLOSUM matrix from 4 training sequences

Constructing a BLOSUM matrix: 1. Counting mutations

2. Tallying mutation frequencies

3. Matrix of mutation probabilities

4. Calculate abundance of each residue (marginal probability)

5. Obtaining a BLOSUM matrix
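The five steps can be sketched in Python. This is a simplified version that rests on several assumptions:

- it uses one ungapped block, with no clustering or sequence weighting;
- each unordered pair of rows is counted once per column;
- scores are 2 × log2 of observed over expected, rounded (half-bit units, as in BLOSUM62).

The four training sequences in the example are hypothetical, not the ones on the slide.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_from_block(block: list[str], units_per_bit: float = 2.0):
    """Toy BLOSUM-style log-odds scores from one ungapped block of equal-length rows."""
    # Step 1: count residue pairs within each column (each unordered pair of rows once)
    pair_counts: Counter = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    # Steps 2-3: turn the tallies into observed pair probabilities q_ij (keys have i <= j)
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Step 4: marginal probability of each residue, p_i = q_ii + sum_{j != i} q_ij / 2
    p: Counter = Counter()
    for (a, b), prob in q.items():
        if a == b:
            p[a] += prob
        else:
            p[a] += prob / 2
            p[b] += prob / 2

    # Step 5: s_ij = units_per_bit * log2(q_ij / e_ij), with e_ii = p_i^2 and
    # e_ij = 2 p_i p_j (the factor 2 covers both orders i-j and j-i).
    # Pairs never observed (q_ij = 0) are simply left out here.
    scores = {}
    for (a, b), prob in q.items():
        expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(units_per_bit * math.log2(prob / expected))
    return scores, q, dict(p)


# Hypothetical block of 4 training sequences, 2 columns each
scores, q, p = blosum_from_block(["AB", "AB", "AB", "BB"])
print(scores)  # {('A', 'A'): 2, ('A', 'B'): -2, ('B', 'B'): 1}
```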

Constructing the real BLOSUM62 Matrix

Steps 1-3: Mutation Frequency Table

4. Calculate Amino Acid Abundance

5. Obtaining BLOSUM62 Matrix
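In the real BLOSUM62 construction, the counts for steps 1-3 come from clustered blocks. Sequences at ≥ 62% identity are grouped, pairs are counted only between different clusters, and each cluster is weighted as one sequence.

The sketch below shows that weighted count and assumes the clusters are already formed. `weighted_pair_counts` is a hypothetical name and the clusters are made up.

```python
from collections import Counter
from itertools import combinations


def weighted_pair_counts(clusters: list[list[str]]) -> Counter:
    """Column-wise residue-pair counts between clusters, each cluster weighted as one sequence.

    A pair (a from cluster X, b from cluster Y) contributes 1 / (|X| * |Y|), so each
    pair of clusters adds a total weight of 1 per column; pairs within a cluster are skipped.
    """
    counts: Counter = Counter()
    width = len(clusters[0][0])
    for col in range(width):
        for cx, cy in combinations(clusters, 2):
            w = 1.0 / (len(cx) * len(cy))
            for row_x in cx:
                for row_y in cy:
                    counts[tuple(sorted((row_x[col], row_y[col])))] += w
    return counts


# Hypothetical clusters: the two identical rows count together as one sequence
print(weighted_pair_counts([["AB", "AB"], ["BB"]]))
```

These weighted counts then replace the raw tallies in steps 1-3. Steps 4-5 proceed as in the toy example.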

BLOSUM Matrices: Reference

- S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919
- Training data: ~2,000 conserved blocks from the BLOCKS database (ungapped, aligned protein segments); each block represents a conserved region of a protein family

Break

- Homework

Mutations accepted by natural selection

Constructing a PAM Matrix: Training Data

PAM: Phylogenetic Tree

PAM: Accepted Point Mutation

Mutability of Residue j

Total Mutation Rate

The sum of the mutabilities of all residues, each weighted by its abundance, is the total mutation rate of all amino acids.

Normalize Total Mutation Rate to 1%

This defines an evolutionary period (one PAM unit): the period during which 1% of all amino acids undergo an accepted mutation.
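The sketch below walks through the PAM1 steps: mutability, total mutation rate, and normalization to 1%. It is in the spirit of Dayhoff's construction but uses assumptions of our own:

- the mutability of residue j is taken simply as its observed accepted changes divided by its occurrences;
- all counts are hypothetical;
- `M[i][j]` is the probability that residue j is replaced by residue i in one PAM1 period, so each column sums to 1.

```python
def pam1_matrix(accepted: list[list[float]], occurrences: list[float], target: float = 0.01):
    """Build a PAM1-style mutation probability matrix (simplified sketch).

    accepted[i][j]: accepted replacements between residues i and j (symmetric, zero diagonal)
    occurrences[j]: how often residue j occurs in the training alignments
    Returns (M, p) with M[i][j] = P(j -> i in one PAM1 period) and p = residue abundances.
    """
    n = len(occurrences)
    total = sum(occurrences)
    p = [c / total for c in occurrences]                                # abundance p_j
    changes = [sum(accepted[i][j] for i in range(n) if i != j) for j in range(n)]
    m = [changes[j] / occurrences[j] for j in range(n)]                 # mutability m_j
    total_rate = sum(p[j] * m[j] for j in range(n))                     # total mutation rate
    lam = target / total_rate                                           # scale so 1% of residues change
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i == j:
                M[i][j] = 1.0 - lam * m[j]                              # residue j stays unchanged
            elif changes[j] > 0:
                M[i][j] = lam * m[j] * accepted[i][j] / changes[j]      # j replaced by i
    return M, p


# Hypothetical two-residue example
M, p = pam1_matrix(accepted=[[0, 10], [10, 0]], occurrences=[600, 400])
print(M)
```

After scaling, $\sum_j p_j (1 - M_{jj}) = 0.01$. On average, one residue in a hundred has changed, which is the definition of one PAM unit.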

Mutation Probability Matrix (Transposed), Entries $M \times 10000$

- PAM1 mutation probability matrix

PAM2 Mutation Probability Matrix?

- Mutations that happen in twice the evolutionary period of PAM1

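PAM Matrix: Assumptions

In two PAM1 periods:

- {AR} = {AA and AR} or {AN and NR} or {AD and DR} or … or {AV and VR}
  - On the left, {AR} means A has become R after two periods; on the right, each pair is two successive PAM1 steps (e.g. {AN and NR}: A becomes N, then N becomes R; {AA} means A stays unchanged)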

Entries in a PAM-2 Mutation Probability Matrix
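The decomposition of {AR} on the previous slide turns into a sum over the intermediate residue $k$. Here $M_{ij}$ is our notation for the PAM1 probability that residue $j$ is replaced by $i$, the orientation of the transposed matrix:

$$
P(A \to R \text{ in two PAM1 periods}) = \sum_{k} P(A \to k)\,P(k \to R) = \sum_{k} M_{Rk}\,M_{kA} = \left(M^2\right)_{RA}
$$

So the PAM2 matrix is $M^2$. This relies on each PAM1 step depending only on the current residue and not on earlier history, a Markov assumption.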

PAM-k Mutation Probability Matrix

PAM-k log-likelihood matrix
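By the same argument, the PAM-k mutation probability matrix is the k-th power of the PAM1 matrix. The log-likelihood matrix then compares each entry with the background frequency of the residue it becomes. The sketch below assumes:

- the column convention `M[i][j]` = P(j → i);
- the odds ratio $(M^k)_{ij} / p_i$;
- a scaling of 10 × log10, in the style of Dayhoff's published matrices.

The 2×2 matrix and frequencies are hypothetical.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_k(m1, k):
    """PAM-k mutation probability matrix = (PAM1)^k, by repeated multiplication."""
    n = len(m1)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity (PAM0)
    for _ in range(k):
        result = mat_mul(result, m1)
    return result


def log_odds(mk, p, scale=10.0):
    """S_ij = round(scale * log10(Mk[i][j] / p_i)): j mutating to i vs. seeing i by chance."""
    n = len(p)
    return [[round(scale * math.log10(mk[i][j] / p[i])) for j in range(n)] for i in range(n)]


M1 = [[0.99, 0.01], [0.01, 0.99]]  # hypothetical PAM1 matrix, 1% change per period
P = [0.5, 0.5]                     # hypothetical residue abundances
for k in (1, 50, 250):
    print(k, log_odds(pam_k(M1, k), P))
# 1 [[3, -17], [-17, 3]]
# 50 [[1, -2], [-2, 1]]
# 250 [[0, 0], [0, 0]]
```

As k grows, $M^k$ approaches the background frequencies and the scores shrink toward 0. In this two-letter toy they reach 0 by PAM250; real 20-letter matrices keep a nonzero signal at that distance.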

PAM-250

- PAM60: 60%, PAM80: 50%, PAM120: 40% similarity
- The PAM-250 matrix gives better-scoring alignments than lower-numbered PAM matrices for proteins of 14-27% similarity
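The expected fraction of residues left unchanged after k PAM units is $\sum_j p_j (M^k)_{jj}$. Computed on a real PAM1 matrix, this quantity links a PAM distance to an expected percent identity. The sketch below uses a hypothetical two-letter PAM1 matrix, so it only shows the shape of the curve. With two equally common residues, identity levels off at 50% instead of the much lower chance level for 20 amino acids.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expected_identity(m1, p, k):
    """Expected fraction of unchanged residues after k PAM1 periods: sum_j p_j * (M^k)_jj."""
    n = len(p)
    mk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity (PAM0)
    for _ in range(k):
        mk = mat_mul(mk, m1)
    return sum(p[j] * mk[j][j] for j in range(n))


M1 = [[0.99, 0.01], [0.01, 0.99]]  # hypothetical PAM1 matrix, 1% change per period
P = [0.5, 0.5]                     # hypothetical residue abundances
for k in (1, 60, 120, 250):
    print(k, round(100 * expected_identity(M1, P, k), 1))
```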

PAM Matrices: Reference

- M. O. Dayhoff, ed. (1978). Atlas of Protein Sequence and Structure, Suppl. 3. National Biomedical Research Foundation, 1

Choice of Scoring Matrix

PAM
- Based on extrapolation from a short evolutionary period
- Track evolutionary origins
- Homologous sequences during evolution

BLOSUM
- Based on a range of evolutionary periods
- Conserved blocks
- Find conserved domains

Comparing Scoring Matrices

Sources of Error in PAM
