Loading in 5 sec....

Evolution and Scoring RulesPowerPoint Presentation

Evolution and Scoring Rules

- 69 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'Evolution and Scoring Rules' - chinue

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### PAM Matrices training sequences(Point Accepted Mutations)

### Global Alignment with Probability Matrix? Affine Gaps

Evolution and Scoring Rules

- Example Score =
5 x (# matches) + (-4) x (# mismatches) +

+ (-7) x (total length of all gaps)

- Example Score =
5 x (# matches) + (-4) x (# mismatches) +

+ (-5) x (# gap openings) + (-2) x (total length of all gaps)

Scoring Rules vs. Scoring Matrices

- Nucleotide vs. Amino Acid Sequence
- The choice of a scoring rule can strongly influence the outcome of sequence analysis
- Scoring matrices implicitly represent a particular theory of evolution
- Elements of the matrices specify the similarity of one residue to another

Translation - Protein Synthesis:Every 3 nucleotides (codon) are translated into one amino acid

DNA: A T G C

1:1

RNA: A U G C

3:1

Protein: 20 amino acids

Replication

Transcription

Translation

Log Likelihoods used as Scoring Matrices:PAM - % Accepted Mutations:1500 changes in 71 groups w/ > 85% similarityBLOSUM – Blocks Substitution Matrix:2000 “blocks” from 500 families

Likelihood Ratio for Aligning a Single Pair of Residues

- Above: the probability that two residues are aligned by evolutionary descent
- Below: the probability that they are aligned by chance
- Pi, Pj are frequencies of residue i and j in all protein sequences (abundance)

Likelihood Ratio of Aligning Two Sequences

- The alignment score of aligning two sequences is the log likelihood ratio of the alignment under two models
- Common ancestry
- By chance

- PAM and BLOSUM matrices are all log likelihood matrices
- More specificly:
- An alignment that scores 6 means that the alignment by common ancestry is 2^(6/2)=8 times as likely as expected by chance.

BLOSUM matrices for Protein

- S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919
- Training Data: ~2000 conserved blocks from BLOCKS database. Ungapped, aligned protein segments. Each block represents a conserved region of a protein family

Constructing BLOSUM Matrices of Specific Similarities

- Sets of sequences have widely varying similarity. Sequences with above a threshold similarity are clustered.
- If clustering threshold is 62%, final matrix is BLOSUM62

A toy example of constructing a BLOSUM matrix from 4 training sequences

Constructing a BLOSUM matr. training sequences1. Counting mutations

Constructing a BLOSUM matr. training sequences2. Tallying mutation frequencies

Constructing a BLOSUM matr. training sequences3. Matrix of mutation probs.

4. Calculate abundance of each residue (Marginal prob) training sequences

5. Obtaining a BLOSUM matrix training sequences

Constructing the real BLOSUM62 Matrix training sequences

1.2.3.Mutation Frequency Table training sequences

4. Calculate Amino Acid Abundance training sequences

5. Obtaining BLOSUM62 Matrix training sequences

Mutations accepted by natural selection

PAM Matrices training sequences

- Accepted Point Mutation
- Atlas of Protein Sequence and Structure,
Suppl 3, 1978, M.O. Dayhoff.

ed. National Biomedical Research Foundation, 1

- Based on evolutionary principles

Constructing PAM Matrix: Training Data training sequences

PAM: Phylogenetic Tree training sequences

PAM: Accepted Point Mutation training sequences

Mutability training sequences

Total Mutation Rate training sequences

is the total mutation rate of all amino acids

Normalize Total Mutation Rate training sequences

Mutation Probability Matrix (transposed) M*10000 training sequences

-- PAM1 mutation prob. matr. --PAM2 Mutation Probability Matrix?

-- Mutations that happen in twice the evolution period of that for a PAM1

PAM Matrix: Assumptions Probability Matrix?

In two PAM1 periods: Probability Matrix?

- {AR} = {AA and AR} or
{AN and NR} or

{AD and DR} or

… or

{AV and VR}

Entries in a PAM-2 Mut. Prob. Matr. Probability Matrix?

PAM-k Mutation Prob. Matrix Probability Matrix?

PAM-1 log likelihood matrix Probability Matrix?

PAM-k log likelihood matrix Probability Matrix?

PAM-250 Probability Matrix?

- PAM60—60%, PAM80—50%, Probability Matrix?
- PAM120—40%
- PAM-250 matrix provides a better scoring alignment than lower-numbered PAM matrices for proteins of 14-27% similarity

Sources of Error in PAM Probability Matrix?

PAM Probability Matrix?

Based on extrapolation of a small evol. Period

Track evolutionary origins

Homologous seq.s during evolution

BLOSUM

Based on a range of evol. Periods

Conserved blocks

Find conserved domains

Comparing Scoring MatrixChoice of Scoring Matrix Probability Matrix?

Complex Dynamic Programming

Problem w/ Independent Gap Penalties Probability Matrix?

- The occurrence of x consecutive deletions/insertions is more likely than the occurrence of x isolated mutations
- We should penalize x long gap less than x
times of the penalty for one gap

Affine Gap Penalty Probability Matrix?

- w2 is the penalty for each gap
- w1 is the _extra_ penalty for the 1st gap

Scoring Rule not Additive! Probability Matrix?

- We need to know if the current gap is a new gap or the continuation of an existing gap
- Use three Dynamic Programming matrices to keep track of the previous step

S1 is the vertical sequence Probability Matrix?

S2 is the horizontal sequence

- (From Diagonal) a(i,j): current position is a match
- (From Left) b(i,j): current position is a gap in S1
- (From Above) c(i,j): current position is a gap in S2
Filling the next element in each matrix depends on the previous step, which is stored in the three matrices.

Last step a match Probability Matrix?

a gap in S2

a gap in S1

new gap in S2

a continued gap in S2

a gap in S2 following

a gap in S1

Decisions in Seq. Alignment Probability Matrix?

- Local or global alignment?
- Which program to use
- Type of scoring matrix
- Value of gap penalty

A Probability Matrix?ij*10

PAM-k log-likelihood matrix Probability Matrix?

Download Presentation

Connecting to Server..