Evolution and Scoring Rules

Evolution and Scoring Rules • Example Score = 5 x (# matches) + (-4) x (# mismatches) + + (-7) x (total length of all gaps) • Example Score = 5 x (# matches) + (-4) x (# mismatches) + + (-5) x (# gap openings) + (-2) x (total length of all gaps)

Scoring Matrices

Scoring Rules vs. Scoring Matrices • Nucleotide vs. Amino Acid Sequence • The choice of a scoring rule can strongly influence the outcome of sequence analysis • Scoring matrices implicitly represent a particular theory of evolution • Elements of the matrices specify the similarity of one residue to another

Translation - Protein Synthesis:Every 3 nucleotides (codon) are translated into one amino acid DNA: A T G C 1:1 RNA: A U G C 3:1 Protein: 20 amino acids Replication Transcription Translation

Nucleotide sequence determines the amino acid sequence

Translation - Protein Synthesis RNA Protein 5’ -> 3’ : N-term -> C-term

Log Likelihoods used as Scoring Matrices:PAM - % Accepted Mutations:1500 changes in 71 groups w/ > 85% similarityBLOSUM – Blocks Substitution Matrix:2000 “blocks” from 500 families

Log Likelihoods used as Scoring Matrices:BLOSUM

Likelihood Ratio for Aligning a Single Pair of Residues • Above: the probability that two residues are aligned by evolutionary descent • Below: the probability that they are aligned by chance • Pi, Pj are frequencies of residue i and j in all protein sequences (abundance)

Likelihood Ratio of Aligning Two Sequences

The alignment score of aligning two sequences is the log likelihood ratio of the alignment under two models • Common ancestry • By chance

PAM and BLOSUM matrices are all log likelihood matrices • More specificly: • An alignment that scores 6 means that the alignment by common ancestry is 2^(6/2)=8 times as likely as expected by chance.

BLOSUM matrices for Protein • S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919 • Training Data: ~2000 conserved blocks from BLOCKS database. Ungapped, aligned protein segments. Each block represents a conserved region of a protein family

Constructing BLOSUM Matrices of Specific Similarities • Sets of sequences have widely varying similarity. Sequences with above a threshold similarity are clustered. • If clustering threshold is 62%, final matrix is BLOSUM62

A toy example of constructing a BLOSUM matrix from 4 training sequences

Constructing a BLOSUM matr.1. Counting mutations

Constructing a BLOSUM matr.2. Tallying mutation frequencies

Constructing a BLOSUM matr.3. Matrix of mutation probs.

4. Calculate abundance of each residue (Marginal prob)

5. Obtaining a BLOSUM matrix

Constructing the real BLOSUM62 Matrix

1.2.3.Mutation Frequency Table

4. Calculate Amino Acid Abundance

5. Obtaining BLOSUM62 Matrix

PAM Matrices (Point Accepted Mutations) Mutations accepted by natural selection

PAM Matrices • Accepted Point Mutation • Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff. ed. National Biomedical Research Foundation, 1 • Based on evolutionary principles

Constructing PAM Matrix: Training Data

PAM: Phylogenetic Tree

PAM: Accepted Point Mutation

Mutability

Total Mutation Rate is the total mutation rate of all amino acids

Normalize Total Mutation Rate

Mutation Probability Matrix Normalized Such that the Total Mutation Rate is 1%

Mutation Probability Matrix (transposed) M*10000

-- PAM1 mutation prob. matr. --PAM2 Mutation Probability Matrix? -- Mutations that happen in twice the evolution period of that for a PAM1

PAM Matrix: Assumptions

In two PAM1 periods: • {AR} = {AA and AR} or {AN and NR} or {AD and DR} or … or {AV and VR}

Entries in a PAM-2 Mut. Prob. Matr.

PAM-k Mutation Prob. Matrix

PAM-1 log likelihood matrix

PAM-k log likelihood matrix

PAM-250

PAM60—60%, PAM80—50%, • PAM120—40% • PAM-250 matrix provides a better scoring alignment than lower-numbered PAM matrices for proteins of 14-27% similarity

Sources of Error in PAM

PAM Based on extrapolation of a small evol. Period Track evolutionary origins Homologous seq.s during evolution BLOSUM Based on a range of evol. Periods Conserved blocks Find conserved domains Comparing Scoring Matrix

Evolution and Scoring Rules

Evolution and Scoring Rules

Presentation Transcript

Market Scoring Rules

CPS 196.2 Proper scoring rules

Scoring:

Scoring

Generalized Scoring Rules

Overview and Scoring

Scoring and Reporting

SCORING

Hanson’s Market Scoring Rules

Adult Sleep Stage Scoring Rules

Scoring

Scoring and Ranking

Class 3: Estimating Scoring Rules for Sequence Alignment

FOOTBALL RULES SUMMARY RULE 8 SCORING PLAYS AND TOUCHBACK

Market Scoring Rules As Combinatorial Information Market Makers

Alignment scoring matrices and evolution

Scoring Rules for Competitive Tendering

A dapting Proper Scoring Rules for Measuring Subjective Beliefs

Ethical Rules, Games, and Evolution

Scoring Rules, Generalized Entropy, and Utility Maximization

Scoring Rules, Generalized Entropy, and Utility Maximization

Proper Scoring Rules and Prospect Theory