Quick Lesson on dN/dS

Quick Lesson on dN/dS • Neutral Selection • Codon Degeneracy • Synonymous vs. Non-synonymous • dN/dS ratios • Why Selection? • The Problem

What does selection “look” like? When moving into new dim-light environments, vertebrate ancestors adjusted their dim-light vision by modifying their rhodopsins • Functional changes have occurred • Biologically significant shifts have occurred multiple times • How do we know whether these shifts are adaptive or random? Yokoyama S et al. PNAS 2008;105:13480-13485

Neutral Selection • Under neutrality, mutations would accumulate evenly throughout the genome • Does that hold for pseudogenes? Introns? Promoters? Coding regions?

Codon Degeneracy

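Codon Degeneracy • 1st position = conserved (almost every change alters the AA) • 2nd position = strongly conserved (every change alters the AA or introduces a stop) • 3rd position = “wobbly” • Wobble effect: an AA can be coded for by more than one codon, so many 3rd-position changes leave the AA unchanged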
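To make the position effect concrete, here is a minimal sketch that tallies, for each codon position, the share of single-nucleotide changes that leave the amino acid unchanged. It assumes the standard genetic code and ignores changes to or from stop codons; the names `CODON_TABLE` and `synonymous_fraction_by_position` are illustrative, not from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction_by_position():
    """Fraction of single-base changes at each codon position that keep the AA."""
    syn = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only start from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == "*":
                    continue  # assumption: skip changes that create a stop
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    syn[pos] += 1
    return [s / t for s, t in zip(syn, total)]


fractions = synonymous_fraction_by_position()
for position, fraction in enumerate(fractions, start=1):
    print(f"position {position}: {fraction:.1%} of changes are synonymous")
```

The third position carries most of the synonymous changes, which is why it is described as "wobbly".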

Synonymous vs Non-synonymous • Synonymous: the nucleotide change causes no AA change • Non-synonymous: the nucleotide change causes an AA change

dN/dS ratios • N = Non-synonymous change • S = Synonymous change • dN = rate of Non-synonymous changes (per non-synonymous site) • dS = rate of Synonymous changes (per synonymous site) • dN/dS = the rate of Non-synonymous changes over the rate of Synonymous changes
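As a rough illustration of the N and S counts, the sketch below compares two aligned coding sequences codon by codon and classifies each differing codon as synonymous or non-synonymous. It assumes the standard genetic code, in-frame sequences without gaps or stops, and it returns raw counts only: turning them into the rates dN and dS still requires dividing by the numbers of non-synonymous and synonymous sites, which is omitted here. The function name `count_changes` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_changes(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    syn = nonsyn = 0
    usable = min(len(seq_a), len(seq_b))
    for i in range(0, usable - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous
        else:
            nonsyn += 1   # different amino acid: non-synonymous
    return syn, nonsyn


# Hypothetical example: CTT -> CTC keeps Leu, GAA -> AAA turns Glu into Lys.
print(count_changes("ATGCTTGAA", "ATGCTCAAA"))
```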

Selection and dN/dS • dN/dS = 1 => neutral evolution: no selective pressure • dN/dS < 1 => negative (purifying) selection: selective pressure to stay the same • dN/dS > 1 => positive selection: selective pressure to change
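The three regimes can be written as a small helper. This is only a sketch: the tolerance `tol` around 1 is an arbitrary assumption made for illustration, and in practice the call is made with a statistical test rather than a fixed cutoff.

```python
def classify_omega(dn, ds, tol=0.05):
    """Label a dN/dS ratio as 'negative', 'neutral' or 'positive'.

    tol is a hypothetical tolerance around 1, not a value from the slides.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive" if omega > 1.0 else "negative"


for dn, ds in [(0.1, 0.5), (0.3, 0.3), (0.6, 0.2)]:
    print(dn, ds, classify_omega(dn, ds))
```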

Why Selection? • Identify important gene regions • Find drug resistance • Locate thrifty genes or mutations

dN/dS Problem • Analyzes a whole gene or large segments • But selection acts at the level of individual amino acids • Averaged over many sites, the method lacks statistical power • Hence the purpose of this paper: detecting selection at individual sites

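SLAC: single likelihood ancestor counting • The basic idea: count the number of synonymous and nonsynonymous changes at each codon over the evolutionary history of the sample • Counts at site s, given the tree T and the reconstructed ancestral states A: NN[D_s | T, A] (nonsynonymous) and NS[D_s | T, A] (synonymous)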
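The counting step can be sketched for a single codon site as follows. This is a simplified illustration, not the published procedure: it assumes the ancestral codons have already been reconstructed, it treats every branch as at most one change classified by its endpoints, and it skips the normalization by expected numbers of sites. The function name `count_site_changes` and the branch data are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_site_changes(branches):
    """Count changes at one codon site over all branches of a tree.

    branches: list of (ancestor_codon, descendant_codon) pairs, one per branch,
    taken from an ancestral reconstruction that is assumed to be correct.
    Returns (synonymous_count, nonsynonymous_count).
    """
    n_syn = n_nonsyn = 0
    for ancestor, descendant in branches:
        if ancestor == descendant:
            continue  # no change along this branch
        if CODON_TABLE[ancestor] == CODON_TABLE[descendant]:
            n_syn += 1
        else:
            n_nonsyn += 1
    return n_syn, n_nonsyn


# Hypothetical site: one Leu -> Ile change (CTC -> ATC) and one silent change.
site_branches = [("CTC", "CTC"), ("CTC", "ATC"), ("CTC", "CTT"), ("ATC", "ATC")]
print(count_site_changes(site_branches))
```

Summing over branches in this way is also what allows a burst of change on one branch to be diluted by many unchanged branches.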

SLAC (figure: example substitutions L10I and E40K)

SLAC Strengths: • Computationally inexpensive • More powerful than other counting methods in simulation studies Weaknesses: • Assumes that the reconstructed ancestral states are correct • Summing the substitutions over all the branches may hide significant events on individual branches • Simulation studies show that SLAC underestimates the substitution rate Runtime estimates: • Less than a minute for datasets of 200-300 sequences

FEL: fixed effects likelihood • The basic idea: use the principles of maximum likelihood to estimate the ratio of nonsynonymous to synonymous rates at each site

FEL Likelihood Ratio Test • H0: α = β (the synonymous rate α equals the nonsynonymous rate β at the site) • Ha: α ≠ β
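A likelihood ratio test of this kind can be sketched as below, assuming the two maximized log-likelihoods for a site are already available and that the test statistic is compared against a chi-square distribution with one degree of freedom (the alternative has one extra free parameter). The function name `lrt_p_value` and the log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(loglik_null, loglik_alt):
    """Likelihood ratio test of H0 (alpha == beta) against Ha (alpha != beta).

    Returns (statistic, p_value) using the chi-square distribution with 1 df,
    whose survival function is erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_alt - loglik_null)
    statistic = max(statistic, 0.0)  # guard against tiny negative round-off
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one site.
stat, p = lrt_p_value(loglik_null=-105.0, loglik_alt=-103.0)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

A small p-value only says that α and β differ; the sign of β − α then indicates whether the site looks positively or negatively selected.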

FEL Strengths: • In simulation studies, substitution rates estimated by FEL closely approximate the actual values • Models variation in both the synonymous and nonsynonymous substitution rates • Easily parallelized, computational cost grows linearly Weaknesses: • To avoid estimating too many parameters, we fix the tree topology, branch lengths and rate parameters Runtime Estimates: • A few hours on a small cluster for several hundred sequences

REL: random effects likelihood • The basic idea: fit the full-likelihood nucleotide substitution model and the synonymous and nonsynonymous rates simultaneously • Compromise: use discrete categories for the rate distributions

REL Posterior Probability • Bayes factor: the ratio of the posterior odds to the prior odds of a site having ω > 1
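The odds ratio can be sketched for one site as follows. As a simplifying assumption, the discrete rate categories are collapsed into a single list of ω values with prior weights, and the likelihood of the site under each category is taken as given; the function name `bayes_factor_positive` and all numbers are hypothetical.

```python
def bayes_factor_positive(priors, omegas, site_likelihoods):
    """Posterior probability and Bayes factor for omega > 1 at one site.

    priors:           prior weight of each discrete rate category (sums to 1)
    omegas:           omega value attached to each category
    site_likelihoods: likelihood of the site's data under each category
    """
    joint = [p * l for p, l in zip(priors, site_likelihoods)]
    total = sum(joint)
    posterior = [j / total for j in joint]  # Bayes' rule over categories

    prior_pos = sum(p for p, w in zip(priors, omegas) if w > 1)
    post_pos = sum(q for q, w in zip(posterior, omegas) if w > 1)

    prior_odds = prior_pos / (1.0 - prior_pos)
    post_odds = post_pos / (1.0 - post_pos)
    return post_pos, post_odds / prior_odds


# Hypothetical three-category model: purifying, neutral, positive.
post, bf = bayes_factor_positive(
    priors=[0.6, 0.3, 0.1],
    omegas=[0.1, 1.0, 3.0],
    site_likelihoods=[0.001, 0.002, 0.01],
)
print(f"P(omega > 1 | data) = {post:.3f}, Bayes factor = {bf:.2f}")
```

Dividing by the prior odds keeps a site from looking selected merely because the fitted rate distribution puts a lot of weight on ω > 1.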

REL Strengths: • Estimates synonymous, nonsynonymous and nucleotide rates simultaneously • Most powerful of the three methods for large numbers of sequences Weaknesses: • Performs poorly with small numbers of sequences • Computationally demanding Runtime Estimates: • Not mentioned

Simulation Performance (figure panels: 8 sequences and 64 sequences)
