Estimating Evolutionary Changes in Protein Coding Sequences

Learn about Nei & Gojobori's unweighted method for estimating evolutionary changes in protein coding sequences, including counting synonymous and nonsynonymous sites and nucleotide differences. Discover Jukes and Cantor's model for multiple nucleotide substitution correction and how to estimate divergence time.

josephsims

Presentation Transcript


  1. ICB 2007, Hong Kong • Supplementary formulas • Rates of Evolutionary Changes

  2. Outline • Nei & Gojobori’s unweighted method • Synonymous / Nonsynonymous sites • Synonymous / Nonsynonymous nucleotide differences • Jukes and Cantor’s model for the multiple nucleotide substitution correction • Estimation of the divergence time

  3. Nei & Gojobori’s unweighted method • Nei & Gojobori’s unweighted method is a simple way of estimating the number of nucleotide substitutions between two protein-coding sequences. • We’ll show this method in two steps. • Step 1: Count the synonymous / nonsynonymous sites in each sequence. • Step 2: Count the synonymous / nonsynonymous nucleotide differences between the two sequences.

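  4. Synonymous / Nonsynonymous sites • For example, the codons UUA, UUG, CUU, CUC, CUA and CUG all encode the same amino acid (leucine). • We first calculate the synonymous fraction of UUA, i.e. the fraction of the three possible substitutions at each position that leave the amino acid unchanged. • Position 1: 1/3 (only U → C, giving CUA); position 2: 0; position 3: 1/3 (only A → G, giving UUG). Hence UUA contributes s = 2/3 synonymous sites and n = 3 - s = 7/3 nonsynonymous sites. • S and N for a sequence are the sums of s and n over all its codons. • When comparing two sequences, calculate S and N for each sequence over the length of the shorter sequence and average the values from the two sequences.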
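The site count for a single codon can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the names `synonymous_sites`, `site_counts` and `average_site_counts` are hypothetical, the standard genetic code is assumed, and substitutions that create a stop codon are assumed to count as nonsynonymous.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code (assumed), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(codon):
    """Number of synonymous sites s of one codon (n is 3 - s)."""
    codon = codon.upper().replace("U", "T")
    s = Fraction(0)
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as nonsynonymous.
            if CODE[mutant] == CODE[codon]:
                same += 1
        s += Fraction(same, 3)  # fraction of the 3 changes at this position
    return s

def site_counts(seq):
    """Sum s and n over all complete codons of a sequence."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    S = sum((synonymous_sites(c) for c in codons), Fraction(0))
    N = 3 * len(codons) - S
    return S, N

def average_site_counts(seq1, seq2):
    """Average S and N of two sequences over the shorter length."""
    length = min(len(seq1), len(seq2))
    S1, N1 = site_counts(seq1[:length])
    S2, N2 = site_counts(seq2[:length])
    return (S1 + S2) / 2, (N1 + N2) / 2

print(synonymous_sites("UUA"))
```

Exact fractions are used so that the per-codon values can be compared directly with a hand calculation.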

  5. Synonymous / Nonsynonymous nucleotide differences • The differences between two sequences are counted codon by codon, with Sd and Nd as running totals of synonymous and nonsynonymous differences. When comparing two codons there are three possibilities: one, two or three nucleotide pairs are mismatched. Each case is handled separately. • Case 1: 1 mismatched nucleotide pair • If the two codons encode the same amino acid, Sd = Sd + 1. • If the two codons encode different amino acids, Nd = Nd + 1.

  6. Synonymous / Nonsynonymous nucleotide differences • Case 2: 2 mismatched nucleotide pairs • Assuming only one nucleotide substitution occurs at a time, there are two equally probable paths between the codons. For example, CCC → CAA: • Path 1: CCC → CCA → CAA • Path 2: CCC → CAC → CAA • The two paths contain 4 substitutions in total: 1 is synonymous (CCC → CCA) and 3 are nonsynonymous. Averaging over the 2 paths gives Sd = Sd + 1/2 and Nd = Nd + 3/2, so the increments sum to the 2 observed differences. • There are always 2 paths for 2 mismatched pairs, but any path that passes through a stop codon is ignored.

  7. Synonymous / Nonsynonymous nucleotide differences • Case 2: 2 mismatched nucleotide pairs – special case (stop codon in path) • One example of this special case is AAA → TAT: • Path 1: AAA → TAA → TAT • Path 2: AAA → AAT → TAT • In path 1, TAA is a stop codon, so path 1 is ignored and only the two substitutions in path 2 are counted. Both are nonsynonymous (Lys → Asn → Tyr), and there is a single valid path. Hence Sd = Sd + 0 and Nd = Nd + 2.

  8. Synonymous / Nonsynonymous nucleotide differences • Case 3: 3 mismatched nucleotide pairs • There are six equally probable paths for such a substitution, for the same reason as in the previous case: each path now contains three substitutions, which can occur in 3! = 6 orders. For example, CTT → AGG: • CTT → ATT → AGT → AGG • CTT → ATT → ATG → AGG • CTT → CGT → AGT → AGG • CTT → CTG → ATG → AGG • CTT → CGT → CGG → AGG • CTT → CTG → CGG → AGG

  9. Synonymous / Nonsynonymous nucleotide differences • The six paths contain 18 substitutions in total: 5 are synonymous (CTT → CTG twice, CGT → CGG once, CGG → AGG twice) and 13 are nonsynonymous. Averaging over the 6 paths gives Sd = Sd + 5/6 and Nd = Nd + 13/6, so the increments sum to the 3 observed differences. • Similarly, if a path passes through a stop codon, that path is ignored.
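The three cases can be handled by one path-enumeration routine, sketched below in Python. The function name `codon_differences` is hypothetical, the standard genetic code is assumed, both input codons are assumed to be sense codons, and the counts are averaged over the stop-free paths so that Sd + Nd equals the number of mismatched positions.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard genetic code (assumed), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_differences(codon1, codon2):
    """Return (sd, nd) for one codon pair, averaged over stop-free paths."""
    c1 = codon1.upper().replace("U", "T")
    c2 = codon2.upper().replace("U", "T")
    mismatched = [i for i in range(3) if c1[i] != c2[i]]
    if not mismatched:
        return Fraction(0), Fraction(0)
    syn_total = non_total = valid_paths = 0
    # Each ordering of the mismatched positions is one substitution path.
    for order in permutations(mismatched):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # path passes through a stop codon: ignore it
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            valid_paths += 1
            syn_total += syn
            non_total += non
    if valid_paths == 0:
        raise ValueError("every path passes through a stop codon")
    return Fraction(syn_total, valid_paths), Fraction(non_total, valid_paths)

for pair in [("CCC", "CAA"), ("AAA", "TAT"), ("CTT", "AGG")]:
    print(pair, codon_differences(*pair))
```

Summing the returned pairs over all aligned codons of two sequences gives the totals Sd and Nd.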

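  10. Jukes and Cantor’s model for the multiple nucleotide substitution correction • Let ps = Sd / S and pn = Nd / N be the proportions of synonymous and nonsynonymous differences. • For synonymous substitution: ds = -(3/4) ln(1 - (4/3) ps) • For nonsynonymous substitution: dn = -(3/4) ln(1 - (4/3) pn) • The dn/ds ratio has proved useful in assessing the protein-coding parts of genomes. In protein-coding regions generally dn < ds, which indicates strong purifying selection against amino acid changes. Conversely, a high dn/ds ratio (close to 1) indicates weak selective constraint.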
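As a small numerical illustration, the Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p) can be applied to the proportions ps = Sd / S and pn = Nd / N. The function names `jukes_cantor` and `dn_ds` below are hypothetical, and the input values in the usage line are made up purely for demonstration.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(Sd, Nd, S, N):
    """Return (dn, ds) from difference counts Sd, Nd and site counts S, N."""
    ds = jukes_cantor(Sd / S)
    dn = jukes_cantor(Nd / N)
    return dn, ds

# Illustrative numbers only, not taken from any real alignment.
dn, ds = dn_ds(Sd=12.0, Nd=9.0, S=60.0, N=240.0)
print(dn, ds, dn / ds)
```

The correction is always at least as large as the raw proportion, because it accounts for sites that changed more than once.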

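  11. Estimation of the divergence time • We can estimate the divergence time of the two species using the previous result. The formula is: T = d / (2r), where d is the corrected number of substitutions per site between the two sequences (for example ds) and r is the substitution rate per site per year. The factor 2 reflects that both lineages have accumulated substitutions since they split.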
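A minimal sketch of the usual molecular-clock estimate, T = d / (2r), is given below. It assumes a constant substitution rate r per site per year in both lineages. The function name `divergence_time` and the example rate are hypothetical and serve only as illustration.

```python
def divergence_time(d, rate):
    """Estimate divergence time in years from distance d and rate per site per year."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Both lineages accumulate substitutions, hence the factor 2.
    return d / (2 * rate)

# Illustrative values: d = 0.1 substitutions per site, r = 1e-9 per site per year.
print(divergence_time(0.1, 1e-9))
```

In practice r has to be calibrated from an independent source, such as a dated fossil split.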
