number of substitutions between two protein coding genes n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Number of substitutions between two protein-coding genes PowerPoint Presentation
Download Presentation
Number of substitutions between two protein-coding genes

Loading in 2 Seconds...

play fullscreen
1 / 32

Number of substitutions between two protein-coding genes - PowerPoint PPT Presentation


  • 70 Views
  • Uploaded on

Number of substitutions between two protein-coding genes. Dan Graur. Handout. Computing the number of substitutions between two protein-coding sequences is more complicated, because a distinction should be made between synonymous and nonsynonymous substitutions.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Number of substitutions between two protein-coding genes' - trilby


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide3

Computing the number of substitutions between two protein-coding sequences is more complicated, because a distinction should be made between synonymous and nonsynonymous substitutions.

slide6

Aims:1. Compute two numerators: The numbers of synonymous and nonsynonymous substitutions.2. Computetwo denominators: The numbers of synonymous and nonsynonymous sites.

slide7

T

Trp

Nonsynonymous

Difficulties with denominator:

1. The classification of a site changes with time: For example, the third position of CGG (Arg) is synonymous. However, if the first position changes to T, then the third position of the resulting codon, TGG (Trp), becomes nonsynonymous.

slide8

Difficulties with denominator:

2. Many sites are neither completely synonymous nor completely nonsynonymous. For example, a transition in the third position of GAT (Asp) will be synonymous, while a transversion to GAG or GAA will alter the amino acid.

slide9
Difficulties with numerator:1. The classification of the change depends on the order in which the substitutions had occurred.
slide10

Difficulties with numerator:1.When two homologous codons differ from each other by two substitutions or more the order of the substitutions must be known in order to classify substitutions into synonymous and nonsynonymous.

Example: CCC in sequence 1 and CAA in sequence 2.

Pathway I:

CCC (Pro)  CCA (Pro)  CAA (Gln)

1 synonymous and 1 nonsynonymous

Pathway II:

CCC (Pro)  CAC (His)  CAA (Gln)

2 nonsynonymous

slide11

Difficulties with numerator:2. Transitions occur with different frequencies than transversions. 3. The type of substitution depends on the mutation. Transitions result more frequently in synonymous substitutions than transversions.

slide12

Methods: Miyata & Yasunaga (1980) and Nei & Gojobori(1986)

1. Classification of sites. Consider a particular position in a codon. Let i be the number of possible synonymous changes at this site. Then this site is counted as i/3 synonymous and (3 – i)/3 nonsynonymous.

slide13

In TTT (Phe), the first two positions are nonsynonymous, because no synonymous change can occur in them, and the third position is 1/3 synonymous and 2/3 nonsynonymous because one of the three possible changes is synonymous.

slide14

2. Count the number of synonymous and nonsynonymous sites in each sequence and compute the averages between the two sequences. The average number of synonymous sites is NS and that of nonsynonymous sites is NA.

slide16
For two codons that differ by only one nucleotide, the difference is easily inferred. For example, the difference between the two codons GTC (Val) and GTT (Val) is synonymous, while the difference between the two codons GTC (Val) and GCC (Ala) is nonsynonymous.
slide18
For two codons that differ by two or more nucleotides, the estimation problem is more complicated, because we need to determine the order in which the substitutions occurred.
slide19

Pathway (1) requires one synonymous and one nonsynonymous change, whereas pathway (2) requires two nonsynonymous changes.

slide21

The unweighted method:Average the numbers of the different types of substitutions for all the possible scenarios. For example, if we assume that the two pathways are equally likely, then the number of nonsynonymous differences is (1 + 2)/2 = 1.5, and the number of synonymous differences is (1 + 0)/2 = 0.5.

slide22

The weighted method. Employ a priori criteria to assign the probability of each pathway. For instance, if the weight of pathway 1 is 0.9, and the weight for pathway 2 is 0.1, then the number of nonsynonymous differences between the two codons is (0.9  1) + (0.1  2) = 1.1, and the number of synonymous differences is 0.9.

slide24
4. The numbers of synonymous and nonsynonymous differences between the two protein-coding sequences are MS and MA, respectively.
slide25

The number of synonymous differences per synonymous site is pS = MS/NS The number of nonsynonymous differences per nonsynonymous site is pA = MA/NA

slide26
If we take into account the effect of multiple hits at the same site, we can make corrections by using Jukes and Cantor's formula:
number of amino acid replacements between two proteins
Number of Amino-Acid Replacements between Two Proteins
  • The observed proportion of different amino acids between the two sequences (p) is

p = n /L

  • n = number of amino acid differences between the two sequences
  • L = length of the aligned sequences.
number of amino acid replacements between two proteins1
Number of Amino-Acid Replacements between Two Proteins

The Poisson model is used to convert p into the number of amino replacements between two sequences (d ):

d = – ln(1 – p)

The variance of d is estimated as

V(d) = p/L (1 – p)