A mathematical model of the genetic code: structure and applications

Antonino Sciarrino

Università di Napoli “Federico II” INFN, Sezione di Napoli

TAG 2006 Annecy-leVieux, 9 November 2006


Mathematical Model of the Genetic Code

Work in collaboration with

Luc FRAPPAT

Paul SORBA

Diego COCURULLO


Summary

  • Introduction

  • Description of the model

  • Applications: codon usage frequencies

    DNA dimer free energy

  • Work in progress


It is amazing that the complex biochemical relations between DNA and proteins were very quickly reduced to a mathematical model. Just a few months after the WATSON-CRICK discovery, G. GAMOW proposed the “diamond code”.


Gamow “diamond code”

Gamow, Nature (1954)

Nucleotides are denoted by the numbers 1, 2, 3, 4.

Amino acids FIT the rhomb-shaped “holes” formed by the 4 nucleotides

⇒ 20 a.a.!


Since 1954 many mathematical models of the genetic code have been proposed (based on information-theoretic, thermodynamic, symmetry, topological… arguments). Weak point of these models: often poor explanatory and/or predictive power.


The genetic code


Crystal basis model of the genetic code

L. Frappat, A. Sciarrino, P. Sorba: Phys. Lett. A (1998)

The 4 bases C, U/T (pyrimidines) and G, A (purines)

are identified by a couple of “spin” labels

(+ ≡ 1/2, − ≡ −1/2)

Mathematically, C, U/T, G, A transform as the 4 basis vectors

of the irrep. (1/2, 1/2) of $U_{q\to 0}(sl(2)_H \oplus sl(2)_V)$


Crystal basis model of the genetic code

  • Dinucleotides are composite states

    (16 basis vectors of $(1/2, 1/2)^{\otimes 2}$)

    belonging to “sets” identified by two integer numbers $J_H$, $J_V$. In each “set” the dinucleotide is

    identified by two labels

    $-J_H \le J_{H,3} \le J_H$, $-J_V \le J_{V,3} \le J_V$

    Ex.

    CU = (+,+) ⊗ (−,+)

    ($J_H = 0$, $J_{H,3} = 0$; $J_V = 1$, $J_{V,3} = 1$)

    This follows from the tensor-product properties of $U_{q\to 0}(sl(2))$.
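A minimal Python sketch of how the 16 dinucleotides fall into the four “sets” $(J_H, J_V)$ = (1,1), (1,0), (0,1), (0,0). It assumes the single-nucleotide labels C = (+,+), U = (−,+), G = (+,−), A = (−,−) (first label H, second V, as in the CUA example) and Kashiwara's convention for the crystal tensor product, in which the ordered pair (+) ⊗ (−) is the singlet; with the opposite ordering convention (−) ⊗ (+) would be the singlet instead. `LABELS`, `pair_labels` and `dinucleotide_labels` are illustrative names, not taken from the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Assumed single-nucleotide labels (sign of J_{H,3}, sign of J_{V,3}):
# +1 stands for +1/2, -1 for -1/2. V separates pyrimidines (C, U) from purines (G, A).
LABELS = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}


def pair_labels(s1: int, s2: int) -> tuple[Fraction, Fraction]:
    """(J, J_3) of the crystal-basis vector s1 (x) s2 in (1/2) (x) (1/2).

    Kashiwara convention (assumption): the ordered pair (+, -) is the singlet
    J = 0; (+,+), (-,+) and (-,-) belong to the triplet J = 1.
    """
    j3 = HALF * (s1 + s2)
    j = Fraction(0) if (s1, s2) == (+1, -1) else Fraction(1)
    return j, j3


def dinucleotide_labels(d: str) -> tuple[Fraction, Fraction, Fraction, Fraction]:
    """Return (J_H, J_{H,3}, J_V, J_{V,3}) for a dinucleotide such as 'CU'."""
    (h1, v1), (h2, v2) = LABELS[d[0]], LABELS[d[1]]
    jh, jh3 = pair_labels(h1, h2)
    jv, jv3 = pair_labels(v1, v2)
    return jh, jh3, jv, jv3


# Group the 16 dinucleotides by their "set" (J_H, J_V).
sets = Counter(dinucleotide_labels(a + b)[0::2] for a, b in product("CUGA", repeat=2))
```

Under these assumptions `dinucleotide_labels("CU")` gives (0, 0, 1, 1), and the four sets contain 9, 3, 3 and 1 dinucleotides, matching the dimensions of the irreps (1,1), (1,0), (0,1) and (0,0).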



Crystal basis model of the genetic code

  • Codons are composite states

    (64 basis vectors of $(1/2, 1/2)^{\otimes 3}$)

    belonging to “sets” identified by half-integer $J_H$, $J_V$

    (“set” ≡ irreducible representation = irrep.)

    Ex.

    CUA = (+,+) ⊗ (−,+) ⊗ (−,−)

    ($J_H = 1/2$, $J_{H,3} = -1/2$; $J_V = 1/2$, $J_{V,3} = 1/2$)

    This follows from the tensor-product properties of $U_{q\to 0}(sl(2))$.


Codons in the crystal basis


Codon usage frequency

  • Synonymous codons are not used uniformly (codon bias)

  • Codon bias (not fully understood) is ascribed to evolutionary and selective effects

  • Codon bias depends on the

    - biological species (b.sp.)

    - sequence analysed

    - amino acid (a.a.) encoded

    - structure of the considered multiplet

    - nature of codon XYZ

    - …



Our analysis deals with global codon usage, i.e. computed

over all the coding sequences (exonic region) of each b.sp.

of the considered specimen,

in order to bring out possible general features of the standard

eukaryotic genetic code ascribable to its organisation and its

evolution.


Let us define the codon usage probability for the codon XZN (X, Z, N ∈ {A, C, G, U (T in DNA)}):

$$P(XZN) = \lim_{N_{tot}\to\infty} \frac{n_{XZN}}{N_{tot}}$$

where $n_{XZN}$ is the number of times codon XZN is used in the processes and $N_{tot}$ is the total number of codons XZN', N' ∈ {A, C, G, U}, used in the same processes, so that for fixed XZ the normalization is $\sum_N P(XZN) = 1$.

Note: sextets are considered as quartets + doublets ⇒ 8 quartets.
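A sketch of the two ingredients of this definition, assuming the standard genetic code (NCBI translation table 1): the 8 quartets are the dinucleotides XZ whose four codons XZN encode the same amino acid (for the sextets Leu, Ser and Arg this keeps only the quartet parts CU, UC and CG), and P(XZN) is estimated from finite codon counts with the per-quartet normalization. The function names and the counts in the usage line are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, UUG, UCU, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def quartets() -> dict[str, str]:
    """Dinucleotides XZ whose four codons XZN all encode the same amino acid."""
    out = {}
    for x, z in product(BASES, repeat=2):
        aas = {CODE[x + z + n] for n in BASES}
        if len(aas) == 1 and "*" not in aas:
            out[x + z] = aas.pop()
    return out


def quartet_probabilities(counts: dict[str, int], xz: str) -> dict[str, float]:
    """Estimate P(XZN) = n_XZN / sum_N' n_XZN' for a fixed quartet XZ."""
    total = sum(counts.get(xz + n, 0) for n in BASES)
    if total == 0:
        raise ValueError(f"no codons observed for quartet {xz}")
    return {n: counts.get(xz + n, 0) / total for n in BASES}
```

`quartets()` returns UC (Ser), CU (Leu), CC (Pro), CG (Arg), AC (Thr), GU (Val), GC (Ala) and GG (Gly); for hypothetical counts {GCU: 30, GCC: 40, GCA: 20, GCG: 10}, `quartet_probabilities(counts, "GC")` gives {U: 0.3, C: 0.4, A: 0.2, G: 0.1}.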


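Def. - Correlation coefficient $r_{XY}$ for two variables X, Y (e.g. two codon probabilities) sampled over the biological species:

$$r_{XY} = \frac{\langle XY \rangle - \langle X \rangle \langle Y \rangle}{\sigma_X \, \sigma_Y}, \qquad \sigma_X^2 = \langle X^2 \rangle - \langle X \rangle^2$$

where ⟨·⟩ denotes the average over the species.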
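This is the Pearson coefficient. A dependency-free sketch computing it over the species, where `x[i]` and `y[i]` would be, for example, P(GCA) and P(GCC) for species i (that data layout is an assumption). Normalizing by n or by n − 1 does not matter, because the factor cancels between numerator and denominator.

```python
import math
from collections.abc import Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r_XY of two variables sampled over the same ordered list of species."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Unnormalized covariance and standard deviations: the 1/n factors cancel in r.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (sx * sy)
```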


Specimen (GenBank Release 149.0, 09/2005, $N_{codons} > 100{,}000$)

  • 26 VERTEBRATES

  • 28 INVERTEBRATES

  • 38 PLANTS

  • TOTAL: 92 biological species


Correlation coefficients: vertebrates

Correlation coefficients: plants

Correlation coefficients: invertebrates





Ratios of $\sigma^2_{obs}(X+Y)$ and $\sigma^2_{th}(X+Y) = \sigma^2_{obs}(X) + \sigma^2_{obs}(Y)$, averaged over the 8 a.a., for the sum of two codon probabilities
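Since $\sigma^2(X+Y) = \sigma^2(X) + \sigma^2(Y) + 2\,\mathrm{cov}(X,Y)$, the ratio $\sigma^2_{obs}(X+Y)/\sigma^2_{th}(X+Y)$ equals $1 + 2\,\mathrm{cov}(X,Y)/(\sigma^2(X)+\sigma^2(Y))$: it is above 1 for positively correlated probabilities, below 1 for anticorrelated ones and close to 1 for uncorrelated ones. A sketch of the computation and of its average over the quartets; the dictionary layout of `samples` is a hypothetical choice.

```python
from statistics import pvariance


def variance_ratio(x: list[float], y: list[float]) -> float:
    """sigma^2_obs(X+Y) / [sigma^2_obs(X) + sigma^2_obs(Y)] over the species sample."""
    if len(x) != len(y):
        raise ValueError("x and y must be sampled over the same species")
    s = [a + b for a, b in zip(x, y)]
    # Population variances; sample variances would give exactly the same ratio.
    return pvariance(s) / (pvariance(x) + pvariance(y))


def average_ratio(samples: dict[str, tuple[list[float], list[float]]]) -> float:
    """Average over the a.a., e.g. samples["GC"] = (P(GCA) values, P(GCC) values)."""
    return sum(variance_ratio(x, y) for x, y in samples.values()) / len(samples)
```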


Indication of a correlation between the codon usage probabilities P(A) and P(C)

(and between P(U) and P(G))

for quartets.


Correlation between codon probabilities for different a.a.

  • Correlation coefficients between the 28 couples P(XZN)-P(X’Z’N), where XZ (X’Z’) specify the 8 quartets. The following pattern comes out for the whole eukaryote specimen (n = 92):


The set of 8 quartets splits into 3 subsets

  • 4 a.a. with correlated codon usage (Ser, Pro, Thr, Ala)

  • 2 a.a. with correlated codon usage (Leu, Val)

  • 2 a.a. with generally uncorrelated codon usage (Arg, Gly)


Statistical analysis

  • Correlation between P(XZA) and P(XZC), XZ ∈ quartets

  • Correlation of P(XZN) between {Ser, Pro, Thr, Ala} and {Leu, Val}

The observed correlations

fit well into the mathematical scheme of

the crystal basis model

of the genetic code


In the crystal basis model, P(XYZ) can be written as a function of


ASSUMPTION


SUM RULES

K INDEPENDENT OF THE b.sp.

XZ ∈ QUARTETS


SUM RULES: “theoretical” correlation matrix, XZ = NC, CG, GG, CU, GU (NC = UC, CC, AC, GC)


Observed averaged value of the

correlation matrix; in red, the

theoretical value


Shannon entropy

Let us define the Shannon entropy for the amino acid

specified by the first two nucleotides XZ (8 quartets)
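The slide's formula did not survive extraction. Assuming the standard Shannon definition $H(XZ) = -\sum_N P(XZN)\,\log P(XZN)$ over the four codons of a quartet, a minimal sketch; the logarithm base (2 here) is an assumption and only rescales H.

```python
import math


def quartet_entropy(p: dict[str, float], base: float = 2.0) -> float:
    """H(XZ) = -sum_N P(XZN) log P(XZN); codons with P = 0 contribute nothing."""
    return -sum(v * math.log(v, base) for v in p.values() if v > 0)
```

Uniform usage of the four codons gives the maximum, 2 bits; exclusive use of a single codon gives 0.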


Shannon entropy

Using the previous expression for P(XZN) we get

$N \equiv (XZN)$, $H_{bs} \equiv \sum_N H_{bs}(XZN)$, $P_N \equiv P(XZN)$

$S_{XZ}$ is largely independent of the b.sp.


Shannon entropy


DNA dinucleotide free energy

Free energy for a pair of nucleotides, e.g. GC, lying on

one strand of DNA, coupled with the complementary pair,

CG, on the other strand.

CG from 5’ → 3’ correlated with GC from 3’ → 5’
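In nearest-neighbor terms, a strand read 5’ → 3’ is split into overlapping dinucleotide stacks, each paired with its complement read 3’ → 5’ on the other strand. A sketch of that decomposition; `dg` is a hypothetical, user-supplied table of stack free energies keyed by the top-strand dinucleotide (no experimental values are implied), and the plain sum omits the initiation and terminal corrections that full nearest-neighbor models add.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nn_stacks(seq: str) -> list[tuple[str, str]]:
    """Overlapping stacks of a 5'->3' strand.

    Each entry pairs XY (5'->3' on this strand) with its complement read
    3'->5' on the other strand, e.g. ('GC', 'CG').
    """
    seq = seq.upper()
    return [(seq[i:i + 2], COMPLEMENT[seq[i]] + COMPLEMENT[seq[i + 1]])
            for i in range(len(seq) - 1)]


def stack_free_energy(seq: str, dg: dict[str, float]) -> float:
    """Sum of the (hypothetical) stack free energies dg[XY] along the strand."""
    return sum(dg[top] for top, _ in nn_stacks(seq))
```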




Comparison with experimental data

ΔG in kcal/mol




Work in progress and future perspectives

From the correspondence

{C, U/T, G, A} ↔ I.R. (1/2, 1/2) of $U_{q\to 0}(sl(2)_H \oplus sl(2)_V)$

⇒ any ordered N-nucleotide sequence ↔

vector of the I.R. $(1/2, 1/2)^{\otimes N}$ of $U_{q\to 0}(sl(2)_H \oplus sl(2)_V)$

⇒ new parametrization of nucleotide sequences


“Spin” parametrisation


Algorithm for the “spin” parametrisation of an ordered n-nucleotide sequence
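The algorithm itself is not reproduced in the transcript. One way to compute the labels of an arbitrary ordered sequence, assuming the nucleotide labels C = (+,+), U/T = (−,+), G = (+,−), A = (−,−) and Kashiwara's tensor-product convention, is the bracket (signature) rule: for sl(2)_H and sl(2)_V separately, reading left to right, each − cancels the nearest unmatched + before it, and the signs left over fix J. This is a sketch under those assumptions, not necessarily the authors' procedure; it reproduces the CUA example ($J_H = 1/2$, $J_{H,3} = -1/2$; $J_V = 1/2$, $J_{V,3} = 1/2$).

```python
from collections.abc import Iterable
from fractions import Fraction

# Assumed (H, V) sign labels for each nucleotide; T is treated as U.
LABELS = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1), "G": (+1, -1), "A": (-1, -1)}


def crystal_labels(signs: Iterable[int]) -> tuple[Fraction, Fraction]:
    """(J, J_3) of a tensor product of spin-1/2 crystal basis vectors.

    Signature rule (Kashiwara convention, assumed): a '-' cancels the nearest
    earlier unmatched '+'; the unmatched signs span the irrep, so
    J = (#unmatched)/2, while J_3 = (#'+' - #'-')/2.
    """
    open_plus = unmatched_minus = 0
    for s in signs:
        if s > 0:
            open_plus += 1
        elif open_plus:
            open_plus -= 1  # this '-' forms a singlet with an earlier '+'
        else:
            unmatched_minus += 1
    return (Fraction(open_plus + unmatched_minus, 2),
            Fraction(open_plus - unmatched_minus, 2))


def spin_parametrisation(seq: str) -> tuple[Fraction, Fraction, Fraction, Fraction]:
    """Return (J_H, J_{H,3}, J_V, J_{V,3}) for an ordered nucleotide sequence."""
    pairs = [LABELS[b] for b in seq.upper()]
    jh, jh3 = crystal_labels(h for h, _ in pairs)
    jv, jv3 = crystal_labels(v for _, v in pairs)
    return jh, jh3, jv, jv3
```

With these assumptions the 64 codons split evenly, 16 per set $(J_H, J_V) \in \{1/2, 3/2\}^2$, and dinucleotides such as CU come out as (0, 0, 1, 1).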


From this parametrisation:

  • Alternative construction of a mutation model, where the mutation intensity does not depend on the Hamming distance between the sequences, but on the change of the “labels” of the “sets”. C. Minichini, A.S., Biosystems (2006)

  • Characterization of particular sequences (exons, introns, promoters, 5’ or 3’ UTR sequences, …)

    L. Frappat, P. Sorba, A.S., L. Vuillon, in progress


For each gene of Homo sapiens (total ~28,000 genes)

  • Consider the N-nucleotide coding sequence (CDS)

  • Compute the “labels” $J_H$, $J_{H,3}$; $J_V$, $J_{V,3}$

    for any n-nucleotide subsequence ($1 \le n \le N$)

    ⇒ Plot the “labels” versus n
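Assuming the n-nucleotide subsequence is the prefix made of the first n nucleotides of the CDS, all N label sets can be obtained in a single pass, because the bracket (signature) rule only needs two running counters per sl(2) factor. A sketch assuming the nucleotide labels C = (+,+), U/T = (−,+), G = (+,−), A = (−,−) and Kashiwara's tensor-product convention; `prefix_labels` is a hypothetical name, and its rows can be handed to any plotting library.

```python
from fractions import Fraction

# Assumed (H, V) sign labels; T is treated as U.
LABELS = {"C": (+1, +1), "U": (-1, +1), "T": (-1, +1), "G": (+1, -1), "A": (-1, -1)}


class _Running:
    """Running crystal-basis (J, J_3) for one sl(2) factor, one sign at a time."""

    def __init__(self) -> None:
        self.plus = 0   # '+' signs not yet cancelled by a later '-'
        self.minus = 0  # '-' signs with no earlier '+' to cancel against

    def push(self, s: int) -> tuple[Fraction, Fraction]:
        if s > 0:
            self.plus += 1
        elif self.plus:
            self.plus -= 1
        else:
            self.minus += 1
        return Fraction(self.plus + self.minus, 2), Fraction(self.plus - self.minus, 2)


def prefix_labels(cds: str) -> list[tuple]:
    """Rows (n, J_H, J_{H,3}, J_V, J_{V,3}) for the first n nucleotides, n = 1..N."""
    h, v = _Running(), _Running()
    rows = []
    for n, base in enumerate(cds.upper(), start=1):
        hs, vs = LABELS[base]
        rows.append((n, *h.push(hs), *v.push(vs)))
    return rows
```

The cost is linear in N, which matters when the labels are computed for every prefix of ~28,000 coding sequences.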


Labels versus n (four example plots). Legend: red $J_H$, green $J_{H,3}$, blue $J_V$, black $J_{V,3}$


Numerical estimator

Define for any sequence of length N

Plot the number of CDS with the same value of Diff (Sum)

versus Diff (Sum)

Compute Diff (Sum) for 28,000 random sequences (300 < N < 4300)

with uniform probability for each nucleotide

Comparison: number of CDS vs random sequences


Conclusions

  • Correlations in codon usage frequencies computed over the whole exonic region fit well into the mathematical scheme of the crystal basis model of the genetic code. Missing: an explanation for the correlations

  • The formalism of the crystal basis model is useful to parametrize the free energy of DNA dimers

  • More generally, the mathematical structure $U_{q\to 0}(sl(2)_H \oplus sl(2)_V)$ may be useful to describe sequences of nucleotides.

