
Towards a model for -1 frameshift sites

Towards a model for -1 frameshift sites. Alain Denise 1,2 , Michaël Bekaert 1 , Laure Bidou 1 , Guillemette Duchateau-Nguyen 1 , Jean-Paul Forest 2 , Christine Froidevaux 2 , Isabelle Hatin 1 , Jean-Pierre Rousset 1 , Michel Termier 1 1 IGM (Institut de Génétique et Microbiologie)


Presentation Transcript


  1. Towards a model for -1 frameshift sites. Alain Denise (1,2), Michaël Bekaert (1), Laure Bidou (1), Guillemette Duchateau-Nguyen (1), Jean-Paul Forest (2), Christine Froidevaux (2), Isabelle Hatin (1), Jean-Pierre Rousset (1), Michel Termier (1). (1) IGM (Institut de Génétique et Microbiologie); (2) LRI (Laboratoire de Recherche en Informatique); Université Paris-Sud, Orsay

  2. Translation. mRNA: 5’ CAUAUGGAUUAC AUG GUCUAAGAU 3’

  3. Translation. The ribosome reads bases by triplets (codons), starting from a START codon. 5’ CAUAUG GAUUAC AUG GUCUAAGAU 3’

  4. Translation. The ribosome incorporates one amino acid per codon. 5’ CAUAUGGAU UAC AUG GUCUAAGAU 3’

  5. Translation CAUAUGGAU UAC AUG GUCUAAGAU 5’ 3’

  6. Translation CAUAUGGAU UAC AUG GUCUAAGAU 5’ 3’

  7. Translation CAUAUGGAU UAC AUG GUCUAAGAU 5’ 3’

  8. Translation CAUAUGGAU UAC AUG GUCUAAGAU 5’ 3’

  9. Translation. Synthesis continues until a STOP codon is read: 1 mRNA gives 1 protein. 5’ CAUAUGGAU UAC AUG GUCUAAGAU 3’
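
The reading process described in slides 2 to 9 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `read_codons` is hypothetical, and it assumes that translation starts at the first AUG found in the sequence and uses the three standard stop codons.

```python
def read_codons(mrna: str) -> list[str]:
    """Return the codons read from the first AUG up to (not including) the first in-frame stop codon."""
    stops = {"UAA", "UAG", "UGA"}  # standard stop codons
    start = mrna.find("AUG")       # assumption: the first AUG is the START codon
    if start == -1:
        return []
    codons = []
    # Step through the sequence three bases at a time from the START codon.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in stops:
            break
        codons.append(codon)
    return codons


# mRNA shown on the Translation slides
MRNA = "CAUAUGGAUUACAUGGUCUAAGAU"
print(read_codons(MRNA))
```

On the slide sequence, reading stops at the UAA codon, so the second AUG is read as an ordinary internal codon rather than as a new start.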

  10. Experimental fact • Some mRNAs encode two distinct proteins with same 5’ end

  11. Programmed -1 frameshifting. Usual translation reads ORF1a in the 0 phase, from START0 to STOP0. After a -1 frameshift, reading continues in the -1 phase through ORF1b, up to STOP-1. Non-deterministic event: 1 mRNA gives 2 distinct proteins with an accurate ratio.

  12. Typical -1 frameshift site [Brierley, 1989]. Diagram labels, from 5’ to 3’: AUG; slippery sequence NNXXXY YYZ; P; SP; secondary structure with stems S1, S2 and loops L1, L’1, L2.

  13. IBV frameshift site S2 U C C G A G C GAAA 3’ A G G C U C G G UGACGAUGGGG GCUG AUACCCC S1 5’ AUG UAU UUA AAC GGGUAC UUGC Pseudoknot Slippery sequence
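
The X XXY YYZ slippery pattern of the typical site can be searched for with a regular expression. The sketch below is only an illustration: the name `find_slippery` is hypothetical, and it assumes that X, Y and Z may be any base, which is looser than what a real predictor would probably require.

```python
import re

# Heptamer XXXYYYZ: three identical bases, three identical bases, then any base.
# Assumption: no further constraint on X, Y or Z.
SLIPPERY = re.compile(r"([ACGU])\1\1([ACGU])\2\2[ACGU]")


def find_slippery(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start, heptamer) for each non-overlapping match."""
    return [(m.start(), m.group(0)) for m in SLIPPERY.finditer(rna)]


# Fragment of the IBV site: slippery sequence followed by the spacer
IBV_FRAGMENT = "UAUUUAAACGGGUAC"
print(find_slippery(IBV_FRAGMENT))
```

On the IBV fragment this reports the heptamer UUUAAAC, immediately followed by the spacer GGGUAC.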

  14. Translation with frameshift U C C G A G C GAAA 3’ A G G C U C G G UGACGAUGGGG GCUG AUACCCC UUGC 5’ AUG UAUUUA AACGGG UAC

  15. Translation with frameshift U C C G A G C GAAA 3’ A G G C U C G G UGACGAUGGGG GCUG AUACCCC UUGC 5’ UAU UUA AAC GGG UAC

  16. -1 shift Translation with frameshift U C C G A G C GAAA 3’ A G G C U C G G UGACGAUGGGG GCUG AUACCCC UUGC 5’ UAU UUA AAC GGG UAC

  17. Translation with frameshift 3’ 5’ UA UUU AAA CGG GUA CGG GGU AGC AGU

  18. Translation with frameshift 3’ 5’ UA UUU AAA CGG GUA CGG GGU AGC AGU

  19. Translation with frameshift 3’ 5’ UA UUU AAA CGG GUA CGG GGU AGC AGU

  20. Translation with frameshift 3’ 5’ UA UUU AAA CGG GUA CGG GGU AGC AGU
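
The effect of the -1 shift on the reading frame can be reproduced with a short sketch. It is a simplification under stated assumptions: the function name and the parameter `codons_before_shift` are hypothetical, and the ribosome is assumed to step back by exactly one base right after decoding the last codon of the slippery site.

```python
def read_with_minus_one_shift(seq: str, codons_before_shift: int) -> list[str]:
    """Read codons in the 0 phase, step back one base, then continue in the -1 phase."""
    codons = []
    pos = 0
    # Normal reading in the 0 phase.
    for _ in range(codons_before_shift):
        codons.append(seq[pos:pos + 3])
        pos += 3
    pos -= 1  # the -1 shift: the last base already read is read again
    # Reading continues in the -1 phase until fewer than three bases remain.
    while pos + 3 <= len(seq):
        codons.append(seq[pos:pos + 3])
        pos += 3
    return codons


# IBV sequence from the slippery site through the 5' side of stem 1
IBV = "UAUUUAAACGGGUACGGGGUAGCAGU"
print(read_with_minus_one_shift(IBV, 3))
```

After the shift, the codons CGG GUA CGG GGU AGC AGU match the -1 phase triplets shown on slides 17 to 20.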

  21. Goals • To improve the known model for viral frameshift sites • To identify new frameshift sites in viral and non-viral genomes

  22. Our approach (diagram): biological sequences, formal models, prediction tools, in silico and in vivo validation, applications to other genomes; arrows labelled represent, explain, predict.

  23. IBV frameshift site: spacer. 5’ GGGUAC 3’

  24. Spacer consensus
      HAST-1: UAC AAA
      BEV: UGU UG
      EAV: UGA GAG
      HCV: GAG UC
      IBV: GGG UAC
      MHV: GGG UU
      TGEV: GAG
      RCNMV: UAG GC
      BWYV: GGA GUG
      PLRV: GGG CAA
      BLV: UAA UAG A
      FIV: UGG AAG GC
      HIV-1: GGG AAG AU
      HTLV-2: UCC UUA A
      JSR: UGG GUG A
      MMTV gag-pro: UUG UAA A
      MMTV pro-pol: UGA U
      RSV: UAG GGA
      SRV-1: GGA CUG A
      Consensus: UGG UAG A GAA GUA

  25. Lab experiments. Test construct (-1 phase): pSV40, lacZ, FS signal, luc. Control construct (0 phase): pSV40, lacZ, FS signal with an added N, luc. Diagram labels: expression reporter, FS reporter.

  26. Spacer: lab experiments (spacer, relative FS rate)
      wild-type IBV: GGGUA, 100
      U mutant: UGGUA, 100
      A mutant: AGGUA, 55
      C mutant: CGGUA, 32
      CC mutant: CCGUA, 70
      CCU mutant: CCUUA, 49

  27. Refining the model: machine learning • To identify relevant properties that characterize FS sites • Disjunctive learning: not all sequences frameshift for the same reasons [Giedroc et al., 2000]

  28. Annotating data: spacer. 5’ GGGUAC 3’

  29. Example of data: SP • SP = GGGUAC • number of A = 1; C = 1; G = 3; U = 1 • % of A = 17; C = 17; G = 50; U = 17 • first = G • last = C
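
The spacer attributes listed on this slide can be derived mechanically from the sequence. The sketch below is an assumption about how such an annotation could be computed, not the authors' tool: the function name `annotate_spacer` and the dictionary keys are hypothetical, and percentages are taken as count divided by spacer length, rounded to the nearest integer (so one base out of six gives about 17%).

```python
from collections import Counter


def annotate_spacer(sp: str) -> dict:
    """Compute base counts, rounded percentages, and first/last base of a spacer."""
    if not sp:
        raise ValueError("empty spacer")
    counts = Counter(sp)
    n = len(sp)
    return {
        "length": n,
        "count": {b: counts.get(b, 0) for b in "ACGU"},
        # Percentage of each base over the spacer length, rounded.
        "percent": {b: round(100 * counts.get(b, 0) / n) for b in "ACGU"},
        "first": sp[0],
        "last": sp[-1],
    }


print(annotate_spacer("GGGUAC"))
```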

  30. Annotating data: stem 1 3’ UGACGAUGGGG GCUG AUACCCC 5’

  31. Example of data: stem 1 • S1 = • 5' side: GGGGUAGCAGU • 3' side: CCCCAUAGUCG • stability: -20.7 kcal/mol
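
One simple property that can be read off the two strands of stem 1 is how many positions form a base pair. The sketch below is illustrative only: `classify_pairs` is a hypothetical helper, and it assumes the 3' side is written 3' to 5', so that position i of one strand faces position i of the other. It does not compute the stability value, which would require thermodynamic parameters not given in the slides.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(five_side: str, three_side: str) -> dict:
    """Count Watson-Crick pairs, G-U wobble pairs and mismatches between facing positions."""
    if len(five_side) != len(three_side):
        raise ValueError("strands must have the same length")
    result = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    for pair in zip(five_side, three_side):
        if pair in WATSON_CRICK:
            result["watson_crick"] += 1
        elif pair in WOBBLE:
            result["wobble"] += 1
        else:
            result["mismatch"] += 1
    return result


# Stem 1 strands as listed on the slide (3' side assumed to be written 3' to 5')
print(classify_pairs("GGGGUAGCAGU", "CCCCAUAGUCG"))
```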

  32. Annotating data: full sequence U C C G A G C GAAA 3’ A G G C U C G G UGACGAUGGGG GCUG AUACCCC 5’ U UUA AAC GGGUAC UUGC

  33. Example of data: FS rate. FS rate = 22%

  34. GloBo • Disjunctive learning algorithm • Suited to small amounts of data • Won the PTE challenge on analogous data

  35. Example of rules
      If SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %G in S2.5’  70 then FS rate  5%
      If %G in S1.5' bottom half  80 and %C in L1  45 then FS rate  5%
      If SP length  5 and S1.3' length  6 and %C in S1.3'  45 then FS rate  5%
      ...
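
A disjunctive rule set of this kind can be represented as a list of rules, each rule being a conjunction of threshold tests; an example is predicted to frameshift when at least one rule fires. The sketch below is not GloBo and not the authors' rule base. The comparison operators are not legible in the transcript, so every direction (`<=` or `>=`) is an assumption, and the feature names such as `sp_length` are hypothetical.

```python
import operator
from typing import Callable

# A condition is (feature name, comparison, threshold).
Condition = tuple[str, Callable[[float, float], bool], float]

# Two rules loosely modelled on the slide; operator directions are ASSUMED.
RULES: list[list[Condition]] = [
    [("sp_length", operator.le, 5),
     ("s1_3p_length", operator.ge, 6),
     ("pct_c_s1_3p", operator.ge, 45)],
    [("pct_g_s1_5p_bottom", operator.ge, 80),
     ("pct_c_l1", operator.ge, 45)],
]


def rule_fires(rule: list[Condition], example: dict) -> bool:
    """A rule fires when all of its conditions hold (conjunction)."""
    return all(op(example[feature], threshold) for feature, op, threshold in rule)


def predicts_frameshift(rules: list[list[Condition]], example: dict) -> bool:
    """The rule set predicts a frameshift when any rule fires (disjunction)."""
    return any(rule_fires(rule, example) for rule in rules)


# Hypothetical annotated examples, for illustration only
site_a = {"sp_length": 5, "s1_3p_length": 11, "pct_c_s1_3p": 45,
          "pct_g_s1_5p_bottom": 50, "pct_c_l1": 10}
site_b = {"sp_length": 8, "s1_3p_length": 4, "pct_c_s1_3p": 20,
          "pct_g_s1_5p_bottom": 50, "pct_c_l1": 10}
print(predicts_frameshift(RULES, site_a), predicts_frameshift(RULES, site_b))
```

Keeping each rule as plain data makes the disjunction explicit: different sites can be accepted by different rules, which is the point of disjunctive learning as stated on slide 27.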

  36. Covering and prediction
      If SP length  5 and number of G in S1.5’ bottom half  3 and number of G in S1.5’  4 and %T in S2.5’  30 and %G in S2.5’  70 then FS rate  5%
      Covering of examples: 70%
      Examples predicted in test set: 80%
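
The slides do not define how the covering percentage is computed. One plausible reading, used in the sketch below purely as an assumption, is the share of positive examples (here taken as those with an FS rate of at least 5%) on which the rule fires. The function name `covering`, the key `fs_rate`, the toy rule and the toy data are all hypothetical.

```python
from typing import Callable


def covering(rule: Callable[[dict], bool], examples: list[dict],
             min_rate: float = 5.0) -> float:
    """Percentage of positive examples (fs_rate >= min_rate) on which the rule fires."""
    positives = [e for e in examples if e["fs_rate"] >= min_rate]
    if not positives:
        return 0.0
    covered = sum(1 for e in positives if rule(e))
    return 100 * covered / len(positives)


# Toy rule and toy data, invented for illustration only
short_spacer = lambda e: e["sp_length"] <= 5
examples = [
    {"sp_length": 5, "fs_rate": 22},
    {"sp_length": 4, "fs_rate": 10},
    {"sp_length": 5, "fs_rate": 7},
    {"sp_length": 8, "fs_rate": 12},
    {"sp_length": 5, "fs_rate": 1},   # not a positive example
]
print(covering(short_spacer, examples))
```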

  37. Is R1 relevant for frameshift? (stem 1 5’ side, relative FS rate, R1)
      wild-type IBV: GGGGUAUCAGU, 100, yes
      mutant 1: GGUCGAUCAGU, 41, yes
      mutant 2: GGGGUUCUACA, 55, yes
      mutant 3: GCUCGAUCAGU, 36, no
      mutant 4: GCCCUAUCAGU, 73, no

  38. Covering and prediction
      If SP length  5 and S1.3' length  6 and %C in S1.3'  45 then FS rate  5%
      Covering of examples: 45%
      Examples predicted in test set: 40%

  39. Conclusion • Spacer: • a correlation between primary sequence and FS rate has been established • systematic experimentation is ongoing

  40. Conclusion (diagram): biological sequences, formal models, prediction tools, in silico and in vivo validation, applications to other genomes.

  41. Spacer
