UMR 1095 - ASP

UMR 1095 - ASP. Structural & Comparative Genomics in Bread Wheat. TriAnnotPipeline: A LifeGrid Project based on AUVERGRID. 3rd EGEE User Forum, February 12th, 2008. F. Giacomoni, M. Reichstadt, P. Leroy. Génétique, Diversité & Ecophysiologie des Céréales - Clermont-Ferrand, France.


Presentation Transcript


  1. UMR 1095 - ASP. Structural & Comparative Genomics in Bread Wheat. TriAnnotPipeline: A LifeGrid Project based on AUVERGRID. 3rd EGEE User Forum, February 12th, 2008. F. Giacomoni, M. Reichstadt, P. Leroy. Génétique, Diversité & Ecophysiologie des Céréales - Clermont-Ferrand, France.

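  2. Wheat as a challenge for Genomics: an important economic crop with a large genome size and 85% repeat sequences. [Figure: genome size and repeat content of maize, barley, bread wheat, rice and A. thaliana, with human (~3,000 Mb) for comparison. Genome sizes shown: 17,000 Mb, 4,800 Mb, 2,800 Mb, 380 Mb, 140 Mb. Repeat fractions shown: 70-80%, 50-80%, 50%, 10%.]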

  3. I.N.R.A. Work on the Wheat Genome: sequencing; annotating; discovering genes; finding transposable elements; studying other biological components. [Figure: a DNA sequence (AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA) and the results after annotation of the DNA sequence.]

  4. TriAnnot PipelineGRID: General Pipeline Structure of TriAnnot. [Diagram labels: DNA sequences; TEs; Genes; manual curation; TREPcons; REPET; TriSet; GeneFarm; training data set; Eugene, GenemarkHMM, GeneID; DataBase (chado) & Viewers (GBrowse).] http://urgi.versailles.inra.fr/projects/TriAnnot/

  5. TriAnnotPipelineGRID Architecture. [Diagram labels: WEB / Pipeline Development; WEB / Pipeline Production; GRID & Cluster; DataBanks; RepeatMasker, est2genome, Gmap, BLAST, HMMPfam; Users; GFF; GBrowse; GnpGenome On Line; APOLLO; Manual Curation; download as gff/ARTEMIS or gameXml/APOLLO; GnpDB Local; upload of gff. Login/password is marked on four of the access points.]

  6. TriAnnotPipelineGRID Detailed Architecture. Input: BAC sequence in FASTA format. [Diagram labels, grouped by topic. Panels: Panel 1, Panel 2, Panel 3, with the titles Gene annotation, Transposable Element & repeats, and Other biological target searches. Blocks: Block1a, Block1b, Block2, Block3a, Block3b, Block3c, Block4, Block5a, Block5b, Block5c, Block5d. Gene structure, ab initio prediction: GeneMarkHMM, GeneID, EuGene, GENSCAN, GeneZilla. Transposable elements and repeats: RepeatMasker; TREPnr, TREPtotal, RepBase; BLASTx / TREPprot. Transcripts: BLAST/Gmap with FL-cDNA, EST, mRNA. Proteins: BLASTx SwissProt / TrEMBL. Gene model: RAP-like (Japan), EVM + PASA (US), EUGENE (France). Gene function, annotated from the best hit following the IWGSC annotation guideline: Known Protein, Putative Protein, Domain Containing Protein, Expressed Gene, Conserved Hypothetical Gene, Hypothetical Gene; best hit proteins - At - Os. Other searches: BLASTn UGset / IRGSP / TIGR pseudo; nt, sts, htgs, gss; TRF / SSR; tRNA; miRNA; mtDNA; cpDNA; … Masking: BAC with masked TE; BAC with masked TEs & Genes.]
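
The masking labels on this slide ("BAC with masked TE", then "BAC with masked TEs & Genes") suggest that annotated regions are hidden from later searches. A minimal Python sketch of that idea follows. It is an illustration only, not the TriAnnot code: the function name `mask_intervals`, the 1-based inclusive coordinates, the `N` mask character and the example hit positions are all assumptions.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with each (start, end) interval masked.

    Coordinates are assumed to be 1-based and inclusive, as in GFF.
    """
    chars = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(chars) or start > end:
            raise ValueError(f"bad interval: {start}-{end}")
        for i in range(start - 1, end):
            chars[i] = mask_char
    return "".join(chars)


# Example sequence taken from slide 3; the hit positions are hypothetical.
bac = "AAAATCGATATAGAGTATGTAGACAAATTTTAAACCCGGGGGAGAGAGAGA"
te_hits = [(5, 12)]      # hypothetical transposable-element hit
gene_hits = [(30, 36)]   # hypothetical gene hit

# First mask the TEs, then mask the genes on the TE-masked sequence.
te_masked = mask_intervals(bac, te_hits)
te_and_gene_masked = mask_intervals(te_masked, gene_hits)
```

Masking in two passes keeps an intermediate TE-masked sequence available for the gene-prediction tools, while the doubly masked sequence can be used for the remaining searches.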

  7. WEB INTERFACE PART (Wheat Seq): upload of the BAC sequence in FASTA format; programming of the annotation parameters with 5 blocks; production of a step.xml. PIPELINE PART: STEP_0: 3 RepeatMasker vs 3 DataBanks. STEP_1: 8 BLASTn vs 8 DataBanks; 1 BLASTx vs 1 DataBank; 1 Tandem Repeat Finder. STEP_2: 1 EugeneIMM Rice; 1 GeneId; 4 GeneMarkHMM with 4 matrices. STEP_3: 1 tBLASTx vs 1 DataBank; 1 BLASTn vs 1 DataBank; 1 BLASTx vs 1 DataBank. STEP_4: 2 tBLASTn vs 2 DataBanks. RESULTS FILES (GFF Format).
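
To make the step list on this slide concrete, here is a minimal Python sketch that stores STEP_0 to STEP_4 as data, counts the runs per step, and serializes the plan to XML. The real layout of step.xml is not shown in the slides, so the element and attribute names (`pipeline`, `step`, `tool`, `runs`) and the file name `wheat_bac.fasta` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tool runs per step, as listed on slide 7.
STEPS = {
    "STEP_0": [("RepeatMasker", 3)],
    "STEP_1": [("BLASTn", 8), ("BLASTx", 1), ("Tandem Repeat Finder", 1)],
    "STEP_2": [("EugeneIMM Rice", 1), ("GeneId", 1), ("GeneMarkHMM", 4)],
    "STEP_3": [("tBLASTx", 1), ("BLASTn", 1), ("BLASTx", 1)],
    "STEP_4": [("tBLASTn", 2)],
}


def build_step_xml(sequence_file, steps):
    """Build an XML tree describing the steps (hypothetical schema)."""
    root = ET.Element("pipeline", {"sequence": sequence_file})
    for step_name, tools in steps.items():
        step_el = ET.SubElement(root, "step", {"name": step_name})
        for tool, runs in tools:
            ET.SubElement(step_el, "tool", {"name": tool, "runs": str(runs)})
    return root


def jobs_per_step(steps):
    """Count the number of tool runs in each step."""
    return {name: sum(runs for _, runs in tools) for name, tools in steps.items()}


root = build_step_xml("wheat_bac.fasta", STEPS)
xml_text = ET.tostring(root, encoding="unicode")
counts = jobs_per_step(STEPS)
total_jobs = sum(counts.values())
```

With the numbers from the slide, the counts come to 3, 10, 6, 3 and 2 runs for STEP_0 to STEP_4, or 24 runs per BAC sequence.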

  8. TriAnnotPipelineGRID Architecture. WEB INTERFACE PART (Wheat Seq): upload of the BAC sequence in FASTA format; programming of the annotation parameters with 5 blocks; production of a step.xml. PIPELINE LOCAL PART: STEP_1B: 1 TRF. STEP_2: 1 EugeneIMM Rice; 1 GeneId; 4 GeneMarkHMM. STEP_3C: 3 Gene Modelling. PIPELINE_GRID PART I (STEP_1A): 5 RepeatMasker (RM). PIPELINE_GRID PART II (STEP_1B, 3A, 3B, 4A, 4B, 5A and 5D): 14 BLASTn; 8 GMap; 5 RM; 3 BLASTx; 6 BLASTp; 1 tBLASTn; 1 PFAM. RESULTS FILES (GFF Format).
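
Both pipeline slides end with results files in GFF format. As an illustration of what one such result line looks like, the sketch below formats a single feature as the nine tab-separated GFF columns (sequence id, source, type, start, end, score, strand, phase, attributes). The slides do not say which GFF version the pipeline writes; the `key=value` attribute style used here is the GFF3 one, and the function name `gff_line` and the example feature are assumptions.

```python
def gff_line(seqid, source, feature_type, start, end,
             score=None, strand=".", phase=None, attributes=None):
    """Format one feature as a tab-separated, nine-column GFF line."""
    if start > end:
        raise ValueError("start must not exceed end")
    # GFF3-style attributes: key=value pairs joined by semicolons.
    attr_text = ";".join(f"{k}={v}" for k, v in (attributes or {}).items()) or "."
    columns = [
        seqid,
        source,
        feature_type,
        str(start),
        str(end),
        "." if score is None else str(score),
        strand,
        "." if phase is None else str(phase),
        attr_text,
    ]
    return "\t".join(columns)


# Hypothetical RepeatMasker hit on a hypothetical BAC.
line = gff_line("BAC_example", "RepeatMasker", "repeat_region", 5, 12,
                strand="+", attributes={"ID": "te1"})
```

Keeping every tool's output in the same nine-column shape is what lets viewers such as GBrowse, ARTEMIS or APOLLO load results from different programs side by side.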

  9. [Diagram labels: Server part: Server, User Interface (UI), JDL. Grid part: Computing Element (CE), SE, DB update service. Bioinformatic package: bioinformatic algorithms, bioinformatic databases.]

  10. [Diagram labels: Server, UI, JDL, Computing Element (CE), bioinformatic algorithms.] Workflow: get the parameters; create the XML step file; get the input (sequence) file; create the grid environment (JDL, shell scripts); mask the repeated sequences (RepeatMasker/Blast/GMap/HMMer); retrieve the output; fill the database.

  11. [Diagram labels: UI, JDL, CE, bioinformatic algorithms.] Numbered steps: 1-Parameters + input file; 2-Creation XML file; 3-Copy input files; 4-Creation environment; 5-Job submission; 6-Job running (BLAST/HMMer/RepeatMasker/GMap); 7-Job output; 8-Output transfer; 9-DB filling.
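
The environment-creation step on this slide involves writing a JDL (Job Description Language) file for each grid job. The sketch below renders a small JDL description from Python. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard JDL attributes, but the wrapper script `run_blast.sh`, the file names and the function `make_jdl` are hypothetical, and the sketch says nothing about how the actual pipeline writes its JDL files. Submission and output retrieval are not shown.

```python
def make_jdl(executable, arguments, input_files, output_files):
    """Render a minimal JDL job description as text."""

    def jdl_list(items):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # The wrapper script travels with the job, next to its input files.
        f"InputSandbox = {jdl_list([executable] + input_files)};",
        f'OutputSandbox = {jdl_list(["std.out", "std.err"] + output_files)};',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical BLAST job for one BAC sequence.
jdl = make_jdl(
    executable="run_blast.sh",
    arguments="bac.fasta",
    input_files=["bac.fasta"],
    output_files=["blast.gff"],
)
```

Generating one such description per tool run keeps the jobs independent of each other, which fits the per-step job counts given on the earlier slides.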

  12. TriAnnotPipelineGRID Partners 2007-2008: F. Giacomoni, C. Charpentier, N. Guilhot, F. Choulet, P. Leroy, C. Feuillet, M. Reichstadt, A. Claude, M. Liauzu, A. Mahul, M. Alaux, T. Flutre, I. Blanc-Lenfle, S. Reboux, H. Quesneville, B. Haas, F. Legeai, T. Tanaka, H. Ikawa, H. Numa, T. Itoh, B. Kronmiller.