
New data and tools at TAIR

New data and tools at TAIR (The Arabidopsis Information Resource). Overview of TAIR: direct submission, published papers, journal collaborations, RNA-seq, proteomic data, corrections, genome release, gene function, other data (markers, ecotypes, gene symbols, new genomes), and new tools.





Presentation Transcript


  1. New data and tools at TAIR (The Arabidopsis Information Resource)

  2. Overview of TAIR (diagram) • Gene function: direct submission, published papers, journal collaborations • Genome release: RNA-seq, proteomic data, corrections • Other data: markers, ecotypes, gene symbols, new genomes • New tools • Data reach researchers directly (TAIR pages) AND via other databases

  3. TAIR10 Genome Release • No assembly updates • Will incorporate: • 200M RNA-seq reads (Ecker and Mockler) • Additional proteomics data • Individual gene structure corrections sent to us

  4. Mapping and Assembly • Mapping: • RNA-seq reads (TopHat, C. Trapnell; Supersplat, T.C. Mockler) • Peptides (6-frame translation, spliced exon graph) • Assembly approaches: • Augustus (M. Stanke): uses spliced RNA-seq reads and peptides; aim: identify additional splice variants and update existing genes • TAU (T.C. Mockler): uses spliced RNA-seq reads; aim: identify additional splice variants • Cufflinks (C. Trapnell): uses spliced and unspliced RNA-seq data; aim: identify novel genes
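For readers unfamiliar with the "6-frame translation" step used to map peptides, here is a minimal Python sketch of the idea. It assumes the standard genetic code and exact peptide matches; the function names (`six_frame`, `find_peptide`) are hypothetical, and this is not TAIR's actual pipeline (in particular, it does not handle peptides that span splice junctions, which is what the spliced exon graph is for).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse to read the opposite strand 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame(seq: str) -> dict:
    """Translate three forward frames (+1..+3) and three reverse-strand frames (-1..-3)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(seq: str, peptide: str) -> list:
    """Return (frame, amino-acid offset) for every exact match of peptide in the six frames."""
    hits = []
    for frame, protein in six_frame(seq).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits


if __name__ == "__main__":
    frames = six_frame("ATGGCCAAGTAA")
    print(frames["+1"])                            # MAK*
    print(find_peptide("ATGGCCAAGTAA", "MAK"))     # [('+1', 0)]
```

Translating all six frames means a peptide can be placed on the genome even where no gene model currently predicts a protein, which is what makes peptides useful evidence for new or corrected gene structures.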

  5. Preliminary Results • Augustus/TAU/Cufflinks predicted models are classified into categories (number of models): • Novel genes: 21 • Updated genes: 812 • Splice variants: 2134 • B-list: 1586 • Rejects: 2318

  6. TAIR10 Genome Release • No assembly updates • Will incorporate: • 200M RNA-seq reads (Ecker and Mockler) • Additional proteomics data • Individual gene structure corrections sent to us • Release expected in August 2010

  7. Experimentally Verified Gene Function: where does it come from? • From research articles read by TAIR curators • From TAIR’s collaboration with journals • From direct submissions by researchers to TAIR

  8. Literature Curation • How? • Papers are prioritized according to the novelty of their gene function results • The highest-priority papers are read and gene function is extracted • Why? • A lot of high-quality experimental gene function information is only available in the form of articles • How many? • About 1/3 of all new articles containing gene function data are curated at TAIR each year

  9. Journal Collaboration • How? • Author instructions, Excel sheet or online form • Why? • To capture a larger fraction of gene function data • Because publication is the right time to get the data into TAIR • What journals?

  10. Journal Collaboration

  11. Journal Collaboration • Plant Physiology (2008) • The Plant Journal (2009) • 2010: Journal of Integrative Plant Biology; Journal of Experimental Botany; Plant Science; Environmental Botany; Plant Physiology and Biochemistry; Plant, Cell and Environment • How? • Author instructions, Excel sheet or online form • Why? • To capture a larger fraction of gene function data • Because publication is the right time to get the data into TAIR

  12. Direct Submission of Gene Function • How? • Excel sheet or online form • Why? • To capture more data with a small curation team • Because researchers are the experts on the genes they study

  13. New online submission form

  14. Why Gene Ontology? • Standardization allows comparison across experiments and species • Hierarchical structure allows high-level categorization • A well-structured ontology framework facilitates computational analysis • Annotations are attached to their data source (peer-reviewed published research) • Experimental evidence can be distinguished from predictions
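Two of the points above, high-level categorization through the hierarchy and separating experimental evidence from predictions, can be illustrated with a small Python sketch. The hierarchy below is a simplified, illustrative fragment of biological-process terms (not a complete or authoritative copy of GO), and the gene names are hypothetical. The evidence codes are standard GO experimental codes; IEA (inferred from electronic annotation) stands in for a prediction.

```python
from collections import defaultdict

# Simplified, illustrative is_a fragment: child term -> parent terms.
PARENTS = {
    "response to light stimulus": ["response to radiation"],
    "response to radiation": ["response to abiotic stimulus"],
    "response to cold": ["response to temperature stimulus"],
    "response to temperature stimulus": ["response to abiotic stimulus"],
    "response to abiotic stimulus": ["response to stimulus"],
    "response to stimulus": [],
}

# GO experimental evidence codes; anything else (e.g. IEA) is treated as non-experimental here.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def ancestors(term: str) -> set:
    """Return the term plus every term reachable by following is_a links upward."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen


def genes_per_term(annotations, experimental_only=True) -> dict:
    """Count distinct genes per term, propagating each annotation to all ancestor terms."""
    genes = defaultdict(set)
    for gene, term, evidence in annotations:
        if experimental_only and evidence not in EXPERIMENTAL:
            continue
        for t in ancestors(term):
            genes[t].add(gene)
    return {t: len(g) for t, g in genes.items()}


# Hypothetical annotations: (gene, term, evidence code).
annotations = [
    ("geneA", "response to light stimulus", "IMP"),
    ("geneB", "response to cold", "IDA"),
    ("geneC", "response to cold", "IEA"),
]

if __name__ == "__main__":
    print(genes_per_term(annotations)["response to abiotic stimulus"])                           # 2
    print(genes_per_term(annotations, experimental_only=False)["response to abiotic stimulus"])  # 3
```

Because an annotation to a specific term also implies all of its ancestors, genes annotated to quite different specific processes can still be grouped under one broad category, while the evidence code filter keeps predicted annotations out of the experimental counts.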

  15. Example Gene Ontology annotations • 3 GO flavors: biological process, cellular component, molecular function

  16. New online submission form Autocomplete (just start typing to get a list of matching terms)

  17. New online submission form

  18. New online submission form

  19. What is the result of TAIR’s effort to capture gene function? • How many genes have experimental gene function data in TAIR?

  20. Genes in TAIR with experimental evidence for biological process, molecular function or cellular component: 9342 genes (May 31, 2010)

  21. Arabidopsis Gene Function in TAIR (chart: number of protein-coding genes with predicted function and with experimental function, by year)

  22. Overview of TAIR (diagram) • Gene function: direct submission, published papers, journal collaborations • Genome release: RNA-seq, proteomic data, corrections • Other data: markers, ecotypes, gene symbols, new genomes • New tools • Data reach researchers directly (TAIR pages) AND via other databases

  23. GBrowse_syn • Tool by Sheldon McKay, CSHL • Alignment data from Pedro Pattyn, Van de Peer lab, U. of Ghent

  24. GBrowse_syn (example view: A. lyrata, A. thaliana, poplar)

  25. NBrowse • Tool by H.-L. Kao, F. Piano, M. Schuman, M. Gibson, Kris Gunsalus, NYU • Interaction datasets curated by TAIR, BioGRID and IntAct

  26. NBrowse

  27. NBrowse

  28. Arabidopsis lyrata • Genes have been loaded • Working on adding some gene function information and improving searching

  29. Overview of TAIR (diagram) • Gene function: direct submission, published papers, journal collaborations • Genome release: RNA-seq, proteomic data, corrections • Other data: markers, ecotypes, gene symbols, new genomes • New tools • Data reach researchers directly (TAIR pages) AND via other databases

  30. Central registry for Gene Symbols

  31. Central registry for Gene Symbols

  32. Central registry for Gene Symbols

  33. Central registry for Gene Symbols

  34. Helpdesk

  35. Helpdesk

  36. Helpdesk

  37. RSS news feed

  38. RSS news feed

  39. TAIR Facebook Page

  40. TAIR Twitter Feed

  41. TAIR Staff • Genome Annotation and Gene Function/GO: Tanya Berardini, Donghui Li, David Swarbreck, Philippe Lamesch, Rajkumar Sasidharan • Tech Team: Chris Wilks (50%), Bob Muller, Larry Ploetz, Cynthia Lee, Shanker Singh

  42. Partner: Host Institution: Funding Agencies: TAIR Sponsors:
