ChEBI: The story so far

Presentation Transcript


  1. ChEBI: The story so far • Paula de Matos

  2. Private Data • Public Data

  3. The state of affairs of bioinformatics in 2002 • Bioinformatics is booming • Human Genome sequence rough draft published June 2000 • Free resources and free data

  4. A different story for chemoinformatics • Private data and private software

  5. Too hard to solve… let's put our heads in the sand

  6. Bioinformatics data too large to keep track of chemical compounds • 100,000 protein entries in Swiss-Prot (2002) • 20 million entries in the EMBL database (2002) • Small databases unable to keep track • ENZYME resources: ~3500 enzymatic reactions

  7. New initiatives start up • PubChem: chemical repository, millions of entries, focus on screening assays • ChEBI: manually annotated database, nomenclature reference and compound database, tens of thousands of entries

  8. Principles of foundation • December 2002: email exchanges within the EBI to address the issue of chemistry • Three principles outlined

  9. “Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”

  10. “Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”

  11. “Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”

  12. We make a start using existing resources • Integrate three resources: KEGG Compound, IntEnz, Chemical Ontology • Annotation starts summer 2003 • Focus on nomenclature

  13. Our first release was modest but it was a start • 21 July 2004 • 2783 annotated entities • Data: ChEBI Name and ChEBI Id; IUPAC Names and Synonyms; Formula; Cross-references

  14. We introduce structures – Sep 2005 • Molfiles • InChI (IUPAC International Chemical Identifier) • SMILES (Simplified Molecular Input Line Entry System) • Image (PNG)

  15. Marvin in ChEBI

  16. We start editing the chemical ontology – Dec 2005

  17. Internationalisation of web pages – March 2006

  18. Internationalisation of data – Feb 2008

  19. Web Services – Oct 2006 • Programmatic access to a ChEBI entry • SOAP-based Java implementation • Clients currently available in Java and Perl • Four methods with which to access data: getLiteEntity, getCompleteEntity, getOntologyParents, getOntologyChildren
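
As an illustration of what a call to one of these four methods might look like on the wire, here is a minimal Python sketch that only builds a SOAP 1.1 request envelope and does not contact any server. The four method names come from the slide. Everything else is an assumption made for the example: the service namespace `urn:example:chebi-webservice` is a placeholder, the `chebiId` parameter name is a guess, and `build_request` is a hypothetical helper. The real values would have to be read from the service's WSDL.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: placeholder namespace; the real one comes from the service WSDL.
CHEBI_NS = "urn:example:chebi-webservice"

# The four access methods named on the slide.
METHODS = (
    "getLiteEntity",
    "getCompleteEntity",
    "getOntologyParents",
    "getOntologyChildren",
)


def build_request(method, chebi_id):
    """Return a SOAP envelope (as a string) calling `method` for one entry."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{CHEBI_NS}}}{method}")
    # Assumption: the identifier is passed in an element named "chebiId".
    ET.SubElement(call, f"{{{CHEBI_NS}}}chebiId").text = chebi_id
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("getCompleteEntity", "CHEBI:15422"))
```

Such an envelope would be sent as the body of an HTTP POST to the service endpoint; in practice a SOAP client library generated from the WSDL would normally do this work.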

  20. Automated Cross References – Aug 2007 • Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress

  21. Chemical Structure Searching – May 2008

  22. After all this, where are we?

  25. Annotation is linear

  26. Number of web hits grows • Total pure entry hits in April: 42,612 / 273,219 • Total web services hits in April: 88,226 • [Chart: web hits for 2007]

  27. Diversity of users • Constant challenge of balancing our users' varied interests.

  28. Our positives • Nomenclature database • Manually annotated data • Attention to detail • Free and accessible • Loyal users

  29. Our not so positives • Size for some people • Not well integrated into other bioinformatics resources • Community interaction • No software publicly available to manipulate the database

  30. Involve the community • Create a web-based submission tool • Users can easily submit their entities on a one-to-one basis • Also allow bulk submission from other resources

  31. Improvements to data depth • Addition of more Xrefs: PDB, MACiE ??? • Addition of more chemical attributes? What chemical attributes? • Text mining projects to extract relevant chemical information from patents and journals • European Patent Office

  32. Going Open Source • Commercial software packages will be replaced with open-source ones • Long-term goal: allow people to create a free local instance of ChEBI • Distribution of data in useful formats: CML, SDF

  33. Proposed changes to the ontology • New relationships • “Is disjoint from”

  34. “Is alloprote of” (labelled in both directions) • succinate(2−), CHEBI:30031 • succinic acid, CHEBI:15741

  35. Has biological role Has biological role and Has application

  36. Encourage use of ChEBI nomenclature • Currently working with the Swiss Institute of Bioinformatics to build a database of biochemical reactions called Rhea • All reactions mapped to ChEBI • Example: EC 2.7.4.3, “ATP + AMP = 2 ADP” • CHEBI:15422 C10H16N5O13P3 • CHEBI:16027 C10H14N5O7P • CHEBI:16761 C10H15N5O10P2
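
One practical benefit of mapping every reaction participant to an entry with a formula is that reactions can be checked for mass balance. The sketch below does this for the slide's example, “ATP + AMP = 2 ADP”, using the three formulae shown. The pairing of formulae with ATP, AMP and ADP is inferred from the phosphorus counts (P3, P, P2), and the helper names are hypothetical. The parser is an assumption-laden simplification that handles only flat formulae without brackets or charges.

```python
import re
from collections import Counter

# Formulae as shown on the slide; names inferred from the phosphorus count.
ATP = "C10H16N5O13P3"  # CHEBI:15422
AMP = "C10H14N5O7P"    # CHEBI:16027
ADP = "C10H15N5O10P2"  # CHEBI:16761


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C10H14N5O7P'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(terms):
    """Sum atom counts over (coefficient, formula) pairs on one side."""
    total = Counter()
    for coefficient, formula in terms:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(left, right):
    """True if both sides of the reaction contain the same atoms."""
    return side_total(left) == side_total(right)


if __name__ == "__main__":
    # ATP + AMP = 2 ADP
    print(is_balanced([(1, ATP), (1, AMP)], [(2, ADP)]))
```

Both sides of the example sum to C20H30N10O20P4, so the check passes for the formulae given on the slide.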

  37. Acknowledgements • IntEnz Team • Rafael Alcantara, Volker Ast, Kristian Axelsen, Anne Morgat • EPO Collaborators • Helene Courrier, Stephane Nauche, Jeremy Parsons • Database supporters • ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc… • ChEBI Team • Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck • Alumni • Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden • ChEBI supporters • Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton

  38. Discussion Points • Data Depth • Community • New Relationships • Encourage Nomenclature
