
WP3 Bioinformatics communities committee WP3bc

WP3 Bioinformatics communities committee (WP3bc). Bengt Persson, May 2009. WP3bc members: Bengt Persson, chair; Nick Goldman, co-chair. Wide user representation: close to all European countries, several fields of bioinformatics, both developers and applied users.


Presentation Transcript


  1. WP3 Bioinformatics communities committee (WP3bc) • Bengt Persson, May 2009

  2. WP3bc members • Bengt Persson, chair • Nick Goldman, co-chair

  3. Wide user representation • Close to all European countries • Several fields of bioinformatics • Both developers and applied users • Close contacts with the user survey group

  4. Users’ needs • In today’s life science, bioinformatics databases and tools provided by EBI and other institutes play a central role. • Most researchers take the availability of these databases and tools for granted. • WP3bc finds that there is a clear need for a pan-European infrastructure for biological information, as envisioned by ELIXIR. This infrastructure should provide sustainable access to databases and tools of interest for life sciences and related disciplines.

  5. Bioinformatics status in Europe • The development of research and support differs considerably between European countries. • Some countries have established national bioinformatics infrastructures. • Others are currently building up these structures. • In some countries, there are only a few bioinformatics groups, still in the build-up phase. • In Europe, EBI plays a central role, providing a large number of bioinformatics services, including access to databases and tools. • Educational programmes in bioinformatics at different levels are present in most countries.

  6. Overview diagram (WP3bc, users’ needs): Data • Tools • Standards • Integration • Support • Interactions with other infrastructures • Priority decisions • Recommendations

  7. Data • Well annotated data are crucially important in today’s biosciences and this is one of the important tasks for ELIXIR. • Examples are projects that are too big for individual institutions to deal with, e.g. • ENSEMBL (EBI and Sanger Institute) • EMBL-Bank (EBI, DDBJ and NCBI) • UniProt (EBI, PIR and SIB) • public ownership of the human genome

  8. Examples of database subjects • Genes • Proteins • Protein interactions • Array data • Metabolomics • Ontologies • semantic, interoperability • open source variant of SNOMED • Literature • structured variant of PubMed • Patent data

  9. Emerging needs • Imaging • Chemo-genomics • Non-coding RNAs • Primary data from large scale sequencers • Clinical data? • Documentation on data analysis / workflow • Electronic lab books? • … Needs for being proactive and innovative regarding new data types

  10. Data resources needed • To provide and maintain databases with primary data originating from Europe • One of the fundamental goals of ELIXIR • Cost of data storage just a small fraction of the cost for initially obtaining the data • To provide mirrors of other databases of users’ interest • Mainly a practical issue for maximum convenience • Speed-up for highly accessed databases • To provide database structures etc. that can be taken and used locally by others • Would make it easier for labs to properly store their data • Would allow a smoother data submission to the primary database • Local developments/extensions could eventually also be incorporated in the primary database (e.g., similar to an open source project)

  11. Data collection and quality • Mechanisms are needed for rational data collection and its entry into databases including quality controls, standardisation etc. • Standardisation issues need to be coordinated on the European level and when possible also on a world-wide basis. • The amounts of data in life sciences are increasing rapidly. • Currently, huge data amounts are generated by deep sequencing and imaging. • In the future, it can be foreseen that additional techniques will produce even more data. • Deposition of data in centralised databases should be mandatory for publication. • Data quality – how to decide what is good quality data or not.

  12. Documentation of data • There are needs for proper documentation of data, including e.g.: • Data and annotation should be complete. • Database fields properly documented. • Clear definition of annotation terms used in databases. • Clear description of the database scheme. • Clear versioning of databases. • Clear description of biological controls used in experiments and description of experimental protocols followed to produce the data. • ELIXIR could be the body containing structures that would ensure that these practices are adhered to across Europe.
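The documentation needs listed on this slide could, for example, be checked automatically before a database release. The following is a minimal sketch under stated assumptions: the checklist keys, the `missing_documentation` helper and the example record are all hypothetical names chosen for illustration, not part of any ELIXIR or EBI specification.

```python
# Hypothetical checklist keys, paraphrasing the documentation needs on the slide.
REQUIRED_ITEMS = (
    "field_documentation",          # database fields properly documented
    "annotation_term_definitions",  # clear definition of annotation terms
    "schema_description",           # clear description of the database scheme
    "version",                      # clear versioning of databases
    "biological_controls",          # description of biological controls
    "experimental_protocols",       # description of experimental protocols
)


def missing_documentation(record: dict) -> list:
    """Return the checklist items that are absent or empty in a release record."""
    return [item for item in REQUIRED_ITEMS if not record.get(item)]


# Illustrative release record (invented values).
example_release = {
    "version": "2009_05",
    "schema_description": "docs/schema.txt",
    "field_documentation": "docs/fields.txt",
}

print(missing_documentation(example_release))
```

A check of this kind only verifies that documentation is present; judging whether it is complete and clear would still need human review.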

  13. Tools • A number of bioinformatics tools exist that are of central importance for analysis of biological data. • Currently, there are no mechanisms for maintaining tools once they mature and move from the development phase to the maintenance phase. • WP3bc suggests that the maintenance of tools should be one of the tasks for ELIXIR. • Mechanisms for prioritisation should be established (cf. below).

  14. Tools, cont. • Tools need to be made available in at least two ways • one user-friendly for human interactions and • one machine-friendly for programmatic access. • Needs for making the tools available for local usage by download of the source code or binaries • There might also be need for mirror sites for some tools to increase the capacity • Benchmarks
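The idea of offering one tool through both a human-friendly and a machine-friendly interface can be sketched as follows. This is an illustration only: the toy GC-content analysis and the function names `render_for_humans` and `render_for_programs` are assumptions made for the example, not an interface proposed by WP3bc.

```python
import json


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (toy analysis)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def render_for_humans(sequence: str) -> str:
    """Human-friendly output: a readable sentence with a rounded percentage."""
    return f"GC content of {len(sequence)} bases: {gc_content(sequence):.1%}"


def render_for_programs(sequence: str) -> str:
    """Machine-friendly output: JSON that other programs can parse reliably."""
    return json.dumps({"length": len(sequence), "gc_content": gc_content(sequence)})


print(render_for_humans("ATGCGC"))
print(render_for_programs("ATGCGC"))
```

Both renderings call the same underlying function, so the interactive and programmatic interfaces cannot drift apart in their results.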

  15. Standards • There is a need to coordinate standards • data formats • data structures • interoperability formats

  16. Integration • Integration with neighbouring fields / other infrastructures in the life science areas, • e.g. integration with non-genomics/molecular databases such as clinical data, images, plant phenotypes • Integration between tools • e.g. EMBRACE and further developments • Integration between databases • e.g. links, SRS and similar efforts

  17. Support • There are also needs for bioinformatics support in “the daily work of scientists”. • This local support is most likely best organised via the national bioinformatics infrastructures. • Needs for education for the trainers of each country. • Needs for a network of specialist competences within Europe.

  18. Interactions with other research infrastructures • Infrastructures within life sciences • Computational infrastructures

  19. Infrastructures within life sciences • ELIXIR will cover a wide area of biological information sources • genes, proteins, protein interactions, array data, metabolomics, further 'omics', image data, chemo-genomics, ... • ELIXIR should have adequate contact surfaces towards other large infrastructures within the area of life sciences, e.g.: • INSTRUCT, Integrated Structural Biology Infrastructure • Infrafrontier, Infrastructure for Phenomefrontier and Archivefrontier • EATRIS, The European Advanced Translational Research Infrastructure • BBMRI, European Biobanking And Biomolecular Resources • ECRIN, Infrastructures For Clinical Trials And Biotherapy • future ESFRI infrastructure in systems biology • A proper division of responsibilities between the different infrastructures is of importance for clarity and for avoiding unnecessary duplication of efforts.

  20. Computational resources • Computational resources are needed for updating and integrating databases, e.g. for annotation purposes and InterPro runs. • Pilot studies in WP 13.2 have shown that the computational demands are large but clearly feasible using a number of European supercomputer centres.

  21. Priority decisions • Mechanisms are needed to prioritise which databases and tools should be supported. • Demands from the community • Importance, e.g. annotation • Scientific quality

  22. Organisation • Centralised or distributed? • From a user’s perspective, we think that a distributed infrastructure would serve the needs optimally. • Spreads the network loads. • Enables direct involvement in the infrastructure from the participating countries. • Allows for successive growth according to increased needs. • Spreads the risks in case of computer centre disaster. • With a distributed solution, coordination is crucial. ELIXIR would therefore need proper coordination.

  23. Recommendations from WP3bc • Need for infrastructure • WP3bc finds a clear need for a pan-European infrastructure for biological information • Witnessed both from the users’ representatives in WP3bc and from the user surveys. • Need to consider different needs in different countries • Need for provision of databases within life sciences • sustainable access to the databases that are needed in life sciences • In particular, data generated within Europe need to be made sustainably available • Need for a plan for long-term maintenance of computational tools • Create mechanisms for long-term maintenance of bioinformatics tools • both user-friendly interfaces for human interactions and • machine-friendly interfaces for programmatic access. • Need for standards for formats and integration • Increased integration of databases, tools and between infrastructure domains • Need to provide mechanisms for prioritisation of need for resources
