GLOBAL BIODIVERSITY INFORMATION FACILITY
TDWG 2010 Annual Conference, 26 Sept – 1 Oct 2010, Woods Hole, USA
‘Beyond DarwinCore: Challenges in mobilizing richer content’
Samy Gaiji (GBIF), Dag Endresen (NordGen), Jonas Nordling (NordGen), Sonia Dias (Bioversity), Elizabeth Arnaud (Bioversity), Milko Skofic (Bioversity)
WWW.GBIF.ORG
The GBIF informatics strategy TDWG 2009 Key components Global strategy
From TDWG T-shirt to reality…
Objectives of the feasibility study:
• Evaluate the GBIF decentralized model/Informatics Suite (aka IPT/HIT/GBRDS)
• Explore ways to publish richer content (e.g. phenotypic, genomic, environmental…)
• Evaluate scalability in different environments (e.g. bandwidth, IT support, hardware, skills…)
• Evaluate the potential adoption by a large network (e.g. more than 20 publishers)
• …
Scope of the project
Conditions
• be an active GBIF/TDWG participant.
• have a clear understanding of their potential role as an active regional, national or thematic node within the GBIF global network.
• have sufficient knowledge of the GBIF tools and related standards (e.g. IPT/HIT/GBRDS, DwC & extensions).
• have enriched data available in electronic format beyond the core DwC concepts, such as phenotypic, genomic and environmental data.
• have a central portal/index similar to GBIF's that plays the role of a data hub within the GBIF global network.
• demonstrate that they can provide at least a 50% in-kind contribution to the project and that the feasibility study is part of their ongoing expansion plan…
Why the genebank community?
Dag Endresen Elizabeth Arnaud Jonas Nordling Sonia Dias IPT Milko Skofic HIT
EURISCO
EURISCO is a web-based catalogue that provides information about ex situ plant collections maintained in Europe.
• 1,082,000 accessions
• 41 participating countries
• 313 institutions/genebanks
• >35,000 taxa
• Mostly landraces…
Sonia Dias
EURISCO@GBIF
• Plant Genetic Resources Network: 2,036,233 records
• EURISCO: 964,250 records
Where we went…
2010: IPT installations for EURISCO
• EURISCO
• NordGen (Nordic)
• Bioversity-Montpellier (France)
• IPK Gatersleben (Germany)
• BLE (Germany)*
• WUR CGN (The Netherlands)
• CRI (Czech Republic)
• VIR (Russian Federation)
• SeedNET (Balkan)*
• Baltic (Estonia, Latvia, Lithuania)
(* in progress)
The Good
• Strategy: “The GBIF network model is clear and easy to explain”
• Installation: “The IPT itself is really easy to install…”
• Interface: “The IPT user interface is clear and nice to work with…”
• Configuration: “The mapping interface is very easy to understand, as is the automated mapping of files with the DwC terms as column headers…”
• Extensions: “The DarwinCore and the extension model are very well received by users.” => It offers great opportunities to expand and cover richer content such as phenotypic, genomic etc…
• Impact: “The biggest gain of the project so far is raising interest and awareness about the GBIF tools/standards”
The Bad
• Purpose: “New users sometimes misunderstand the IPT concept and see it as a fully functional portal aimed at the general public.”
• Interface: “The user interface is unnecessarily rich and elaborate for exploring data.”
• Installation: “Although installation of the IPT itself is easy, the installation of the Java application server, Tomcat, is more demanding. This is accentuated by the high memory requirements of the current version of the IPT.”
• Scalability: “The IPT had problems with large data-sets.”
• Usability: “The need to redo the mapping to DarwinCore and DwC extension terms when doing a new import in a previously used format is not acceptable to users.”
The Bad
• Installations: “Technical problems with Tomcat/IPT have resulted in unfinished installations and/or GBRDS registrations at some participating genebanks”
• Expertise: “Lack of IT resources has been a problem at some sites”
• Stability: “The absence of a final release of the IPT has made some genebanks hesitate to adopt it”. However, this may not be such a problem since IPT v1.0 should be released by Q1 2011.
Recommendations to GBIF
• Strategy: “Focus should be on the core functionality of the IPT: to serve as an easy-to-use web-service interface for data publishers.”
• Performance: “The suggested removal of the Geoserver and the internal database is a sound choice; it should improve ease of installation and performance.”
– “It is crucial to be able to handle very large data-sets!!”
• Functionalities: “It should be possible to save the mapping/configuration and reuse it later on…”
• Extension: “The process of creating a DwC extension for a specific community is not difficult in itself. It needs to be promoted and associated with best-practice guidelines”
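The "save the mapping and reuse it" recommendation can be pictured with a minimal sketch. It is not how the IPT works internally: the function names, the sample rows and the small subset of Darwin Core terms are all assumptions made for illustration. Columns whose headers match a DwC term name are mapped automatically, one column is mapped by hand, and the result is serialized so that a later import in the same format needs no remapping.

```python
import csv
import io
import json

# Small illustrative subset of Darwin Core term names (not the full standard).
DWC_TERMS = ["catalogNumber", "scientificName", "country", "institutionCode"]


def auto_map(headers):
    """Map source columns to DwC terms when the header equals a term name (case-insensitive)."""
    by_lower = {term.lower(): term for term in DWC_TERMS}
    mapping = {}
    for header in headers:
        key = header.strip().lower()
        if key in by_lower:
            mapping[header] = by_lower[key]
    return mapping


def apply_mapping(rows, mapping):
    """Rename mapped columns to their DwC terms and drop unmapped columns."""
    return [{mapping[col]: val for col, val in row.items() if col in mapping} for row in rows]


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


# First import: auto-map by header, then add one manual mapping (made-up sample data).
first = read_rows("CatalogNumber,ScientificName,Origin\nA1,Hordeum vulgare,Sweden\n")
mapping = auto_map(first[0].keys())
mapping["Origin"] = "country"  # manual step, done once by the publisher
saved = json.dumps(mapping, sort_keys=True)  # could be written to a file

# Later import in the same format: reuse the saved mapping instead of redoing it.
second = read_rows("CatalogNumber,ScientificName,Origin\nA2,Pisum sativum,Norway\n")
records = apply_mapping(second, json.loads(saved))
```

Storing the mapping as plain JSON keeps it independent of any one tool, so the same file could in principle be shared between publishers that export the same format.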
Recommendations to GBIF
• Technology: “The advantages of a web-service-based information infrastructure as opposed to manual file uploads or Excel spreadsheets are not always evident to data publishers…”
• Capacity: “Some important data owners lack both infrastructure (computers etc.) and personnel (sufficient access to qualified technicians). Fundraising should be a priority, in particular with key foundations.”
• Use cases: “Data publishers need clear use cases to see and understand the advantages of new technologies, and to think it worth the trouble to try something new and change things that have been working already.”
– “An even better use case for demonstration and training would have included the NPT and maybe the HIT.”
Recommendations to GBIF
• Adoption: “Training for data publishers seems to be the most important factor to get a working network.”
• Mentoring: “NordGen has provided funds to support the purchase of a proper server to host the IPT at one of the locations, but a second visit seems necessary to get things running.”
• Richer content: “Based on user feedback, DarwinCore and, more importantly, its extension model seem to be an ideal combination for the PGR community.”
GBIFS: Actions taken
• IPT re-engineering:
– Reduce the server requirements significantly (e.g. necessary memory)
– Increase the data import performance by removing the embedded database
– Remove the dependencies on heavy libraries and tools such as Geoserver
– Remove all data interfaces that are not necessary for data publication through GBIF
– Improve the robustness of the tool
• Status
– Early testing to commence mid October 2010
– Targeted at general availability release end 2010
Tim Robertson
The reality is…
The scientist: “GBIF? IPT? GBRDS? What’s the scientific relevance of all of that?”
The plumber: “Cool! When can I install it?”
Scientific relevance!!!
SNPs, RNAs, RFLPs, Yield, Leaf type, Leaf colour, DNA-based markers, Biomass, Drought tolerance, Length of peduncle, Genetics, Fruit shape, Flower colour, Protein content, Seed coat texture, Disease resistance, Characteristics, Performances, Passport, Plant growth habit, Plant height, DarwinCore
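The layering of richer content on top of DarwinCore can be sketched as core records linked to extension rows through a shared identifier. This is only an illustration under stated assumptions: the field names (`id`, `coreid`, `trait`, `value`, `unit`), the helper functions and all sample values are hypothetical, not a published extension.

```python
from collections import defaultdict

# Core records: one row per accession (made-up sample data).
core = [
    {"id": "acc-1", "scientificName": "Hordeum vulgare"},
    {"id": "acc-2", "scientificName": "Pisum sativum"},
]

# Extension rows: any number of trait observations per core record,
# linked back through the hypothetical "coreid" field.
traits = [
    {"coreid": "acc-1", "trait": "plant height", "value": "85", "unit": "cm"},
    {"coreid": "acc-1", "trait": "drought tolerance", "value": "high", "unit": ""},
]


def join_extension(core_rows, ext_rows):
    """Attach to each core record the list of extension rows that point at it."""
    by_core = defaultdict(list)
    for row in ext_rows:
        by_core[row["coreid"]].append({k: v for k, v in row.items() if k != "coreid"})
    return [{**rec, "traits": by_core.get(rec["id"], [])} for rec in core_rows]


def select_by_trait(records, trait, value):
    """Return the ids of records that have the given trait with the given value."""
    return [
        rec["id"]
        for rec in records
        if any(t["trait"] == trait and t["value"] == value for t in rec["traits"])
    ]


joined = join_extension(core, traits)
drought_tolerant = select_by_trait(joined, "drought tolerance", "high")
```

Because trait rows live outside the core record, a genebank can publish passport data first and add characterization or evaluation data later without changing the core.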
Examples…
• How can I better manage my collection? (e.g. identification of duplicates…)
• How can I prioritize my conservation actions?
• How can I identify gaps in my collections?
• How can I build core collections for breeders?
• How can I select material with useful traits?
Examples…
• How can I broaden the genetic diversity based on information from other genebanks?
• I need to do research on:
– Crop diversification…
– Eco-geographic studies…
– Plant/soil micro-organisms…
– Pest/host interactions…
– Ex situ/in situ complementary conservation...
“You will succeed when you become scientifically relevant to us!”
Climate change, Adaptation, Food security, Diversity, Breeding, Evaluation, Research, Measurements, Genetic Resources, Useful traits
Senior Programme Officer for Science & Scientific Liaison Email: email@example.com