Version 3.0 of the GBIF Data Repository Tool aims to help data custodians manage and share their data easily. It simplifies data warehousing and keeps data available while respecting intellectual property rights (IPR) and confidentiality.
WWW.GBIF.ORG Global Biodiversity Information Facility. The GBIF Data Repository Tool (new version 3.0). Hannu Saarenmaa, EC CHM & GBIF European Regional Nodes Meeting, Copenhagen, 2005-09-15/18
Outline • Objectives and background • Design and installation • Use • Demonstration
Challenges in data sharing • Eventually, all data sets become orphans: archiving services are a necessity. • The concept "share once, use many" requires available data repositories. • Data in archives must be made available to portals such as GBIF through standard mechanisms. • IPR, confidentiality, and benefit sharing must be respected at all times.
Goals of the GBIF Data Repository Tool • Enable data custodians to manage their data and control its publication. • Provide a mechanism by which spreadsheets and similar files can be used directly for sharing data. • Hide database complexities from users. • Make a simple data warehouse tool available to those who want to host datasets for the community. • In short, lower the threshold for data sharing as far as possible.
Functionalities • Data must be formatted according to the Darwin Core standard and its extensions, in flat spreadsheet format. • In fact, any flat format (rows and columns) will work. • The system checks and parses the data into an embedded MySQL database, which becomes publicly available as a DiGIR/TAPIR resource. • Owners can control the level of detail released: • Fuzzing of geographic coordinates is available • Collector names and time periods can be hidden • Approval of terms and conditions for data use can be required • Owners can revoke a release and update the data. • Metadata values can be inherited by the data records to replace missing values, as defined. • Includes an embedded image server
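The metadata inheritance and field hiding listed above can be sketched in Python, the language the tool is built on. This is a minimal illustration under our own assumptions, not the tool's code: each spreadsheet row is treated as a dict keyed by column name, `inherit_metadata`, `hide_fields` and `prepare_for_release` are hypothetical helpers, and the Darwin Core term names in the example are chosen for illustration only.

```python
from typing import Dict, Iterable, Optional

# Hypothetical representation: one row of a flat Darwin Core spreadsheet,
# mapping column names to cell values (None or "" means the value is missing).
Record = Dict[str, Optional[str]]


def inherit_metadata(record: Record, defaults: Record) -> Record:
    """Fill missing or empty values in a row from collection-level defaults."""
    merged = dict(record)  # copy so the original row is left untouched
    for field, default in defaults.items():
        if merged.get(field) in (None, ""):
            merged[field] = default
    return merged


def hide_fields(record: Record, hidden: Iterable[str]) -> Record:
    """Drop fields the owner chose not to release (e.g. collector, date)."""
    hidden_set = set(hidden)
    return {k: v for k, v in record.items() if k not in hidden_set}


def prepare_for_release(record: Record, defaults: Record,
                        hidden: Iterable[str]) -> Record:
    """Apply defaults first, then hide, so hidden fields never leak via defaults."""
    return hide_fields(inherit_metadata(record, defaults), hidden)


if __name__ == "__main__":
    defaults = {"country": "Denmark", "basisOfRecord": "PreservedSpecimen"}
    row = {"scientificName": "Parus major", "country": "",
           "recordedBy": "J. Smith", "eventDate": "2004-06-01"}
    print(prepare_for_release(row, defaults, hidden=["recordedBy", "eventDate"]))
```

Applying defaults before hiding means a hidden field stays hidden even if a collection-level default exists for it.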
Installation • For Linux and Windows • Based on Python, Zope 2.10 and MySQL • Supports the DiGIR and TAPIR protocols of TDWG • Turnkey installation • Fits directly into the EC CHM software package
Steps for data owners • Prepare the data files • Create a nested folder structure on the Repository for the collection • Enter default metadata scope (to cover missing values in data, etc.) • Decide on access policies • Upload the files • Publish the data files
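The upload step, in which a flat file is checked and parsed into a database, might look roughly like the following. It is a sketch only: Python's built-in `sqlite3` stands in for the embedded MySQL database, `KNOWN_TERMS` is a hypothetical subset of Darwin Core term names, and `load_flat_file` is an illustrative helper, not part of the tool.

```python
import csv
import io
import sqlite3
from typing import List

# Assumption: a small illustrative subset of Darwin Core term names; the
# tool's actual list of accepted columns is not given on the slide.
KNOWN_TERMS = {
    "catalogNumber", "scientificName", "decimalLatitude",
    "decimalLongitude", "recordedBy", "eventDate", "country",
}


def load_flat_file(conn: sqlite3.Connection, table: str, text: str) -> List[str]:
    """Check a flat CSV header against KNOWN_TERMS and load the rows.

    Unrecognised columns are skipped and returned so the owner can fix them.
    """
    if not table.isidentifier():
        raise ValueError("table name must be a plain identifier")
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    columns = [c for c in header if c in KNOWN_TERMS]
    unknown = [c for c in header if c not in KNOWN_TERMS]
    if not columns:
        raise ValueError("no recognised Darwin Core columns in header")
    # Column names come from KNOWN_TERMS, so quoting them here is safe.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    names = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row.get(c) for c in columns) for row in reader]
    conn.executemany(
        f'INSERT INTO "{table}" ({names}) VALUES ({placeholders})', rows)
    conn.commit()
    return unknown


if __name__ == "__main__":
    data = ("catalogNumber,scientificName,notes\n"
            "C1,Parus major,seen at dawn\n"
            "C2,Pica pica,\n")
    conn = sqlite3.connect(":memory:")
    print(load_flat_file(conn, "birds", data))  # ['notes']
    print(conn.execute('SELECT COUNT(*) FROM "birds"').fetchone()[0])  # 2
```

Reporting unrecognised columns instead of failing keeps the "any flat format will work" spirit while still telling the owner which columns were not mapped.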
Create the resources (databases) of the collection and folders
Access policy • Options: • Fully open • Standard GBIF policy of acknowledgements • No direct download, and fuzzing for web service access
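The three policy options could be modelled roughly as follows. Everything here is hypothetical: the `AccessPolicy` names, the `acknowledgementRequired` flag, and fuzzing by snapping coordinates to the centre of a 0.1-degree grid cell are assumptions made for illustration; the slide does not say how the tool fuzzes coordinates.

```python
import math
from enum import Enum
from typing import Dict, Optional


class AccessPolicy(Enum):
    # Hypothetical names for the three options on the slide.
    FULLY_OPEN = "fully_open"
    GBIF_STANDARD = "gbif_standard"  # open, but acknowledgement required
    RESTRICTED = "restricted"        # no direct download, fuzzed coordinates


def fuzz_coordinate(value: float, cell_degrees: float = 0.1) -> float:
    """Snap a coordinate to the centre of its grid cell (assumed method)."""
    centre = (math.floor(value / cell_degrees) + 0.5) * cell_degrees
    return round(centre, 6)  # strip floating-point noise


def serve_record(record: Dict[str, object], policy: AccessPolicy,
                 download: bool = False) -> Optional[Dict[str, object]]:
    """Return what a request may see under the policy, or None if refused."""
    out = dict(record)  # never modify the stored record
    if policy is AccessPolicy.RESTRICTED:
        if download:
            return None  # direct download is not allowed
        for key in ("decimalLatitude", "decimalLongitude"):
            if isinstance(out.get(key), (int, float)):
                out[key] = fuzz_coordinate(float(out[key]))
    elif policy is AccessPolicy.GBIF_STANDARD:
        out["acknowledgementRequired"] = True  # hypothetical flag
    return out


if __name__ == "__main__":
    rec = {"scientificName": "Parus major",
           "decimalLatitude": 55.6761, "decimalLongitude": 12.5683}
    print(serve_record(rec, AccessPolicy.RESTRICTED))
    print(serve_record(rec, AccessPolicy.RESTRICTED, download=True))  # None
```

Snapping to a cell centre keeps each fuzzed point inside its true grid cell, so a coarse distribution map stays correct while the exact locality is withheld.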
Data is now searchable locally and through the DiGIR/TAPIR protocols