BIS TDWG Conference, New Orleans, 2011

GBIF: Issues in providing federated access to digital information related to biological specimens. David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF). 3 issues. Issue #1: The consequences of scale.

Presentation Transcript


  1. BIS TDWG Conference, New Orleans, 2011 • GBIF: Issues in providing federated access to digital information related to biological specimens • David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF)

  2. 3 issues • Issue #1: The consequences of scale • Issue #2: Geospatial integration • Issue #3: Taxonomic integration

  3. Issue #1: The consequences of scale • Goal – Provide timely access to a large federated network of biodiversity databases

  4. About GBIF: The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development. • 341 publishers • 9290 datasets • 310M records • 57 countries • 45 organisations

  5. “Wrapper” Software: Install one of these ‘wrappers’ between your database and the network (your database: data standard, wrapper) • Herbarium: ABCD, PyWrapper (Python) • Bird Observations: DarwinCore, TAPIR Link (PHP) • Insect Collection: DarwinCore, DiGIR (PHP)

  6. The promise of federation (GBIF Data Portal as a Gateway) • User: “Any specimens from Thailand?” • GBIF Data Portal: “I will ask!” • Data providers (Insect Collection, Herbarium, Bird Observations, Herbarium): “I do!” “I do!” “I do!” “Nope!”

  7. The challenge of federation • GBIF Data Portal: “Hello?” • Data providers (Insect Collection, Herbarium, Bird Observations, Herbarium): “Server Not Available”, “Hi!”

  8. The rise of Indexing (GBIF Data Portal as a Data Index) • User: “Any data records from Thailand?” • GBIF Data Portal (now with Data!): “Send me a copy of your data” • Data providers: Insect Collection, Herbarium, Bird Observations, Herbarium

  9. The wrong tools for the job • User: “Any data records from Thailand?” • GBIF Data Portal (now with Data!): “Send me a copy of your data once per month” • Data providers (Insect Collection, Herbarium, Bird Observations, Herbarium): “If I go offline, start again”, “You ask the same questions every time”, “Here is page one. Not too fast!”

  10. TAPIR request example • dataset of 260,000 specimens • 200 records retrieved per request • requires 1300 request/response pairs • over 9 hours to complete • 500 MB of XML data is transferred • becomes 32 MB text file in the GBIF server • 32 MB is compressible to 3 MB zip file
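The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and because the slide says "over 9 hours" the time per request is a lower bound.

```python
import math

records = 260_000      # specimens in the dataset
page_size = 200        # records returned per TAPIR request
hours = 9              # "over 9 hours", so treat as a lower bound
xml_mb, text_mb, zip_mb = 500, 32, 3

# Number of request/response pairs needed to page through the dataset.
requests = math.ceil(records / page_size)

# Average wall-clock time spent on each request/response pair.
seconds_per_request = hours * 3600 / requests

print(requests)                        # 1300
print(round(seconds_per_request, 1))   # 24.9
print(round(xml_mb / text_mb, 1))      # 15.6 (XML overhead vs. plain text)
print(round(xml_mb / zip_mb))          # 167  (XML transferred vs. zipped text)
```

Roughly 25 seconds per page of 200 records, and about 167 times more data on the wire than the zipped text needs, is the cost that motivates the text-based approach on the next slides.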

  11. Darwin Core Archives: A text-based solution to publishing biodiversity data

  12. A Refined Approach • User: “Any data records from Thailand?” • GBIF Data Portal (now with Data!) • “This is fast!” - reduce latency • “This is easy” - index very large data sets • Each data provider (Insect Collection, Herbarium, Bird Observations, Herbarium) is reached through a URL
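To make the text-based idea concrete: a Darwin Core Archive is essentially a zip file of delimited text files. The sketch below is a simplified illustration using only the Python standard library. It builds a tiny in-memory archive and reads it back. The file name `occurrence.txt`, the sample rows, and the helper functions are assumptions made for the example, and the `meta.xml` descriptor that archives normally carry is omitted.

```python
import csv
import io
import zipfile

# Hypothetical sample rows; the column names are Darwin Core terms.
ROWS = [
    {"occurrenceID": "1", "scientificName": "Oenanthe javanica", "country": "Thailand"},
    {"occurrenceID": "2", "scientificName": "Archilochus colubris", "country": "United States"},
    {"occurrenceID": "3", "scientificName": "Copsychus saularis", "country": "Thailand"},
]

def build_archive(rows):
    """Write rows to a tab-delimited text file inside an in-memory zip."""
    text = io.StringIO()
    writer = csv.DictWriter(
        text, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("occurrence.txt", text.getvalue())
    return buf.getvalue()

def records_from(archive_bytes, country):
    """Read the archive back and keep the records from one country."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        with zf.open("occurrence.txt") as raw:
            lines = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(lines, delimiter="\t")
            return [row for row in reader if row["country"] == country]

archive = build_archive(ROWS)
thai = records_from(archive, "Thailand")
print([r["scientificName"] for r in thai])
```

The provider only has to place one such file at a URL, and the index can fetch it in a single request rather than paging through a wrapper.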

  13. Growth (records) • 2007: 70 million • 2008: 147 million • 2009: 180 million • 2010: 201 million • Today: 302 million • Chart annotations: “Need for a new standard identified”, “New standard adopted”

  14. Issue #2: Geospatial Integration • Goal – Provide accurate reporting of nationally-bound data • Challenge – Inaccurate recording of geospatial coordinates

  15. Geo-referenced USA data: Verbatim data as shared on the network

  16. Issue #2: Geospatial Integration Remediation includes: • Use of country boundary shapefiles to verify that coordinates fall within them • Including EEZ boundaries • Including islands • Outliers identified • Nature of the error qualified (e.g., “coordinates inverted”) • Offending records marked and omitted from display
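The boundary check described on this slide can be sketched in a few lines. This is a minimal illustration, not GBIF's implementation: it uses a pure-Python ray-casting test and a coarse rectangle (`USA_BOX`, a made-up stand-in for a real country shapefile with EEZ and islands). Only the label "coordinates inverted" comes from the slide; the other labels and the function names are assumptions.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical, very coarse stand-in for a country boundary:
# a rectangle roughly around the contiguous USA, as (lon, lat) vertices.
USA_BOX = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]

def classify(lon, lat, polygon):
    """Return 'ok', or a label qualifying the likely nature of the error."""
    if point_in_polygon(lon, lat, polygon):
        return "ok"
    # Try common recording mistakes and see whether undoing one fixes it.
    candidates = {
        "coordinates inverted": (lat, lon),
        "longitude negated": (-lon, lat),
        "latitude negated": (lon, -lat),
    }
    for label, (x, y) in candidates.items():
        if point_in_polygon(x, y, polygon):
            return label
    return "outside boundary"

print(classify(-90.07, 29.95, USA_BOX))  # ok (New Orleans)
print(classify(29.95, -90.07, USA_BOX))  # coordinates inverted
print(classify(90.07, 29.95, USA_BOX))   # longitude negated
print(classify(2.35, 48.85, USA_BOX))    # outside boundary
```

Records that return anything other than `ok` would be the ones marked and omitted from display.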

  17. Geo-referenced USA data: Data following interpretation • Coastal regions recognised • Offshore islands recognised

  18. Issue #3: Taxonomic Integration • Goal – Provide access to biodiversity data according to taxonomic groups and concepts • Challenge – • Heterogeneous and sometimes inaccurate classification • Same taxon appearing in different classifications • Presence of homonyms that complicate reconciling above • Misspellings • Wide range of orthographies for the same name

  19. Enabling authoritative taxonomic data to be published through GBIF

  20. Trochilidae (Hummingbirds) (today): Misinterpretations (Hummingbirds are restricted to the Americas)

  21. Trochilidae (Hummingbirds) (next month): Improved interpretation

  22. Search for Oenanthe (water dropwort plant or wheatear bird): resolution of homonyms • Today: Difficult for user to interpret • Next month: Accurate search results
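One way to picture homonym resolution is a lookup in which a single name maps to several authority entries, narrowed by extra context such as kingdom. The sketch below is an assumption-laden illustration: the `AUTHORITY` table and the `resolve` helper are hypothetical, and the slides do not say how GBIF actually resolves homonyms.

```python
# Hypothetical authority entries for the homonymous genus name Oenanthe.
AUTHORITY = {
    "Oenanthe": [
        {"kingdom": "Plantae", "family": "Apiaceae", "common": "water dropwort"},
        {"kingdom": "Animalia", "family": "Muscicapidae", "common": "wheatear"},
    ],
}

def resolve(genus, kingdom=None):
    """Return matching authority entries, narrowed by kingdom when given."""
    entries = AUTHORITY.get(genus, [])
    if kingdom is not None:
        entries = [e for e in entries if e["kingdom"] == kingdom]
    return entries

ambiguous = resolve("Oenanthe")
bird = resolve("Oenanthe", kingdom="Animalia")
print(len(ambiguous))       # 2: the bare name alone cannot be resolved
print(bird[0]["common"])    # wheatear
```

Without the kingdom, a search returns both the plant and the bird, which is the "difficult for user to interpret" situation on the slide.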

  23. Improved means to match names to authority files
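Matching verbatim names against an authority file has to tolerate the misspellings and orthographic variation listed under Issue #3. A minimal sketch, assuming a tiny in-memory authority list and using the standard library's `difflib` as a stand-in for whatever matching GBIF uses; `AUTHORITY_NAMES`, `normalise`, `match_name`, and the 0.85 cutoff are all illustrative choices.

```python
import difflib

# Hypothetical authority file: a short list of accepted names.
AUTHORITY_NAMES = ["Trochilidae", "Oenanthe oenanthe", "Archilochus colubris"]

def normalise(name):
    """Collapse whitespace and case so orthographic variants compare fairly."""
    return " ".join(name.split()).lower()

LOOKUP = {normalise(n): n for n in AUTHORITY_NAMES}

def match_name(verbatim, cutoff=0.85):
    """Return (accepted_name, match_type), or (None, 'no match')."""
    key = normalise(verbatim)
    if key in LOOKUP:
        return LOOKUP[key], "exact"   # exact after normalisation
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    if close:
        return LOOKUP[close[0]], "fuzzy"
    return None, "no match"

print(match_name("trochilidae"))            # ('Trochilidae', 'exact')
print(match_name("Archilocus  colubris"))   # ('Archilochus colubris', 'fuzzy')
print(match_name("Quercus robur"))          # (None, 'no match')
```

Reporting the match type alongside the name lets fuzzy matches be reviewed separately from exact ones.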

  24. In summary • GBIF has had to deploy different data access strategies in order to scale effectively • Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF • Geospatial filtering via shapefiles provides a basis for more accurate national reporting • Basis for additional services later (e.g., ecosystem shapefiles, protected areas, etc.) • Heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure • Comprehensive authoritative taxonomic data is a key organisational component of collections data

  25. Thank you