Natural History Collections Infrastructure Ricardo Scachetti-Pereira The University of Kansas Biodiversity Research Center & Natural History Museum KU-BRC/NHM
Biological Collections • Natural History Museums, Herbaria and Culture Collections provide fundamental resources for biodiversity and ecological research • Development of IT infrastructure to make results of those services (specimen data) more accessible
Collections Data Integration • Integrate Common Information • Scientific Name, Taxonomy • Geography, Locality, GPS coordinates • Collection Events (Collector) and other information • Geographically Distributed: • Birds of Mexico: spread over 43 institutions around the world; the main holder had only 16% of the total specimens • Heterogeneous Hardware & Software • Database Vendors (Access, Oracle, SQL Server) • Database Schemas (Table Definitions) • Software (Specify, Biota, In-house, etc.)
Collections Data Integration • Distributed Generic Information Retrieval (DiGIR) • XML-based protocol for retrieving structured data from multiple, distributed, heterogeneous databases over the Internet
DiGIR Protocol • Portal (UI) builds XML query • Portal broadcasts XML query to providers • Each provider translates the XML query into a native SQL query against its own database schema • Each provider translates its results into an XML result set and sends it back to the portal • Portal integrates answers from the various providers into a single, homogeneous result table
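The request flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML element names (`request`, `equals`, `response`, `record`), the `Provider` class, the table and column names, and the specimen rows are all made up for the example and are not the actual DiGIR message schema or any real collection database. Two in-memory SQLite databases stand in for remote providers with different schemas.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_query(genus):
    """Portal side: wrap the search term in a small XML request (hypothetical format)."""
    request = ET.Element("request")
    ET.SubElement(request, "equals", {"field": "genus"}).text = genus
    return ET.tostring(request, encoding="unicode")


class Provider:
    """Stand-in for one institution's database with its own table and column names."""

    def __init__(self, name, table, columns, rows):
        # columns maps shared field names to this provider's local column names
        self.name = name
        self.table = table
        self.columns = columns
        self.db = sqlite3.connect(":memory:")
        local_cols = list(columns.values())
        self.db.execute(f"CREATE TABLE {table} ({', '.join(local_cols)})")
        marks = ", ".join("?" for _ in local_cols)
        self.db.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)

    def handle(self, xml_query):
        """Translate the XML query to local SQL, run it, and answer in XML."""
        cond = ET.fromstring(xml_query).find("equals")
        local_col = self.columns[cond.get("field")]
        select = ", ".join(self.columns.values())
        sql = f"SELECT {select} FROM {self.table} WHERE {local_col} = ?"
        response = ET.Element("response", {"provider": self.name})
        for row in self.db.execute(sql, (cond.text,)):
            record = ET.SubElement(response, "record")
            # Emit results under the shared field names, not the local ones
            for shared, value in zip(self.columns, row):
                ET.SubElement(record, shared).text = str(value)
        return ET.tostring(response, encoding="unicode")


def portal_search(genus, providers):
    """Portal side: broadcast one query and merge all answers into one table."""
    xml_query = build_query(genus)
    table = []
    for provider in providers:
        response = ET.fromstring(provider.handle(xml_query))
        for record in response.findall("record"):
            row = {field.tag: field.text for field in record}
            row["provider"] = response.get("provider")
            table.append(row)
    return table


# Two providers holding the same kind of information under different schemas
providers = [
    Provider("museum_a", "specimens",
             {"genus": "genus", "species": "species", "locality": "locality"},
             [("Turdus", "migratorius", "Kansas"), ("Passer", "domesticus", "Kansas")]),
    Provider("museum_b", "tbl_coll",
             {"genus": "gen_name", "species": "sp_name", "locality": "site"},
             [("Turdus", "grayi", "Veracruz")]),
]

results = portal_search("Turdus", providers)
for row in results:
    print(row)
```

The per-provider `columns` mapping is the piece that absorbs schema differences: the portal only ever sees the shared field names, so adding a provider means adding one mapping rather than changing the portal.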
Applications of Collections Data: Methods • Prediction of Species’ Distributions: • Look for non-random correlations between point occurrences and environmental conditions • Genetic Algorithms (GARP) • Artificial Neural Networks (ANN) • Generalized Linear Models (GLM) • Generalized Additive Models (GAM) • Many, many others
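As a rough illustration of one method from the list above, the sketch below fits a GLM with a logit link (logistic regression) to point occurrences and two environmental variables. Everything in it is an assumption made for the example: the data are synthetic, the variables are imagined as standardized temperature and precipitation, and the model is fitted by plain batch gradient descent rather than the iteratively reweighted least squares that GLM software typically uses. It is not GARP and not the workflow used for the results in these slides.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Predicted probability of presence under environmental conditions x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic localities: in real use, y = 1 would come from specimen records
# and y = 0 from absence or background points.
random.seed(0)
X, y = [], []
for _ in range(200):
    temp, precip = random.uniform(-1, 1), random.uniform(-1, 1)
    suitable = temp + precip + random.gauss(0, 0.3) > 0  # made-up niche rule
    X.append([temp, precip])
    y.append(1 if suitable else 0)

weights = fit_logistic(X, y)
warm_wet = predict(weights, [0.8, 0.8])
cool_dry = predict(weights, [-0.8, -0.8])
print("warm and wet site:", round(warm_wet, 3))
print("cool and dry site:", round(cool_dry, 3))
```

Applying `predict` to every cell of an environmental grid would give a map of predicted suitability, which is the general shape of the distribution predictions discussed here.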
Applications of Collections Data: Examples • Prediction of Species’ Actual and Potential Geographical Distribution • Invasive Species • Spread of Diseases • Evolutionary Biology • Management • Monitoring
Applications: Predicting the distribution of filovirus disease Slide by A. Townsend Peterson (KU)
Ecological Data Integration • Specimen data currently integrated • Other data required for analysis: • Climate, Relief, Land Cover & Use, Remote Sensing, etc. • Acquisition, processing and integration are still largely manual
Ecological Data Integration: The SEEK Vision • Science Environment for Ecological Knowledge (SEEK) • Partners: NCEAS, UNM, SDSC, KU
SEEK: Objectives • Provide access to biodiversity, ecological and environmental data (discovery, sharing and reuse) • Provide scalable and streamlined framework for analysis and synthesis • Use Semantic Mediation to integrate heterogeneous data and analytical steps
SEEK: Overview Slide by Chad Berkeley (NCEAS)
SEEK: Ecogrid • Integrate diverse data networks from ecology, biodiversity and environmental sciences • XML-based language used for data documentation • Access to computational resources via the Grid Slide by Matt Jones (NCEAS)
SEEK: Data Integration Slide by Matt Jones (NCEAS)
SEEK: Analysis & Modeling Slide by Matt Jones (NCEAS)
SEEK: Taxonomic Object Service • Taxon concepts change over time (and space) • Multiple competing concepts coexist • Names are re-used for multiple concepts
Figure: competing concepts of Rhynchospora plumosa s.l. (concepts labeled A, B, C) according to Elliot 1816, Gray 1834, Chapman 1860, Kral 1998 and Peet 2002?. Names shown: R. plumosa, R. plumosa v. plumosa, R. plumosa v. intermedia, R. plumosa v. interrupta, R. plumosa v. pineticola, R. intermedia, R. pineticola, R. sp. 1.
Slide by Bill Michener (UNM) Information by Robert Peet (UNC)
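The point that one name can stand for several concepts can be made concrete with a small data structure in which a concept is identified by a name together with the authority that used it. This is a hypothetical sketch, not the Taxonomic Object Service data model: the `TaxonConcept` class is invented for the example, and the name and authority pairings are read off the flattened figure text, so they may not match the original diagram exactly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """A name as used by one authority; the pair, not the name, is the identity."""
    name: str
    according_to: str


# Illustrative pairings only (assumed from the figure text)
concepts = [
    TaxonConcept("R. plumosa", "Elliot 1816"),
    TaxonConcept("R. plumosa", "Gray 1834"),
    TaxonConcept("R. plumosa", "Kral 1998"),
    TaxonConcept("R. pineticola", "Kral 1998"),
]

# Group concepts by bare name to find names that are re-used
by_name = defaultdict(list)
for concept in concepts:
    by_name[concept.name].append(concept)

ambiguous = {name: cs for name, cs in by_name.items() if len(cs) > 1}
for name, cs in ambiguous.items():
    print(name, "is used by:", ", ".join(c.according_to for c in cs))
```

A specimen record labeled only with a bare name falls into the `ambiguous` group, which is why resolving it requires knowing whose concept the identifier had in mind.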
SEEK: Road map • Now into the 2nd year (out of 5) • Working prototypes for: • Ecogrid + Kepler (UI) • Semantic Mediation System • Taxonomic Object Service
Role of Collections in NEON • Provide fundamental services for biodiversity and ecological research and monitoring • Collections count on IT infrastructure to provide valuable information to NEON • Will be seamlessly integrated with other relevant sources of data
Integrating Collections into NEON • Rate and amount of deposits limited by: • Physical Installations (Storage Facilities) • Personnel (Allocation and Training) • Preservation/Storage Processes • Computerization Process • Collections require proper allocation of resources to function as part of a monitoring facility