Scientific Database Approaches

Scientific Database Approaches. John H. Porter University of Virginia & Kristin Vanderbilt University of New Mexico. Road Map. Why have Scientific Databases? Challenges for Scientific Databases Approaches to Scientific Databases Strategies for Initiating Ecological Databases.


Presentation Transcript


  1. Scientific Database Approaches John H. Porter University of Virginia & Kristin Vanderbilt University of New Mexico

  2. Road Map • Why have Scientific Databases? • Challenges for Scientific Databases • Approaches to Scientific Databases • Strategies for Initiating Ecological Databases

  3. WHY have Scientific Databases? • Improvement of data quality • multiple users provide multiple opportunities for detecting and correcting problems in data • Cost • data cost less to save than to collect again • with environmental data, often data cannot be collected again at any cost

  4. WHY have Scientific Databases? • Environmental Policy and Management • environmental policy decisions require data that are regional or national, but most ecological data is collected at smaller scales • National Policies • International Policies

  5. WHY have Scientific Databases? • New Science • Long Term • long-term studies depend on databases to retain project history • Synthesis • use of data for a purpose other than the one for which they were collected • Integrated, multidisciplinary projects • depend on databases to facilitate sharing of data

  6. Evolution of Data Sharing: Traditional Model (diagram labels: Data Collection; Use Data; Publications; Lose or Discard Data)

  7. Evolution of Data Sharing: New Model (diagram labels: Data Collection; Use Data; Publications; Data and Metadata) • Regional Analyses • Global Change • Long-term Studies • Synthesis

  8. Challenges for Scientific Databases • Long-term perspective • without databases, most data do not outlive the project that collected them • The 20-year rule • GOAL: data that are accessible and interpretable 20 years in the future

  9. Meeting Long Term Needs • TECHNOLOGICAL: media & formats that do not become obsolete • CONTEXTUAL: need to capture context of data collection • SEMANTIC: terms need to be well-defined

  10. Challenges for Scientific Databases • Deal with Diversity • science means asking NEW questions • new kinds of queries • scientific data is heterogeneous and diverse • scientific users have different backgrounds and goals • the user community for a given database will be dynamic

  11. Characteristics of Ecological Data (figure: types of data placed by Data Volume (per dataset), low to high, against Complexity/Metadata Requirements, low to high. Labels: Satellite Images; GIS; Weather Stations; Business Data; Biodiversity Surveys; Primary Productivity; Population Data; Gene Sequences; Soil Cores. Region labels: Most Ecological Data; Most Software)

  12. Comparison to Business Databases • Business-oriented databases have been very different from scientific databases • Relatively small number of well-defined data elements • E.g., Part number, count, price • Repeatable reports (e.g., sales report) • Rules for integrating data well understood • Intolerant of different values associated with an element • E.g., hourly rate of pay

  13. Ecoinformatics Development: Alignment with IT Community (diagram labels: Ecoinformatics; Information Technology) • Reason: IT focused on proprietary business applications • Modified from James Brunt

  14. Changing Times • New emphases on “data mining” are forcing business databases to become more like scientific databases • Example: data on customer demographics are linked to regional store inventories • Integration of data resources not designed with integration in mind

  15. Ecoinformatics Development: Alignment with IT Community (diagram labels: IT; Ecoinformatics; XML, Web Services, Semantic Mediation) • Reason: IT now focuses on domain-neutral access to distributed data products • Modified from James Brunt

  16. The Ecoinformatics Challenge: • Can we make information available to ecologists: • In ways they can locate the information they need? • With information in forms they can readily use? • How can we assure that the information is current and accurate?

  17. Not all Scientific Databases are Alike! Scientific data are available at a number of different “levels” • LOW: individual investigator posts data on web page for students to retrieve • MEDIUM: Online databases for supporting a project • HIGH: system automatically integrates data from a large number of sources

  18. Different Types of Scientific Databases (diagram labels: Researchers; Individual datasets; Project or Site-Based Systems; International/National/Regional Systems; “Portal”, “Value-Added” or “Integrated” Infobases)

  19. Tools for Creating Scientific Databases • Web Server – HTML, XML • IIS • Apache – open source • Database Management Systems (DBMS) • Input, query, update, sort, output • Statistical Packages • Aggregate, graph • Programming Languages • C++, JAVA, PERL, Python, Visual Basic, PHP • Create Custom code

  20. Tools for Scientific Database Development • Relational Database Management Systems – RDBMS in common use • Access/ Microsoft SQL Server • Oracle • MySQL – open source • Statistical Packages • SAS • SPSS • R – open source

  21. Spreadsheets • Spreadsheets are fantastic tools – but not for scientific databases! • Encourage “bad practice” – irregular data structures that can’t be parsed easily • Lack “auditability” – difficult or impossible to back-track calculations • Proprietary formats become obsolete • Lack export capabilities for other than values or graphs (no formulae)

  22. Not Every Scientific DB needs or uses the same tools • Example 1 – Basic Data Access • Post comma-delimited files on web server • Metadata files – XML text files (structured) or unstructured • Example 2 – Add Products • Use SAS to conduct error-checking and generate graphics from data • Use scripts/programs to automate production process
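Examples 1 and 2 on this slide can be sketched in a few lines. The block below is only an illustration, not the presenters' system: it uses Python in place of SAS for the error-checking step, and the column names, the sample records, the ad hoc metadata layout (which is not EML), and the range limits are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical weather-station records; the column names are illustrative only.
ROWS = [
    {"date": "2004-06-01", "air_temp_c": "21.4"},
    {"date": "2004-06-02", "air_temp_c": "999"},  # looks like a sensor fault code
    {"date": "2004-06-03", "air_temp_c": "19.8"},
]


def to_csv(rows):
    """Return the records as comma-delimited text, ready to post on a web server."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["date", "air_temp_c"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_metadata_xml(title, columns):
    """Build a small structured metadata document (an ad hoc layout, not EML)."""
    root = ET.Element("metadata")
    ET.SubElement(root, "title").text = title
    for name, unit in columns:
        attribute = ET.SubElement(root, "attribute", name=name)
        ET.SubElement(attribute, "unit").text = unit
    return ET.tostring(root, encoding="unicode")


def flag_out_of_range(rows, field, low, high):
    """Return the rows whose numeric value in `field` falls outside [low, high]."""
    return [r for r in rows if not (low <= float(r[field]) <= high)]


csv_text = to_csv(ROWS)
metadata_xml = to_metadata_xml(
    "Daily air temperature",
    [("date", "ISO 8601 date"), ("air_temp_c", "degrees Celsius")],
)
# Assumed plausibility limits for air temperature; real limits are site-specific.
suspect = flag_out_of_range(ROWS, "air_temp_c", -50.0, 60.0)
```

Running the three steps from one script is what "automate the production process" amounts to at this level: the data file, its metadata, and the list of suspect records are regenerated together whenever new data arrive.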

  23. Possible Systems • Example 3: Manage Metadata in DBMS • Metadata in Access Database • Provide comma-delimited data files • Example 4: Manage Metadata on Web • Link web forms to backend DBMS • Example 5: Full DBMS system • Metadata in DBMS • Data dynamically queried from DBMS using web interface
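The "full DBMS" case (Example 5) can be illustrated with a minimal sketch in which metadata and data sit in the same database and a single parameterized query answers a user request. Everything specific here is an assumption for illustration: the use of SQLite through Python's standard `sqlite3` module (the slides name Access, SQL Server, Oracle, and MySQL), the table and column names, and the sample values. A web interface would call a function like `query_observations` with values taken from a form.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding both metadata and data (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            units TEXT NOT NULL
        );
        CREATE TABLE observation (
            dataset_id INTEGER NOT NULL REFERENCES dataset(id),
            obs_date   TEXT NOT NULL,
            value      REAL NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO dataset VALUES (1, 'Daily precipitation', 'mm')")
    conn.executemany(
        "INSERT INTO observation VALUES (1, ?, ?)",
        [("2004-06-01", 0.0), ("2004-06-02", 12.5), ("2004-06-03", 3.1)],
    )
    return conn


def query_observations(conn, dataset_id, start, end):
    """Return (title, units, date, value) rows for one dataset and date range.

    Placeholders (?) keep user-supplied values out of the SQL text itself.
    """
    cursor = conn.execute(
        """
        SELECT d.title, d.units, o.obs_date, o.value
        FROM observation AS o
        JOIN dataset AS d ON d.id = o.dataset_id
        WHERE o.dataset_id = ? AND o.obs_date BETWEEN ? AND ?
        ORDER BY o.obs_date
        """,
        (dataset_id, start, end),
    )
    return cursor.fetchall()


conn = build_demo_db()
rows = query_observations(conn, 1, "2004-06-02", "2004-06-03")
```

Because the units travel with every returned row, the user never receives values stripped of their metadata, which is the main advantage of this design over posting bare data files.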

  24. Level of Structure • Unstructured Data/Metadata • Easy to produce • Hard to use • Structured Data/Metadata • Harder to produce • Easy to export, alter, update • The specific tool used to structure data (e.g., XML, DBMS) is increasingly less critical than the structure itself
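A small sketch may make the trade-off on this slide concrete. The free-text note is quick to write but a program cannot reliably pull the unit out of it, whereas the structured record takes more effort to fill in and can then be exported or updated mechanically. The field names and values are hypothetical, and the choice of a Python dataclass is an assumption; as the slide says, the tool matters less than the structure.

```python
import json
from dataclasses import asdict, dataclass, replace

# Unstructured metadata: easy to produce, hard to use programmatically.
FREE_TEXT_NOTE = "Temps were taken at the tower, in C, each morning."


@dataclass(frozen=True)
class AttributeMetadata:
    """Structured description of one data column (hypothetical fields)."""
    name: str
    definition: str
    unit: str


STRUCTURED = AttributeMetadata(
    name="air_temp",
    definition="Air temperature measured at the tower each morning",
    unit="celsius",
)


def export_json(meta):
    """Export the record as JSON text."""
    return json.dumps(asdict(meta), sort_keys=True)


def export_delimited(meta):
    """Export the same record as one tab-delimited line."""
    return "\t".join([meta.name, meta.definition, meta.unit])


# Updating a field is a single, checkable operation on the structured record.
updated = replace(STRUCTURED, unit="kelvin")

json_text = export_json(STRUCTURED)
line = export_delimited(updated)
```

The same record feeds both export formats without any hand editing, which is what "easy to export, alter, update" means in practice.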

  25. Evolving a Database • Development of a database is an evolutionary process • Implement system based on current priorities - but think ahead! • Seek scalable solutions • avoid bottlenecks • adding the 1000th piece of data should be as easy as adding the first (or easier)

  26. Developing a Database - Questions to Ask • Why is this database NEEDED? • Who will be the USERS of the database? • What types of QUESTIONS should the database be able to answer? • What INCENTIVES will be available for data providers?

  27. Meeting the Challenges • Prioritize • focus on developing the most critical data resources • most commonly, critical data refer to the research site as a whole • Meteorology & Climatology • Bibliography of past research at the station • GIS data layers for the station research area

  28. Meeting the Challenges • Get additional resources • NSF Grants • Upcoming NSF initiatives: • SEI+II – interdisciplinary research • National Ecological Observatory Network (NEON) • Institutional Support

  29. Meeting the Challenges • Work with researchers and enlist their help in developing ecological databases • Develop policies for data collection and sharing that dictate the responsibilities of: • The data provider/producer • The data system • Users of the data

  30. Use Standard Methods when Possible • Advantages of using standard methods • Increases intercomparability (and hence, value) of data, facilitating cross-site comparisons • Reduces cost of methods development

  31. Standards • Costs of using standards • Standard methods may be poorly suited to local conditions • Developing standards is time consuming and difficult • For some types of monitoring, standards may not exist, or may do a poor job characterizing desired parameters

  32. Standards “The wonderful thing about standards is that there are so many of them to choose from” • Sources of Standards • Published literature • Government Agencies (e.g., USGS, EPA) • Project standards (e.g., LTER Climate Stations) • Resource Discovery Initiative for Field Stations (RDIFS) directory (under development)

  33. Information Systems • Developing an information system is a critical component of research • You can’t exploit data you no longer have! • Creating good “metadata” (data about data) is crucial to maintaining data usability over time

  34. Exploit Partnerships & Existing Resources • OBFS Resource Discovery Initiative for Field Stations (RDIFS) • Ecoinformatics Training • Publications Database • Registry for field station data (free advertising!) • Database of standards • Keyword Thesaurus • Ecoinformatics.org/ Knowledge Network for Biocomplexity Project • Ecological Metadata Language • Tools

  35. Ecological Metadata Language (EML)

  36. Other Possible Collaborations • ORNL Mercury System • Cataloging and metadata tools with the data and metadata left on your system • Global Change Master Directory • online system for metadata with searching capabilities • OPeNDAP.org • Online tools for oceanographic data

  37. Exploiting External Resources • Ecological Society of America journal Ecological Archives • accepts “data papers” for major and important data sets.

  38. Concluding Thoughts • Developing ecological information systems seems a daunting task • Every system starts somewhere. Even oaks start with acorns! • Once started, you can build on successes, a little at a time • Remember, the compound interest on zero is zero!

  39. Next Step • Experience is a good guide to building the sort of database your users will want to use • It's good to try out the existing systems as a user to see what works (and what doesn't)
