Data Integration, Analysis, and Synthesis

Data Integration, Analysis, and Synthesis. Matthew B. Jones, National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara. Scalable Information Networks for the Environment. http://knb.ecoinformatics.org


Presentation Transcript


  1. Data Integration, Analysis, and Synthesis • Matthew B. Jones, National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara • Scalable Information Networks for the Environment • http://knb.ecoinformatics.org • Funding: National Science Foundation (DEB99-80154, DBI99-04777)

  2. NCEAS’ Mission • Integrate existing data for broad ecological synthesis • Use synthesis to inform policy and management

  3. Synthesis at NCEAS • Research • Management • Policy • 200+ synthesis projects • 1900+ participating scientists

  4. Research projects • Hunsaker – Quantification of Uncertainty in Spatial Data for Ecological Applications • Ives & Frost – Intrinsic and Extrinsic Variability in Community Dynamics • Osenberg – Meta-Analysis, Interaction Strength and Effect Size; Application of Biological Models to the Synthesis of Experimental Data • Murdoch – Complex Population Dynamics

  5. Management projects • Andelman – Designing and Assessing the Viability of Nature Reserve Systems at Regional Scales: Integration of Optimization, Heuristic and Dynamic Models • Boersma & Kareiva – Prospectus For An Analysis of Recovery Plans and Delisting • Kareiva – Habitat Conservation Planning for Endangered Species • Lubchenco, Palumbi, & Gaines – Developing the Theory of Marine Reserves

  6. Policy projects • Costanza & Farber – The Value of the World's Ecosystem Services and Natural Capital: Toward a Dynamic, Integrated Approach • http://www.nceas.ucsb.edu/

  7. Synthesis projects • Use existing data... • Distributed sources • Varying protocols • Varying formats • Obtained via personal collaboration

  8. Functional breakdown • Functional breakdown for synthesis • Data discovery • Data access • Data storage • Data interpretation • Quality assessment • Data Conversion & Integration • Analysis & Modeling • Visualization

  9. Presentation Outline • Integration, Analysis, and Synthesis: • Challenges

  10. Data Heterogeneity • Economic • Social (urban ecology) • Paleoecological • Historical • Land use • Demographics • Population survey • Experimental • Taxonomic survey • Behavioral • Meteorological • Oceanographic • Hydrology • …

  11. Types of Heterogeneity • Intensional vs. Arbitrary Heterogeneity • Syntax (format): CSV, fixed ASCII, proprietary binary • Schema (organization): non-normalized models • Semantics (meaning/methods): protocol semantics (e.g., scale); parameter semantics (e.g., bodysize (g)); conceptual framework (e.g., experimental treatments) • Taxonomy + nomenclature
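As a rough illustration of syntactic heterogeneity, the sketch below reads the same two observations from a CSV file and from a fixed-width ASCII file into identical records. The sample data, the column names, and the `FIXED_LAYOUT` column positions are all assumptions made up for this example; they are not taken from the presentation.

```python
import csv
import io

# Hypothetical sample data: the same two observations in two syntaxes.
CSV_TEXT = "site,species,bodysize_g\nA,Peromyscus,21.5\nB,Microtus,34.0\n"
FIXED_TEXT = "A Peromyscus 21.5\nB Microtus   34.0\n"

# Hypothetical layout: field name -> (start, end) character positions.
FIXED_LAYOUT = {"site": (0, 2), "species": (2, 13), "bodysize_g": (13, 17)}


def read_csv_records(text):
    """Parse comma-separated text with a header row into typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(dict(row, bodysize_g=float(row["bodysize_g"])))
    return records


def read_fixed_records(text, layout):
    """Parse fixed-width text using column positions supplied as metadata."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rec = {name: line[start:end].strip() for name, (start, end) in layout.items()}
        rec["bodysize_g"] = float(rec["bodysize_g"])
        records.append(rec)
    return records
```

Once the format description is available as data (here, `FIXED_LAYOUT`), both readers yield the same list of records. Differences in schema and semantics still have to be resolved separately.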

  12. Data Dispersion • Data are distributed among: • Independent researcher holdings • Research station collections • LTER Network (24 sites) • Organization of Biological Field Stations (168 sites) • University of California Natural Reserve System (36 sites) • MARINE (62 sites) • PISCO • Agency databases • Museum databases • Access via personal networking • Not scalable

  13. Lack of Metadata • Majority of ecological data undocumented • Lack information on syntax, schema, and semantics of data • Impossible to understand data without contacting the original researchers • Documentation conventions vary widely • Requires large time investment to understand each data set

  14. Scaling Data Integration • Because of: • Data heterogeneity • Data dispersion • Lack of documentation • Integration and synthesis are limited to a manual process • Thus, difficult to scale integration efforts up to large numbers of data sets

  15. Data Integration • [Figure: items labeled A, B, and C]

  16. Presentation Outline • Integration, Analysis, and Synthesis: • Challenges • Current work • Knowledge Network for Biocomplexity • Partnership for Biodiversity Informatics

  17. Knowledge Network for Biocomplexity (KNB) • National network for biocomplexity data • Data discovery • Data access • Data interpretation • Enable advanced services • Data integration • Analysis framework • Hypothesis modeling • Visualization

  18. Central Role of Metadata • What metadata? • Ownership, attribution, structure, contents, methods, quality, etc. • Critical for addressing data heterogeneity issues • Critical for developing extensible systems • Critical for long-term data preservation • Allows advanced services to be built

  19. KNB Components • Ecological Metadata Language (EML) • Morpho -- data management for ecologists • Cross platform Java application • Metacat -- flexible metadata & data system • Analysis and Modeling engine • Data integration engine • Semantic Query Processor • Hypothesis Modeling Engine

  20. Ecological Metadata Language • XML syntax for representing metadata • Extensible – can add new metadata • Modular – can subset metadata for specific applications
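To make the "extensible" and "modular" points concrete, here is a minimal sketch that builds a small XML metadata document and then extracts one module from it. The element names (`metadata`, `resource`, `entity`, `attribute`, `unit`) are hypothetical and chosen only for illustration; they are not the actual EML schema.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator, attributes):
    """Build a toy metadata document (hypothetical element names, not real EML)."""
    root = ET.Element("metadata")
    resource = ET.SubElement(root, "resource")
    ET.SubElement(resource, "title").text = title
    ET.SubElement(resource, "creator").text = creator
    entity = ET.SubElement(root, "entity")
    for name, unit in attributes:
        attr = ET.SubElement(entity, "attribute", name=name)
        ET.SubElement(attr, "unit").text = unit
    return root


def subset(root, module):
    """Return a new document that holds only the named top-level module."""
    out = ET.Element(root.tag)
    for child in root.findall(module):
        out.append(child)
    return out
```

Under these assumptions, `subset(doc, "entity")` would give an application only the table description, and a new kind of metadata can be added later with another `ET.SubElement(doc, ...)` call without touching the existing modules.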

  21. EML 2.0beta3 modules • eml-resource -- Basic resource info • eml-dataset -- Data set info • eml-literature -- Citation info • eml-software -- Software info • eml-party -- People and Organizations • eml-entity -- Data entity (table) info • eml-attribute -- Attribute (variable) info • eml-constraint -- Integrity constraints • eml-physical -- Physical format info • eml-access -- Access control • eml-distribution -- Distribution info • eml-project -- Research project info • eml-coverage -- Geographic, temporal and taxonomic coverage • eml-protocol -- Methods and QA/QC

  22. Metacat metadata system • [Figure: network diagram of linked Metacat servers; labels include NRS, OBFS, NCEAS, LTER, SDSC, SEV, AND, and CAP. Key: Metacat catalog, Morpho clients, web clients, site metadata system, XML wrapper]

  23. Metacat architecture

  24. Metacat web interface

  25. OBFS Network • UC Natural Reserve System • LTER Network

  26. Functional breakdown • Functional breakdown for synthesis • Data discovery • Data access • Data storage • Data interpretation • Quality assessment • Data Conversion & Integration • Analysis & Modeling • Visualization

  27. Quality Assessment system • [Diagram: Data + Semantic Metadata + Researcher Decisions → Quality Assessment Report]

  28. Quality Assessment • Integrity constraint checking • Data type checking • Metadata completeness • Data entry errors • Outlier detection • Check assertions about data • e.g., trees don’t shrink • e.g., sea urchins do
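Several of the checks listed on this slide can be sketched in a few lines. The code below is only an illustration under assumed record layouts: the field names (`tree`, `year`, `dbh_cm`), the z-score rule for outliers, and the threshold are assumptions, not the KNB quality assessment system itself.

```python
from statistics import mean, stdev


def check_types(records, schema):
    """Data type checking: report (row index, field) pairs that fail conversion."""
    problems = []
    for i, rec in enumerate(records):
        for field, typ in schema.items():
            try:
                typ(rec[field])
            except (KeyError, TypeError, ValueError):
                problems.append((i, field))
    return problems


def check_non_decreasing(records, key, time, value):
    """Assertion check: a value must not decrease over time (trees don't shrink)."""
    problems = []
    last = {}
    for rec in sorted(records, key=lambda r: (r[key], r[time])):
        prev = last.get(rec[key])
        if prev is not None and rec[value] < prev:
            problems.append((rec[key], rec[time]))
        last[rec[key]] = rec[value]
    return problems


def flag_outliers(values, z=2.0):
    """Outlier detection: indices more than z sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) > z * s]
```

Because sea urchins can shrink, an assertion such as `check_non_decreasing` would only be applied where the metadata says it holds for the organism being measured.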

  29. Data Integration • [Diagram: Data + Semantic Metadata + Researcher Decisions → Integrated Data Set]
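The combination of data, metadata, and researcher decisions can be sketched as a purely mechanical transformation. In this example the metadata maps each target field to a local column and unit, and the researcher's decision is the list of target fields to keep. The data sets, column names, and the `TO_GRAMS` unit table are hypothetical.

```python
# Hypothetical conversion factors from a declared unit to grams.
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "mg": 0.001}

# Two hypothetical data sets that record the same concepts differently.
DATASET_A = {
    "metadata": {
        "id": "A",
        "attributes": {
            "species": {"column": "sp"},
            "mass_g": {"column": "wt", "unit": "kg"},
        },
    },
    "rows": [{"sp": "Salmo trutta", "wt": "0.5"}],
}
DATASET_B = {
    "metadata": {
        "id": "B",
        "attributes": {
            "species": {"column": "taxon"},
            "mass_g": {"column": "mass", "unit": "g"},
        },
    },
    "rows": [{"taxon": "Salmo trutta", "mass": 250}],
}


def integrate(datasets, target_fields):
    """Rename columns and convert units so all rows share one schema."""
    integrated = []
    for ds in datasets:
        meta = ds["metadata"]
        for row in ds["rows"]:
            out = {"source": meta["id"]}
            for target in target_fields:
                attr = meta["attributes"][target]
                value = row[attr["column"]]
                if "unit" in attr:  # numeric field: convert to grams
                    value = float(value) * TO_GRAMS[attr["unit"]]
                out[target] = value
            integrated.append(out)
    return integrated
```

Calling `integrate([DATASET_A, DATASET_B], ["species", "mass_g"])` yields rows with a common schema and a `source` field that records where each row came from.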

  30. Data Integration • [Figure: items labeled A, B, and C]

  31. Scaling Analysis and Modeling

  32. Scaling Analysis and Modeling

  33. Semantic metadata • Describes the relationship between measurements and ecologically relevant concepts • Drawn from a controlled vocabulary • Ontology for ecological measurements
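One way to picture semantic metadata is as annotations that tie each data column to a term from a controlled vocabulary, so that columns measuring the same concept can be found across data sets. The vocabulary terms and function names below are assumptions for illustration; they do not come from an actual ecological ontology.

```python
# Hypothetical controlled vocabulary of measurement concepts.
CONCEPTS = {"BodyMass", "StemDiameter", "AirTemperature"}


def annotate(dataset_id, column_concepts):
    """Attach vocabulary terms to columns, rejecting terms outside the vocabulary."""
    unknown = set(column_concepts.values()) - CONCEPTS
    if unknown:
        raise ValueError(f"terms not in vocabulary: {sorted(unknown)}")
    return {"dataset": dataset_id, "columns": dict(column_concepts)}


def columns_measuring(annotated, concept):
    """Find every (dataset, column) pair annotated with the given concept."""
    return [
        (a["dataset"], col)
        for a in annotated
        for col, term in a["columns"].items()
        if term == concept
    ]
```

Restricting annotations to a fixed term set is what allows a query for a concept such as `BodyMass` to match columns named `wt` in one data set and `mass` in another.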

  34. Ecological Ontologies

  35. What drives synthesis • Science questions • Hypotheses • Analyses + Models • Integrated Data • Original Data

  36. Conclusions • Barriers to integration can be addressed using structured metadata • Can accomplish a lot with ‘just’ mechanical transformations • Domain ontologies + semantic mediation are paths to scaling integration • Analysis drives all other phases of integration
