Summary Report from Thursday, 3 March 2011 Pine Room Data Integration Breakout Group


Presentation Transcript


  1. Geo-Data Informatics (GDI) Workshop: Exploring the Life Cycle, Citation and Integration of Geo-Data. Summary Report from Thursday, 3 March 2011, Pine Room Data Integration Breakout Group

  2. Discussion Prompt: In your view/experience, what parts of data integration implementations, applications, or frameworks are well established (or not) in your discipline(s), and what are the common gaps? Moderator: Cyndy Chandler (WHOI, BCO-DMO). Rapporteur: Chris Mattmann (NASA JPL, USC). Discussion notes were kept on a TWC-hosted TitanPad site.

  3. Participants • Bob Arko (Lamont-Doherty Earth Observatory) • Joanne Luciano (TWC, RPI) • Anna Milan (National Geophysical Data Center) • Bob Simons (NOAA) • Brian Wee (NEON, Inc.) • Leslie Hsu (LDEO) • Roland Viger (USGS) • James Wilson (James Madison University) • Tom Narock (NASA/GSFC) • Cathy Constable (SIO, UCSD) • Ruth Duerr (NSIDC) • Yoori Choi (CUAHSI) • Lee Allison (Arizona Geological Survey) • Erin Robinson (ESIP) • Kavitha Chandrasekar (Indiana University) • Bob Detrick (NSF) • Clifford Jacobs (NSF) • Leonard Jonson (NSF)

  4. Data Integration • What does that mean? Combining more than one data source into a single data object (example: a database join). This is different from displaying multiple data sources in a single view. • Time series data sets assembled from a variety of sources often require data integration. • Data aggregation and interoperability are related concepts. • The group did not reach consensus.
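To make the "database join" sense of integration concrete, here is a minimal sketch, assuming two hypothetical station time series (water temperature and salinity) keyed by ISO-8601 timestamps. The source names, values, and the `inner_join` helper are illustrative only, not drawn from any system discussed by the group. The result is one combined data object, not two series overlaid in a single view.

```python
from typing import Dict, List, Tuple

# Hypothetical source 1: water temperature (deg C) keyed by ISO-8601 timestamp.
temperature_c: Dict[str, float] = {
    "2011-03-03T00:00Z": 12.1,
    "2011-03-03T01:00Z": 11.8,
    "2011-03-03T02:00Z": 11.5,
}

# Hypothetical source 2: salinity (PSU) from a different instrument/provider.
salinity_psu: Dict[str, float] = {
    "2011-03-03T01:00Z": 34.9,
    "2011-03-03T02:00Z": 35.0,
    "2011-03-03T03:00Z": 35.1,
}


def inner_join(left: Dict[str, float],
               right: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Combine two sources into one list of (timestamp, left, right) rows.

    Like an SQL inner join, only timestamps present in BOTH sources are kept.
    Timestamps share one fixed format, so string sorting is chronological.
    """
    shared = sorted(left.keys() & right.keys())
    return [(t, left[t], right[t]) for t in shared]


if __name__ == "__main__":
    for row in inner_join(temperature_c, salinity_psu):
        print(row)
```

An inner join silently drops the unmatched hours (00:00 and 03:00), which is one reason integrated time series need documented join rules; an outer join that keeps gaps as missing values is the usual alternative.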

  5. Geo Disciplines Represented • Geology • Hydrology • Oceanography • Geophysics • Geography • Marine geology and geophysics • Space science • Air quality • Computational neuroscience • Multi-disciplinary or discipline-agnostic: data management, computer science and archive

  6. Geo-Data Integration • What aspects are well established or not? • Identify common gaps?

  7. Across many projects, two common themes emerged as associated with some level of success in data integration: • ‘Long-term’ commitment of funding support • Active engagement of funding managers. Examples: Unidata (Atmospheric Sciences), CUAHSI (Hydrography), IRIS (Earthquake), US JGOFS, US GLOBEC, US WOCE (Ocean Sciences), ODP (Ocean Drilling), NEON

  8. Support for Data Integration: Development of a Community of Practice • Infrastructure to foster communication (workshops) • Mentoring of students and early-career PIs • Development of tools (e.g., Unidata developed NetCDF, which has been adopted by many communities) • Education and training • The persistence and recognition of a ‘named’ community can enable funds to flow from some agencies to researchers

  9. Support for Data Integration • Some communities agreed on common data formats that facilitated data integration • Pressures from funding agencies or community needs resulted in common software tools • Some communities identified ‘primary’ or ‘core’ variables (e.g. common, essential measurements)

  10. Summary • ‘Long-term’ funding support enables development of a community-of-practice that fosters communication, education and training, development and adoption of common tools and identification of core measurements. Communities-of-Practice can divide up the labor and work collaboratively to address shared challenges (economy of scale).

  11. Additional Observations • Tension between local and global scales (single PI to coordinated project to national to international). Awareness of how data will be used globally could help with subsequent data integration. • Early planning and specifications for data management are important, but funding for them has traditionally been difficult to obtain.

  12. Gaps • Lack of awareness/understanding that keeping data ‘alive’ (usable) is not free • Many people think data stewardship and data preservation are “solved problems” (they are not). • “Bit-level preservation” has been solved, but what is the useful lifespan of those files? What effort is required to keep archived data compatible with the latest tools and technology? The ability to use a dataset declines over time without ongoing attention to ensure that it still meets current access requirements.

  13. Gaps • Historical or legacy data (the originating PI is no longer active in the research community) • No national policy for scientific preservation • Different disciplines have different interpretations of features in a dataset • Lack of best-practice guidelines for the metadata required to document model results (software, methodology, inputs, outputs, etc.)

  14. Gaps • Misconception that metadata is created once and remains valid forever; this is not true • Metadata needs to be updated and to evolve over time • Systems and infrastructure need to support this

  15. Suggestion: The group agreed that ESIP would be an appropriate community in which to continue these discussions, begin much-needed planning, and develop the cross-disciplinary solutions required to address the gaps and improve infrastructure for geo-data integration.

  16. Additional Comments • An NRC study conducted 7-8 years ago on the loss of data and samples in the geosciences, Geoscience Data and Collections: National Resources in Peril: http://www.nap.edu/openbook.php?record_id=10348&page=R1

  17. Additional Comments • Marine Metadata Interoperability (MMI), http://marinemetadata.org/: a collection of ‘Guides’ on topics including Semantic Web technologies, controlled vocabularies, ontologies, standards, metadata best practices, and much more. • The MMI Ontology Registry and Repository (ORR) is a web application through which you can create, update, access, and map ontologies and their terms: http://mmisw.org/orr/#b

  18. Additional Comments • CUAHSI: Hydrologic Ontology System (funded by NSF): http://his.cuahsi.org/ontologyfiles.html and http://water.sdsc.edu/hiscentral/startree.aspx • A "Data Management Plan" template is available from CUAHSI (February 2011) at http://www.cuahsi.org/his-dmp.html; it covers data inventory, data and metadata standards, the data management life cycle, etc.

  19. Additional Comments • ELIXIR (http://www.bbsrc.ac.uk/science/international/elixir.aspx): European life science infrastructure for biological information. • Its mission: to construct and operate a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries, and society.