
IndoFlux: Data Archiving, Standardization, and Quality

This workshop aims to establish a long-term biogeochemical monitoring network in India to assess the impact of global environmental change. It will provide practical guidance on archiving data and produce guidelines for a proposed Center for Global Environmental Change.


Presentation Transcript


  1. IndoFlux: Support Areas: Focus Area V: Data archiving, Analyses: Standardization and Quality A Reference Model and Practical Guidance for Design of IndoFlux Data Archiving Dr. Matthew K. Howard Department of Oceanography Texas A&M University College Station, TX, USA (mkhoward@tamu.edu) IndoFlux: Indo-US Bilateral Workshop, Hotel Green Park, Chennai

  2. Outline • Introduction • High-level Guidance on Digital Preservation • Practical Guidance on Archiving • Recommendations • Links

  3. Introduction • This is a bilateral workshop to help establish a long-term biogeochemical monitoring network in India to assess the impact of global environmental change. • This workshop will • decide monitoring locations and instrumentation, • create an oversight committee, • create a scientific plan, • create a strategic vision, • identify a near-term plan for sustained bilateral interactions, • produce guidelines for a proposed Center for Global Environmental Change which will implement and manage IndoFlux.

  4. Changes I’ve Seen • Trend towards multidisciplinary studies • Trend towards multiple-investigator studies • Larger data sets • Nested evolutions (local, …, global) • Expertise requirements exceed graduate-student and post-doctoral capabilities • Require degreed, peer-level Computer Scientists on your team who work “with” you, not “for” you. • Complex organizational and funding landscapes

  5. Matthew Howard • Research Scientist (Physical Oceanographer) at Texas A&M University • Data Manager for many MMS Coastal Monitoring Programs • Data Manager for NOAA Mechanisms that Control Hypoxia Program • Texas Automated Buoy System (TABS) and Nowcast/Forecast System • Co-PI Antigua-Barbuda Coastal Marine Ecosystem Management System • Co-PI NSF Marine Metadata Initiative (MMI) • IOOS Data Management and Communications Committee (DMAC) • Executive Team Member • Steering Team: National Federation of Regional Associations Representative • Modelers Expert Team Member • Chair of the Regional Association Caucus • Gulf of Mexico Coastal Ocean Observing System Regional Association (GCOOS-RA) DMAC Technical Lead • NSF Ocean Observatories Initiative (OOI) Ocean Research Interactive Observatory Networks (ORION) CyberInfrastructure Team

  6. Views I’ve had of an Archive • The end of a data collection effort. (Provider) • Collect, QA/QC, analyze, report, archive • The beginning of a synthesis study. (User) • Discover, search, extract, reformat, merge, improve • Discover, search, extract, reformat, merge, improve • Discover, search, extract, reformat, merge, improve • Etc. for each archive… • But really, … (next slide)

  7. An Archive is a Trusted System comprised of an organization, people, hardware and a plan, dedicated to the complete life-cycle of data for a designated community. IndoFlux Data Gurus at Work

  8. LATEX & NEGOM Stations (Data Provider) • LATEX 1992-1994: 1650 CTD profiles, water samples, ~45 current meters, 8 meteorological buoys • NEGOM 1997-1999: 892 CTD profiles + water samples, 890 XBT stations

  9. Synthesis/Reanalysis (Data User) • CTD/NASEN • Current Meters • River Discharge • XBT

  10. Local Observing Systems • Operational (24/7) Nowcast/Forecast System, 1998-present • Texas Automated Buoy System, 1995-present

  11. Regional Observing Systems • Gulf of Mexico Coastal Ocean Observing System Regional Association • Plus ~30-40 full-water-column industry ADCPs, not shown • Various organizations in Mexico have been invited to join GCOOS

  12. National Federation of Regional Associations

  13. Operations Centers Local, Regional, National Concepts

  14. Global Systems

  15. Your Data Volumes are Growing • NOAA Archive Volume Projections, 2004-2020 • Quarterly Data Downloads at NGDC, 1993-2006 • UCAR Data Migrations in TB, 1986-2003

  16. High-level Guidance on Digital Preservation

  17. Archive Guidance Documents • Generic Framework for Archiving Data tied to standards (2002) • NOAA Preliminary Recommendations for Archiving Environmental and Geospatial Data (2006) • Roles and Responsibilities of Organizations and People (2002) • IOOS DMAC Plan (2004, with annual updates): machine-to-machine interoperable systems

  18. Trusted Digital Repository • A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future. • 2002 Document by Research Libraries Group - evolved with the OAIS report

  19. Seven Attributes of a TDR • OAIS compliance • Administrative Responsibility • Organizational Viability • Financial Sustainability • Technological and Procedural Suitability • System Security • Procedural Accountability

  20. OAIS • The "Reference Model for an Open Archival Information System (OAIS)" has gained widespread acceptance from the environmental data community. • OAIS is the result of a request by the International Organization for Standardization (ISO) to the Consultative Committee for Space Data Systems (CCSDS) to develop standards in support of the long-term preservation of digital information obtained from observations of the terrestrial and space environments. (BUT IT APPLIES TO ALL ARCHIVES - digital, physical, etc.)

  21. Reference Model for an Open Archival Information System (OAIS) • Provides a framework for understanding the archival concepts needed for long-term digital information preservation and access. • Provides the concepts needed by non-archival organizations to be effective participants in the preservation process. • Provides a framework, including terminology and concepts, for describing and comparing architectures and operations of existing and future archives. • Provides a basis for comparing the data models of digital information preserved by archives and for discussing how data models and the underlying information may change over time. • Provides a foundation that may be expanded by other efforts to cover long-term preservation of information that is not in digital form. • Expands consensus on the elements and processes for long-term digital information preservation and access.

  22. OAIS cont. • Doesn’t tell you how to build anything, it just describes the functionality you need. • Most existing groups already have elements of OAIS - your group need only map your practice onto OAIS and look for gaps. • OAIS implementations are being developed - Google OAIS
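
The "map your practice onto OAIS and look for gaps" step can be sketched in a few lines. The six functional entities listed are those named in the OAIS reference model; the `current_practice` inventory is entirely hypothetical and stands in for whatever a group actually does today.

```python
# The six functional entities named in the OAIS reference model.
OAIS_FUNCTIONS = [
    "Ingest",
    "Archival Storage",
    "Data Management",
    "Administration",
    "Preservation Planning",
    "Access",
]

# Hypothetical inventory (an assumption for illustration): the local
# practice, if any, that currently covers each OAIS function.
current_practice = {
    "Ingest": "PIs email files to the data manager",
    "Archival Storage": "RAID server plus weekly tape backup",
    "Data Management": "spreadsheet catalogue of data sets",
    "Access": "anonymous FTP site",
}


def find_gaps(practice):
    """Return the OAIS functions with no matching local practice, in model order."""
    return [name for name in OAIS_FUNCTIONS if not practice.get(name)]


if __name__ == "__main__":
    for name in find_gaps(current_practice):
        print(f"No current practice covers: {name}")
```

With this invented inventory, the gaps are Administration and Preservation Planning. A real assessment would of course be a discussion rather than a dictionary lookup, but a table of this shape is a reasonable starting point.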

  23. Preliminary Principles and Guidelines for Archiving Environmental and Geospatial Data at NOAA: Interim Report. Committee on Archiving and Accessing Environmental and Geospatial Data at NOAA, Board on Atmospheric Sciences and Climate, Division on Earth and Life Studies (2006) • A National Academies Report • Funding for collection programs should include data management activities • High-quality, well-documented data should be archived in the most primitive useful form. • Solicit broad user community input • Integrate data from distributed archives to a single access point. (One stop shopping). • Scientific data stewardship applied.

  24. DMAC Plan • Draft May 2004 • Extensive Review • Released March 2005 • Concrete Guide to Data Providers • Updated Annually • http://ocean.us • 318 pages (at least read the Executive Summary - only 8.5 pages) • Goal: Automated and Largely Unattended Data Systems

  25. Practical Guidance • Steve Worley: Scientific Computing Section of the U.S. National Center for Atmospheric Research (NCAR) • Robert Keely: Canadian Marine Environmental Data Services (MEDS) Office • Cyndy Chandler: Ocean Carbon and Biogeochemistry Data Management Office at the Woods Hole Oceanographic Institution (WHOI).

  26. Ingest • Interact early and often with the provider. • Negotiate archiving agreements that specify formats and metadata - tie to funding • Verify your ability to read the data submission and understand its metadata immediately after receipt. • Avoid proprietary formats • Push out data for which you have no expertise (orphan data sets just get lost). • Observed data is gold - keep it forever.
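
One way to act on "verify immediately after receipt" is an automated check at ingest. The sketch below assumes the provider supplies a SHA-256 checksum and a JSON metadata file alongside the data; the required field names in `REQUIRED_METADATA` are hypothetical and would come from the negotiated archiving agreement.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical required fields; in practice these come from the archiving agreement.
REQUIRED_METADATA = ("title", "provider", "variables", "units", "time_coverage")


def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_submission(data_path, metadata_path, expected_sha256):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # 1. The file arrived intact: its checksum matches the provider's value.
    if sha256_of(data_path) != expected_sha256:
        problems.append("checksum mismatch")
    # 2. The metadata can be parsed at all.
    try:
        metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return problems + [f"metadata unreadable: {exc}"]
    if not isinstance(metadata, dict):
        return problems + ["metadata is not a JSON object"]
    # 3. Every agreed field is present and non-empty.
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            problems.append(f"missing metadata field: {field}")
    return problems
```

Running a check like this on the day of receipt, while the provider still remembers the data set, is far cheaper than discovering an unreadable file years later. It does not replace a domain specialist actually opening and inspecting the data.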

  27. People • Domain specialists are required • Computer scientists are required • Turnover is a problem - people carry the corporate culture in their heads. • Staffing levels • NODC - 60 (how partitioned is unknown) • UCAR - 9 domain, 20 CS-types, 200 TB data, 600 entries • MEDS: 18 people mixed specialties - mostly domain • WHOI: 1 full-time data manager, 1-2m/yr domain

  28. Metadata • Collect it all, organize it as best you can. • Non-uniform vocabularies make automated search hit-or-miss. • Uniform vocabularies are crucial for automated systems • “Those in the best position to create metadata have the least need for it.” • Data without metadata is worthless
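
To illustrate why uniform vocabularies matter for automated search, here is a minimal sketch that maps the labels providers actually use onto one controlled term each. The controlled terms are written in the style of CF standard names, and the synonym lists are invented for illustration; a real archive would adopt a community-maintained vocabulary rather than a hand-built table.

```python
# Hypothetical synonym table: controlled term -> labels seen in submissions.
SYNONYMS = {
    "sea_water_temperature": {"temp", "temperature", "water temp", "t_water"},
    "sea_water_salinity": {"sal", "salinity", "psal"},
    "wind_speed": {"wspd", "wind spd", "windspeed"},
}


def normalize(label):
    """Lower-case a label, treat underscores as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("_", " ").split())


def build_lookup(synonyms):
    """Invert the synonym table into a normalized-label -> controlled-term map."""
    lookup = {}
    for term, variants in synonyms.items():
        lookup[normalize(term)] = term
        for variant in variants:
            lookup[normalize(variant)] = term
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def to_controlled(label, lookup=LOOKUP):
    """Return the controlled term for a label, or None if it is not recognized."""
    return lookup.get(normalize(label))
```

For example, `to_controlled("Water Temp")` and `to_controlled("T_water")` both resolve to `sea_water_temperature`, so a single query finds both data sets. Labels that return `None` are the ones that need a human to look at them.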

  29. Technology Traps • Software changes (OS, applications, formats) • Media hardware dies faster than media • Media needs continual refreshing and migration. • Databases: software changes, be sure to save an ASCII copy somewhere. • Maintain offsite storage under a different management structure.
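
The advice to keep an ASCII copy of database contents can be automated. The sketch below assumes, purely for illustration, that the database is SQLite (the slides do not name a database product) and writes each user table to its own CSV file with a header row. It writes UTF-8, which is identical to ASCII for plain-ASCII content.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every user table in a SQLite database to <out_dir>/<table>.csv.

    Returns the list of files written. Assumes table names contain no
    double-quote characters.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists all tables; skip SQLite's internal ones.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            target = out_dir / f"{table}.csv"
            with open(target, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                # Header row taken from the column names of the query result.
                writer.writerow([column[0] for column in cursor.description])
                writer.writerows(cursor)
            written.append(target)
    finally:
        conn.close()
    return written
```

A plain-text export like this loses indexes, constraints, and type information, so it complements the native database backup rather than replacing it. Its value is that it remains readable after the database software has changed.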

  30. Other Initiatives to Follow • The Geosciences Network (GEON) is developing cyberinfrastructure for integrative research to enable transformative advances in Geoscience research and education. • The National Ecological Observatory Network (NEON) will be the first national ecological measurement and observation system designed both to answer regional- to continental-scale scientific questions and to have the interdisciplinary participation necessary to achieve credible ecological forecasting and prediction. • The Ocean Research Interactive Observatory Networks (ORION) is a program that focuses the science, technology, education and outreach of an emerging network of science-driven ocean observing systems. It is part of the Ocean Observatories Initiative (OOI), designed to make sustained, long-term and adaptive measurements in the ocean.

  31. Training in Ocean Observing Systems • TAMU: Certificate in Ocean Observing. (Sudeshna Lahiry was the first to receive it, in December 2006) • Rutgers: Masters in Operational Oceanography • University of South Florida: courses in ocean observing systems. • UCSD: GEON Cyberinfrastructure Summer Institute for Geoscientists

  32. Recommendations • Make Data Management a Priority (20%) • Proper staffing levels (Domain specialists & CS types) • Budget for end-to-end data management • Include funds for professional development • Tie Archiving Agreements to funding • Formats, metadata, controlled vocabularies • Explore the Digital Preservation Website • http://www.library.cornell.edu/iris/tutorial/dpm/eng_index.html • Staff, costs, equipment, organizational readiness • TDR, OAIS, and much much more. • Pay attention to Open Geospatial Consortium (OGC) developments in the area of SensorML and Sensor Web Enablement.

  33. LINKS • Digital Preservation Management Tutorial: http://www.library.cornell.edu/iris/tutorial/dpm/eng_index.html • Trusted Digital Repositories: Attributes and Responsibilities (TDR) (Organizational): http://www.rlg.org/legacy/longterm/repositories.pdf • Reference Model for an Open Archival Information System (OAIS) (Technological): http://public.ccsds.org/publications/archive/650x0b1.pdf • Data Management and Communications Plan (Observing System Design): http://ocean.us./dmac

  34. The Story of Data Soup Once upon a time, somewhere in the post-dotcom bust in a distant land, there was a great famine in which people jealously hoarded whatever data they could find, hiding it even from their friends and neighbors. One day a wandering data manager came into a village and began asking questions as if he planned to stay for the night. "There's not a byte to download in the whole province," he was told. "Better keep moving on." "Oh, I have all the data I need," he said. "In fact, I was thinking of making some products to share with all of you." He pulled a laptop from his wagon, opened it up, and turned it on. Then, with great ceremony, he drew an ordinary-looking USB stick from a velvet bag and plugged it into the port on the side. By now, hearing the rumor of data and products, most of the villagers had come to the square or watched from their windows. As the manager typed and licked his lips in anticipation, hunger began to overcome their skepticism. "Ah," the manager said to himself rather loudly, "I do have a beautiful bathymetric data set. Of course, Data Soup with current vectors -- that's hard to beat!" Soon a villager approached hesitantly, holding some current vectors he'd retrieved from their hiding place, and added them to the laptop. "Capital!" cried the manager. "You know, I once had data soup with current vectors AND wind fields as well, and it was fit for a king." The village meteorologist managed to find some wind fields . . . and so it went, through SSTs, moored data, biogeochemical variables, wave heights, and so on, until there was indeed a delicious pool of data for all. The villagers offered the manager a great deal of money for the magic USB stick, but he refused to sell and traveled on the next day. The moral is that by working together, with everyone contributing what they can, a greater good is achieved.
