Developing a National Plan for Glider Operations: Data Management

Developing a National Plan for Glider Operations: Data Management. IOOS Glider Workshop August 2012 Jim Potemra, UH. Goal. Develop a national plan for glider operations. Part of this plan should address the management of glider data.

Presentation Transcript


  1. Developing a National Plan for Glider Operations: Data Management
     IOOS Glider Workshop, August 2012
     Jim Potemra, UH

  2. Goal
     Develop a national plan for glider operations. Part of this plan should address the management of glider data.
     • Start discussion on developing a common plan for glider-to-archive data streams
     • Motivation is to promote data interoperability and to make data easier to access and use

  3. National Plans
     Wave Plan: “…all data will flow through IOOS DAC operated by NDBC and CDIP using IOOS-DIF standards and metadata…”
     HFR Plan: “…data management principles…have been coordinated with IOOS DIF…”

  4. Components of DMP
     • Data formats and standards
       • Include vocabulary, conventions, metadata
     • Data services
       • How to get data to users
     • Implementation
       • How will this get done and by whom
       • Issues include incorporation of QA/QC

  5. 1. Data formats and standards
     • Assumption 1: Essentially three types of gliders, roughly measuring similar variables
     • Assumption 2: Raw formats may differ, but getting to an initial (near-real-time) netCDF file would be possible
     • Proposal: netCDF data model with CF conventions (standardized vocabulary and units)
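
As a rough illustration of the proposal above, the sketch below describes a glider profile file as a set of variables carrying CF-style `standard_name` and `units` attributes and prints a CDL-like header. The variable names, dimension sizes, and the helper `to_cdl` are assumptions made for this example, not an agreed glider format. Only the Python standard library is used, so no netCDF file is actually written.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One netCDF-style variable with its dimensions and CF attributes."""
    name: str
    dtype: str
    dims: tuple
    attrs: dict = field(default_factory=dict)


# Hypothetical layout for a single profile; names and sizes are illustrative.
DIMS = {"time": 1, "depth": 50}

VARIABLES = [
    Variable("time", "double", ("time",),
             {"standard_name": "time",
              "units": "seconds since 1970-01-01T00:00:00Z"}),
    Variable("depth", "float", ("depth",),
             {"standard_name": "depth", "units": "m", "positive": "down"}),
    Variable("lat", "float", ("time",),
             {"standard_name": "latitude", "units": "degrees_north"}),
    Variable("lon", "float", ("time",),
             {"standard_name": "longitude", "units": "degrees_east"}),
    Variable("temperature", "float", ("time", "depth"),
             {"standard_name": "sea_water_temperature",
              "units": "degree_Celsius"}),
]

GLOBAL_ATTRS = {"Conventions": "CF-1.6", "featureType": "profile"}


def to_cdl(name, dims, variables, global_attrs):
    """Render the description as CDL-like text (the notation ncdump prints)."""
    lines = [f"netcdf {name} {{", "dimensions:"]
    lines += [f"\t{dim} = {size} ;" for dim, size in dims.items()]
    lines.append("variables:")
    for var in variables:
        lines.append(f"\t{var.dtype} {var.name}({', '.join(var.dims)}) ;")
        lines += [f'\t\t{var.name}:{key} = "{value}" ;'
                  for key, value in var.attrs.items()]
    lines.append("// global attributes:")
    lines += [f'\t\t:{key} = "{value}" ;' for key, value in global_attrs.items()]
    lines.append("}")
    return "\n".join(lines)


cdl = to_cdl("glider_profile", DIMS, VARIABLES, GLOBAL_ATTRS)
print(cdl)
```

Keeping the vocabulary in attributes rather than in variable names is what lets generic tools interpret files from different glider types in the same way.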

  6. 1. Data formats and standards (cont’d)
     • Advantages:
       • Easy to implement (hopefully)
       • In line with Argo, OceanSITES, IOOS
       • Large user community and tool set
     • Disadvantages:
       • Even within netCDF there are permutations
       • New featureType attribute might be slow to catch on

  7. 2. Data services
     • Assumption 1: there will not be a single solution
     • Assumption 2: the user community is well known
     • Assumption 3: interoperability and open access are goals
     • Proposal: distribute data through four main mechanisms:
       • Direct access: pilots and science PIs will likely access data via ssh to the server machine and/or an NFS-mounted disk
       • GTS: data to GTS either directly off the modem or via a DAC/WMO center
       • ftp: local (and maybe remote) research users may want data transfer
       • OPeNDAP: remote users; automatic harvesting done via OPeNDAP
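
For the OPeNDAP mechanism, automatic harvesting usually amounts to requesting a subset of a variable over HTTP. The sketch below builds a DAP2-style request URL with a hyperslab constraint. The server address, file path, and variable name are placeholders invented for this example.

```python
from urllib.parse import quote


def opendap_url(base, variable, slices, response="ascii"):
    """Build a DAP2 request URL with a hyperslab constraint.

    slices holds one (start, stride, stop) tuple per dimension; indices are
    zero-based and stop is inclusive, per the DAP2 constraint syntax.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]"
                        for start, stride, stop in slices)
    # Leave brackets and colons unescaped so the constraint stays readable.
    constraint = quote(f"{variable}{hyperslab}", safe="[]:,")
    return f"{base}.{response}?{constraint}"


# Placeholder dataset URL (hypothetical server and file name).
BASE = "http://example.org/opendap/gliders/mission01.nc"

# First profile (time index 0), first 20 depth levels, as ASCII.
url = opendap_url(BASE, "temperature", [(0, 1, 0), (0, 1, 19)])
print(url)
```

Because the subset is expressed in the URL, a harvester can pull only the newest profiles without transferring whole files, which is the main contrast with the ftp route.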

  8. 2. Data services (cont’d)
     • Advantages:
       • Easy to implement (hopefully)
       • In line with IOOS and Argo
       • Will cover almost all possible user requests
     • Disadvantages:
       • Versioning will be difficult
       • Several access logs
       • Maintenance of different servers

  9. 3. Implementation
     • Data archives
       • Distributed centers (data assembly centers) that provide equivalent services for seamless integration
       • Central assembly center where all data providers submit data (GDAC): NODC/NDBC/NCDC?
     • Data files (QC issue…)
       • Single data stream with flags marking raw, real-time QC, delayed-mode data (e.g., Argo)
       • Two data streams (e.g., tide gauge)
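
The two data-file options above can be contrasted with a small sketch. Here a single stream carries every version of a sample together with a processing-mode flag, and a reader picks the most processed version available; splitting by flag then yields the two-stream layout. The flag names and record fields are assumptions for this example, loosely inspired by the Argo approach the slide cites, not a copy of any existing specification.

```python
from dataclasses import dataclass

# Hypothetical processing-mode flags, ordered from least to most processed.
MODE_RANK = {"raw": 0, "realtime_qc": 1, "delayed": 2}


@dataclass(frozen=True)
class Sample:
    time: int             # seconds since an arbitrary epoch
    depth_m: float
    temperature_c: float
    mode: str             # one of the keys of MODE_RANK


def best_available(stream):
    """Keep, for each (time, depth), the most processed version present."""
    best = {}
    for sample in stream:
        key = (sample.time, sample.depth_m)
        current = best.get(key)
        if current is None or MODE_RANK[sample.mode] > MODE_RANK[current.mode]:
            best[key] = sample
    return [best[key] for key in sorted(best)]


def split_streams(stream):
    """Two-stream alternative: real-time data in one list, delayed-mode in another."""
    realtime = [s for s in stream if s.mode != "delayed"]
    delayed = [s for s in stream if s.mode == "delayed"]
    return realtime, delayed


# Made-up samples: three versions of one measurement plus one raw-only sample.
stream = [
    Sample(0, 10.0, 24.91, "raw"),
    Sample(0, 10.0, 24.90, "realtime_qc"),
    Sample(0, 10.0, 24.87, "delayed"),
    Sample(60, 12.0, 24.50, "raw"),
]

for sample in best_available(stream):
    print(sample.time, sample.depth_m, sample.temperature_c, sample.mode)
```

The single-stream layout pushes the choice of version onto the reader, while the two-stream layout makes the choice explicit at the cost of maintaining two sets of files.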

  10. QA/QC considerations
      • Different layers to this:
        • Different data streams or over-write
        • Done by provider, aggregator or separate team
      • Either way, a documented plan would be helpful
      • QA/QC extends to data and dimensions (e.g., how accurate are time/location; is this important?)
      • Impacts data file w.r.t. vocabulary and flags (so not just an issue of what tests are run)
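
To illustrate the point that QA/QC covers coordinates as well as measurements, the sketch below applies simple range checks to time, position, and temperature and attaches one flag per field. The flag values and thresholds are invented for this example; a real plan would document its own tests and flag vocabulary.

```python
# Hypothetical flag vocabulary; a real plan would define and document its own.
GOOD, SUSPECT, BAD = 1, 3, 4


def range_flag(value, valid, plausible):
    """Flag a value as BAD outside `valid`, SUSPECT outside `plausible`, else GOOD."""
    if not valid[0] <= value <= valid[1]:
        return BAD
    if not plausible[0] <= value <= plausible[1]:
        return SUSPECT
    return GOOD


def qc_record(record, mission_start, mission_end):
    """Return one flag per field, covering dimensions as well as data."""
    return {
        # Dimension: time must fall inside the mission window.
        "time": GOOD if mission_start <= record["time"] <= mission_end else BAD,
        # Dimensions: positions must be valid coordinates.
        "lat": range_flag(record["lat"], (-90.0, 90.0), (-90.0, 90.0)),
        "lon": range_flag(record["lon"], (-180.0, 180.0), (-180.0, 180.0)),
        # Data: illustrative temperature limits in degrees Celsius.
        "temperature": range_flag(record["temperature"], (-2.5, 40.0), (0.0, 32.0)),
    }


# Made-up record with a valid position and an implausibly warm temperature.
record = {"time": 1_345_000_000, "lat": 21.3, "lon": -158.1, "temperature": 35.0}
flags = qc_record(record, mission_start=1_344_000_000, mission_end=1_346_000_000)
print(flags)
```

Storing a flag for each coordinate as well as each measurement is what makes the file vocabulary part of the QA/QC discussion, not only the choice of tests.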

  11. 3. Implementation (cont’d)
      • Data governance
        • Data management team (real-time operators, delayed-mode QC)
        • Ad hoc, standards-based (articulate best practices and leave to providers)
      • Misc
        • Provide service to users?
        • Others?

  12. Suggested Approach
      User-driven: Issue one is to identify users
      • Scientific PI
      • Researcher (non-PI)
      • Operational modeling centers
      • Re-analysis modeling
      • Pilots
      • General (non-scientific) users

  13. Complete Picture
      [Diagram of the glider-to-archive data flow. Nodes: glider, Iridium modem, shore station, pilot console, GTS, operational center, science PI, data processing (conversion to netCDF; QA/QC applied), reanalysis modeling, data service (OPeNDAP), researcher, non-science user, archive center. Link labels: ssh, ftp/cp, ftp/ssh, http.]

  14. Data format and transport by user

  15. Suggested Approach (cont’d)
      Data providers at other end: Issue one is to document existing practices
      • Inventory of gliders?
      • Three main types; data formats for these?
      • Role of manufacturer?

  16. Areas to consider:
      • Starting point is glider
        • What variables and formats need to be addressed?
        • All transmitting via Iridium?
      • Multiple ending points
        • Pilots, scientists (PIs), scientists (research), modeling centers (real-time), model reanalysis studies (historical), other users (?)
        • Added aspect of regional and national viewers and/or aggregation centers
      • Implementation
        • Federated (all groups carry on), or centralized (e.g., Argo) with “data assembly centers” (DACs)
        • Maintenance of two data streams
          • Sea level (real-time and delayed mode)
          • Argo (combination)

  17. Based on this goal:
      • Discuss development of standard format(s) and possible standard transport mechanism(s)
        • Depending on time and interest, discussion on data format could extend to terminology
      • Discuss an implementation plan
        • How to execute data plan, e.g., distributed system, federated system, DACs, maintenance of real-time and delayed mode, etc.

  18. Data Management Issues
      • If goal is discovery
        • Need a central catalog (service)
      • If goal is availability
        • Need to provide a service (ftp, OPeNDAP, etc.)
      • If goal is interoperability
        • Need to settle on common data model and/or service (netCDF with ftp)
      • All sorts of other stuff
        • Central vs. distributed archive
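
For the discovery case, a central catalog can be as simple as a searchable list of deployments with pointers to wherever the data are served. The sketch below filters such a list by bounding box and time window. The record fields, provider names, and URLs are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    provider: str
    url: str       # where the data are actually served
    bbox: tuple    # (west, south, east, north) in degrees
    start: str     # ISO 8601 dates compare correctly as strings
    end: str


def search(catalog, bbox, start, end):
    """Return deployments whose bounding box and dates overlap the query.

    Simplification: boxes that cross the antimeridian are not handled.
    """
    west, south, east, north = bbox
    hits = []
    for dep in catalog:
        d_west, d_south, d_east, d_north = dep.bbox
        overlaps_space = (d_west <= east and d_east >= west
                          and d_south <= north and d_north >= south)
        overlaps_time = dep.start <= end and dep.end >= start
        if overlaps_space and overlaps_time:
            hits.append(dep)
    return hits


# Invented entries; a real catalog would be populated by the data providers.
CATALOG = [
    Deployment("provider_a", "http://example.org/a/mission01.nc",
               (-158.5, 21.0, -157.5, 22.0), "2012-06-01", "2012-06-30"),
    Deployment("provider_b", "http://example.org/b/mission07.nc",
               (-75.0, 38.0, -72.0, 40.5), "2012-07-10", "2012-08-02"),
]

hits = search(CATALOG, bbox=(-160.0, 20.0, -156.0, 23.0),
              start="2012-06-15", end="2012-07-01")
print([dep.provider for dep in hits])
```

A catalog of this kind supports discovery with either a central or a distributed archive, since each entry only points at the service that holds the data.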

  19. IOOS model thus far
      • All data available asap
      • All data available via “standard” service
        • OPeNDAP/THREDDS
        • SOS, ftp
        • ERDDAP/vis tools
      • Data service more or less dictates format/model:
        • netCDF

  20. Data availability: IOOS RA
      IOOS has 11 Regional Associations. The availability of glider data via these RAs falls into three broad categories:
      • No obvious link to glider data or plots
        • AOOS (Alaska)
        • CaRA (Caribbean)
        • GLOS (Great Lakes)
        • GCOOS (Gulf Coast)
        • NERACOOS (Northeast Atlantic)
        • SECOORA (Southeast Atlantic)
      • Some data available via OPeNDAP, limited plots/maps
        • CenCOOS (Central California) single mission
        • PacIOOS
      • Data, maps and viewer
        • MARACOOS (Mid-Atlantic) Rutgers
        • NANOOS (Pacific Northwest) APL/UW
        • SCCOOS (Southern California) Scripps

  21. Data availability: NOAA/NODC
      • GTSPP:
        • http://www.nodc.noaa.gov/GTSPP/index.html
        • Data by name (e.g., pacific/2012/06):
          gtspp_14239088_te_111.nc
          gtspp_14239470_te_111.nc
          gtspp_14239585_te_111.nc
          gtspp_14239643_te_111.nc
        • Files have featureType: profile
      • Deepwater Horizon
        • http://www.nodc.noaa.gov/General/DeepwaterHorizon/glider_float.html#glider
        • Single lat/lon/time per profile:
          • Temp(time,depth,lat,lon) for a single time,lat,lon
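
The array layout noted in the last bullet can be pictured as a four-dimensional variable whose time, lat, and lon dimensions all have length one, so that only depth varies. The sketch below uses plain nested lists and made-up values to show how such a variable reduces to a single profile; the helper names are invented for this example.

```python
# Made-up values; shape is (time=1, depth=4, lat=1, lon=1).
temp = [[[[24.9]], [[24.1]], [[22.7]], [[19.3]]]]
depth = [5.0, 10.0, 20.0, 40.0]


def shape(array):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


def profile(temp_tdll):
    """Collapse Temp(time, depth, lat, lon) with singleton time/lat/lon to Temp(depth)."""
    n_time, _, n_lat, n_lon = shape(temp_tdll)
    if (n_time, n_lat, n_lon) != (1, 1, 1):
        raise ValueError("expected a single time, lat, and lon")
    return [level[0][0] for level in temp_tdll[0]]


print(shape(temp))
for z, t in zip(depth, profile(temp)):
    print(z, t)
```

Carrying position and time as singleton dimensions keeps the gridded layout familiar to model users, whereas a featureType: profile file stores one position and time per profile alongside the depth axis.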

  22. Data availability: NOAA/NDBC
      • Data list and pre-made profile plots

  23. Data availability: other
      • UW/APL
      • Scripps
      • Rutgers
      • C-MORE/HOT
