
Collaboration Tools and Techniques for Large Model Data Sets

Rich Signell, USGS Woods Hole, MA. Motivation: typical model outputs range from 100 MB up to several GB. The traditional collaboration method is for users to grab the whole NetCDF file from your web/ftp site, or for you to e-mail them a few images.


Presentation Transcript


  1. Collaboration Tools and Techniques for Large Model Data Sets • Rich Signell, USGS Woods Hole, MA

  2. Motivation • Typical model outputs are 100 MB up to several GB. • Traditional collaboration method: users grab the whole NetCDF file from your web/ftp site, or you e-mail them a few images. • There is a better way…

  3. NetCDF • Machine independent, self-describing, binary format for multidimensional scientific data • Interfaces: Fortran, C, C++, Java, Perl, Matlab, IDL, Python • Free, supported by NSF at Unidata

  4. netcdf swan_short {
     dimensions:
         y = 376 ;
         x = 136 ;
         time = UNLIMITED ; // (82 currently)
     variables:
         float depth(y, x) ;
             depth:units = "m" ;
             depth:long_name = "water depth" ;
             depth:_FillValue = -99999.f ;
             depth:coordinates = "lon lat" ;
         short hsig(time, y, x) ;
             hsig:units = "m" ;
             hsig:long_name = "significant wave height" ;
             hsig:_FillValue = 32767s ;
             hsig:add_offset = 14.5f ;
             hsig:scale_factor = 0.00047304f ;
             hsig:coordinates = "lon lat" ;
         double time(time) ;
             time:units = "days since 1968-05-23" ;
             time:long_name = "modified julian day (ROMS-style)" ;
         float lon(y, x) ;
             lon:units = "degrees_east" ;
             lon:long_name = "longitude" ;
         float lat(y, x) ;
             lat:units = "degrees_north" ;
             lat:long_name = "latitude" ;

     // global attributes:
         :Conventions = "CF-1.0" ;
         :title = "SWAN driven by 7 km LAMI met model" ;
         :institution = "SACLANT Undersea Research Centre" ;
         :source = "SWAN Wave Model (NRL-SSC OpenMP version 31-Mar-2003)" ;
         :contact = "Rich Signell (signell@saclantc.nato.int)" ;
     }
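The `scale_factor` and `add_offset` attributes on `hsig` mean the file stores 16-bit integers that a reader has to unpack. The sketch below assumes the usual CF packing rule (unpacked = packed × scale_factor + add_offset) and treats the `_FillValue` as missing data. The function name `unpack_hsig` is hypothetical, and the constants are copied from the header above.

```python
# Attribute values copied from the hsig variable in the header above.
SCALE_FACTOR = 0.00047304
ADD_OFFSET = 14.5
FILL_VALUE = 32767


def unpack_hsig(packed):
    """Convert one stored short to wave height in metres (None if missing)."""
    if packed == FILL_VALUE:
        return None
    return packed * SCALE_FACTOR + ADD_OFFSET


# Extremes of the heights this packing can represent (fill value excluded).
lowest = unpack_hsig(-32768)
highest = unpack_hsig(32766)
```

With these attributes the representable wave heights run from roughly -1 m to 30 m, in steps of about half a millimetre.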

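  5. PROGRAM WRITE_NC
     INCLUDE 'netcdf.inc'
     ! dimension lengths (typed before PARAMETER so they are integers)
     INTEGER TIMES, LATS, LONS
     PARAMETER (TIMES=3, LATS=5, LONS=10)
     INTEGER STATUS, NCID
     INTEGER RHID ! variable ID
     INTEGER ILON, ILAT, ITIME
     DOUBLE PRECISION RHVALS(LONS, LATS, TIMES)
     ...
     ! open the existing file for writing and look up the variable
     STATUS = NF_OPEN ('foo.nc', NF_WRITE, NCID)
     STATUS = NF_INQ_VARID (NCID, 'rh', RHID)
     DO 10 ILON = 1, LONS
       DO 10 ILAT = 1, LATS
         DO 10 ITIME = 1, TIMES
           RHVALS(ILON, ILAT, ITIME) = 0.5
     10 CONTINUE
     ! write the whole array in one call
     STATUS = NF_PUT_VAR_DOUBLE (NCID, RHID, RHVALS)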

  6. DODS/OpenDAP http://www.opendap.org • Open Data Access Protocol for delivery of multidimensional scientific data via http • DODS allows efficient slicing from data via the web, just as NetCDF works for local files. (Putting the “Net” in NetCDF!) • DODS serves not just NetCDF, but also Matlab, HDF (also GRIB, BUFR, etc…)
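The slicing mentioned on this slide is requested through a constraint expression appended to the dataset URL. Below is a minimal sketch of building one, assuming the DAP2 hyperslab syntax `variable[start:stride:stop]` with zero-based, inclusive indices. The server address and the helper name `dods_slice_url` are hypothetical, and the exact response suffixes a server accepts may vary.

```python
def dods_slice_url(base_url, variable, ranges, response="dods"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    ranges is a list of (start, stop) index pairs, zero-based and inclusive,
    one per dimension, in the order the variable declares its dimensions.
    """
    hyperslab = "".join(f"[{start}:1:{stop}]" for start, stop in ranges)
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical server and path; hsig(time, y, x) as in the header on slide 4.
url = dods_slice_url(
    "http://example.org/cgi-bin/nph-dods/swan_short.nc",
    "hsig",
    [(0, 0), (100, 149), (20, 59)],  # first time step, a 50 x 40 window
    response="ascii",
)
print(url)
```

Only the requested window crosses the network. The rest of the file stays on the server.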

  7. Accessing DODS data • DODS APIs (C++, Java) • Any NetCDF code, relinked instead with DODS netCDF library • ncdump => dncdump • ncview => dncview • Your Fortran, C, C++, Python, Perl, Java code…

  8. DODS & Matlab • DODS GUI and command line tools • Relinked mexcdf53.dll, which can enable all Matlab tools that read NetCDF! • (e.g.) NetCDF/Matlab toolbox • >> url='http://long_path/myfile.nc'; • >> nc=netcdf(url); • >> lon=nc{'lon'}(:); • Google on: “sourceforge” “mexcdf”

  9. DODS/OpenDAP • Serving DODS data requires almost no effort on the part of the data provider: • Download DODS server binaries to the cgi-bin directory on the web server • Put your NetCDF files on the web server • Go have a coffee to celebrate! (Note: most people don’t know that getting a DODS server going is this easy!)

  10. DODS Success Story • DODS at sea: in a limited-bandwidth situation, grabbed only the 200 kB OBC region instead of the 18 MB NetCDF file. • 30 second download instead of 45 minutes!
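The two download times on this slide are consistent with a single link speed. The check below is a sketch that assumes the sizes are in bytes with decimal prefixes (18 MB and 200 kB) and that the link rate was constant; the variable names are illustrative only.

```python
def transfer_seconds(size_bytes, bytes_per_second):
    """Time to move size_bytes over a link of the given sustained rate."""
    return size_bytes / bytes_per_second


full_file_bytes = 18e6          # whole NetCDF file, assumed 18 MB
subset_bytes = 200e3            # OBC region only, assumed 200 kB
full_file_seconds = 45 * 60     # quoted time for the whole file

# Link rate implied by the whole-file figure, then the subset time at that rate.
link_rate = full_file_bytes / full_file_seconds
subset_seconds = transfer_seconds(subset_bytes, link_rate)
link_kbit_per_s = link_rate * 8 / 1000
```

Under these assumptions the implied link runs at about 53 kbit/s, and the subset takes about 30 seconds, matching the slide.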

  11. Need for Conventions • One of the greatest things about NetCDF is that it places few demands on the data provider - they are free to specify whatever attributes they want, or none at all • This is also one of the worst things, making it hard to develop flexible software • Software for ROMS won’t work for POM, NCOM, HOPS, ECOM, etc (and vice versa)

  12. CF Conventions I Google: “CF” “ucar”

  13. CF Conventions II

  14. Making ROMS CF-compliant • Store all information about the grid (lon_u, lat_u, angle) in the .his and .avg files (not just the grid file) • Add “coordinates” attributes to curvilinear variables (e.g. zeta:coordinates="lat_rho lon_rho") • Add “standard_name=ocean_s_coordinate” • Make sure dimension names match coordinate variable names (ocean_time, sc_r) • Units need to be recognized by UDUNITS

  15. NCO I

  16. NCO II

  17. ROMS2CF script • CF checker: http://titania.badc.rl.ac.uk/cgi-bin/cf-checker.pl • Google: “CF” “checker”

      #!/bin/bash
      GFILE='../adria02_grid2.nc'
      FFILE='adria03_avg.nc'
      ncks -F -d ocean_time,1 $FFILE ${FFILE}_CF

      # Specify horizontal coordinate variables associated with "RHO fields"
      ncatted -O -h -a "coordinates","temp",c,c,"lat_rho lon_rho" ${FFILE}_CF
      ncatted -O -h -a "coordinates","salt",c,c,"lat_rho lon_rho" ${FFILE}_CF

      # Specify horizontal coordinate variables associated with "U fields"
      ncatted -O -h -a "coordinates","u",c,c,"lat_u lon_u" ${FFILE}_CF
      ncatted -O -h -a "coordinates","ubar",c,c,"lat_u lon_u" ${FFILE}_CF

      # Merge the ROMS grid file into the CF file so we
      # have all the coordinate variables we need
      ncks -O -v lon_rho,lat_rho,lon_u,lat_u,lon_v,lat_v,mask_rho,mask_u,mask_v,angle $GFILE $GFILE.tmp
      ncks -A $GFILE.tmp ${FFILE}_CF
      rm $GFILE.tmp

      # Add vertical coordinate info
      ncatted -O -h -a "standard_name","sc_r",c,c,"ocean_s_coordinate" ${FFILE}_CF
      ncatted -O -h -a "positive","sc_r",c,c,"up" ${FFILE}_CF
      ncatted -O -h -a "formula_terms","sc_r",c,c,"s: sc_r eta: zeta depth: h a: theta_s b: theta_b depth_c: hc" ${FFILE}_CF

      # Add data from field file to template
      ncks -A $FFILE ${FFILE}_CF

      # rename the dimension
      ncrename -O -h -d s_rho,sc_r ${FFILE}_CF
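The `formula_terms` attribute in the script tells CF-aware software which variables to combine to turn the dimensionless `sc_r` levels into heights. The sketch below follows the `ocean_s_coordinate` definition in the CF Conventions appendix on dimensionless vertical coordinates, as best recalled here, so it is worth checking against that document. The function names and the sample numbers in the usage lines are illustrative, not taken from the presentation.

```python
import math


def stretching(s, a, b):
    """C(s) for the CF ocean_s_coordinate (a = theta_s, b = theta_b; a != 0)."""
    surface_term = (1.0 - b) * math.sinh(a * s) / math.sinh(a)
    bottom_term = b * (math.tanh(a * (s + 0.5)) / (2.0 * math.tanh(0.5 * a)) - 0.5)
    return surface_term + bottom_term


def s_to_z(s, eta, depth, a, b, depth_c):
    """Height z (metres, positive up) of level s, where s runs from -1 to 0.

    Arguments map onto formula_terms: s=sc_r, eta=zeta, depth=h,
    a=theta_s, b=theta_b, depth_c=hc.
    """
    c = stretching(s, a, b)
    return eta * (1.0 + s) + depth_c * s + (depth - depth_c) * c


# Made-up water column: 100 m deep, 0.2 m surface elevation.
z_surface = s_to_z(0.0, eta=0.2, depth=100.0, a=5.0, b=0.4, depth_c=10.0)
z_bottom = s_to_z(-1.0, eta=0.2, depth=100.0, a=5.0, b=0.4, depth_c=10.0)
```

As a sanity check, the top level (s = 0) lands on the free surface `eta` and the bottom level (s = -1) lands on `-depth`, whatever the stretching parameters are.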

  18. Integrated Data Viewer (IDV) • Works on local CF-compliant NetCDF files • Works on THREDDS catalog data

  19. Integrated Data Viewer (IDV) • Works on local CF-compliant NetCDF files • Works on THREDDS catalog data

  20. IDV • Freeware supported by the Unidata Program Center (new app, version 1.2) • Java, utilizing Java3D and VisAD (VIS5D) • Runs on Windows, Mac, Solaris (VIS5D is limitation) • Reads NetCDF, DODS, ADDE, GeoTiff, Arc Shapefiles • Slices, dices, animates

  21. IDV in Action

  22. THREDDS

  23. Recommendations • Make your model output CF-compliant! • Distribute your model output via DODS • Make a THREDDS catalog for DODS data • Allow “packing” of data for efficient internet delivery (and disk utilization) • Develop software for CF-compliant data

  24. Abstract • Collaboration Tools and Techniques for Large Model Data Sets. Rich Signell, U.S. Geological Survey, Woods Hole, MA USA • New tools and standards are emerging that facilitate web-based collaboration with large data sets such as those produced by the ocean model ROMS. Using OpenDAP (a.k.a. DODS), ROMS NetCDF output files can be placed on a web server and users can extract just the data they need (say, the surface temperature from a particular day) from the file without any extra effort by the modeller. This, for example, allows a collaborator to issue a simple command in Matlab that will load just the model output desired from the remote web site into a local Matlab session, avoiding both file format conversion and wasted network bandwidth. By linking with the OpenDAP NetCDF library instead of the standard NetCDF library, any NetCDF application can be turned into an OpenDAP application. This approach was used to rebuild the popular Matlab/NetCDF interface “Mexcdf”, so if you get the OpenDAP-enabled version of this interface from the SourceForge MexCDF site, you can use any Matlab/NetCDF application to access OpenDAP data as well. • If in addition the ROMS NetCDF files are modified to follow the CF Conventions, a set of conventions specifically designed for complex model output (including handling of the ROMS s-coordinate), then public domain software such as Unidata’s Integrated Data Viewer (IDV) will recognize the ROMS output files, and can be used to interactively browse, analyze and visualize the results in 3D. Multiple web users can visualize and manipulate the data interactively through the collaboration facility built into IDV. The conversion to CF-compliant NetCDF can be achieved easily using the NetCDF operator tools (NCO). The NCO tools can also be used to automatically reduce the size of the ROMS output files by a factor of 2 by converting floats to short integers, which have sufficient dynamic range for most variables. This also doubles the speed at which Internet users can obtain their requested data. If the model data provider takes the small additional step of creating a THREDDS catalog (a straightforward XML file) of the CF-compliant ROMS output files, then the model results appear as just another data source to an IDV user. This allows users to browse and create visualizations using model results without knowing that they are using NetCDF.
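The factor-of-2 saving from converting floats to short integers can be illustrated without any NetCDF library. This sketch assumes one common way of choosing the packing parameters: spread the data range evenly over the 16-bit codes and keep one code (-32768 here) free as a fill value. The sample values and function names are made up, and this is not claimed to be the exact algorithm NCO uses.

```python
import array

FILL_SHORT = -32768  # assumed fill value, kept outside the packed data range


def packing_params(vmin, vmax):
    """Return (scale_factor, add_offset) mapping [vmin, vmax] onto -32767..32767.

    Requires vmax > vmin, otherwise the scale factor would be zero.
    """
    scale_factor = (vmax - vmin) / 65534.0  # 2**16 - 2 steps across the range
    add_offset = 0.5 * (vmin + vmax)
    return scale_factor, add_offset


def pack(value, scale_factor, add_offset):
    """Float to 16-bit code (nearest step)."""
    return int(round((value - add_offset) / scale_factor))


def unpack(code, scale_factor, add_offset):
    """16-bit code back to an approximate float."""
    return code * scale_factor + add_offset


temps = [4.25, 11.8, 17.31, 22.0, 28.75]  # made-up temperatures, deg C
scale_factor, add_offset = packing_params(min(temps), max(temps))
codes = [pack(t, scale_factor, add_offset) for t in temps]

# Storage as 4-byte floats versus 2-byte shorts.
float_nbytes = len(array.array("f", temps).tobytes())
short_nbytes = len(array.array("h", codes).tobytes())

# Worst round-trip error is bounded by half a packing step.
worst_error = max(
    abs(unpack(c, scale_factor, add_offset) - t) for c, t in zip(codes, temps)
)
```

The packed array takes half the bytes of the float array, and the round-trip error stays within half of `scale_factor`, which is the sense in which shorts have "sufficient dynamic range" for a variable with a known, bounded range.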
