
Delivering Point Observation Data XML vs. NetCDF

John Caron, Ethan Davis (UCAR/Unidata); Jeremy Tandy, Bruce Wright (British Met Office)


Presentation Transcript


  1. Delivering Point Observation Data: XML vs. NetCDF
     John Caron, Ethan Davis (UCAR/Unidata)
     Jeremy Tandy, Bruce Wright (British Met Office)

  2. Motivation
     • Can OGC Web Coverage Service (WCS) be used to deliver “Point Obs” data?
     • Part of a theoretical effort to marry Feature and Coverage conceptual models
     • What are the tradeoffs of delivering the data in text (XML) vs. binary (netCDF)?

  3. Testing Methodology
     • Prototype using TDS server: proxy for WCS
     • Simple “ad-hoc” XML: proxy for GML
     • NetCDF with proposed CF Conventions
     • Try some common queries, measure time to fetch and read the results
     • Details will be in paper

  4. Test dataset: METARs
     • Text data coming from Unidata Internet Data Distribution (realtime)
     • 5000 stations around the world
     • 120K METARs/day
     • 7-day rolling archive at Unidata
     • Stored as netCDF-3 file with “Unidata point observation” convention

  5. TDS: NetCDF Subset Service for Point Data
     • Experimental “REST” Web service in the THREDDS Data Server (TDS)
     • Allows subsetting by variable, lat/lon bounding box, time range
     • Output formats: NetCDF, CSV, XML, Raw

  6. Sample Queries
     • Query A: complete time series at one station
         • One station = LOWW (Vienna, Austria)
         • 353 records
     • Query B: all stations in a bounding box for one time point
         • Bounding box: lat=[35,90] lon=[-10,50] (Europe)
         • 662 records
     • Query C: one complete day, all stations
         • 4818 stations
         • 114,370 records
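The three queries can be pictured as request URLs against the subset service. The sketch below is only illustrative: the slides do not show the actual request syntax, so the endpoint `BASE_URL` and every parameter name (`var`, `accept`, `stn`, `north`/`south`/`east`/`west`, `time`, `time_start`, `time_end`) are assumptions, as are the chosen variable and dates.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the slides do not give the actual service URL.
BASE_URL = "http://example.org/thredds/ncss/metars"


def build_query(variables, accept="xml", **constraints):
    """Return a subset-request URL for the given variables and constraints.

    All parameter names are assumed for illustration only.
    """
    params = {"var": ",".join(variables), "accept": accept}
    params.update(constraints)
    return BASE_URL + "?" + urlencode(params)


# Query A: complete time series at one station (LOWW).
query_a = build_query(["air_temperature"], stn="LOWW")

# Query B: all stations in the Europe bounding box at one time point.
query_b = build_query(
    ["air_temperature"],
    south=35, north=90, west=-10, east=50,
    time="2007-11-24T14:00:00Z",
)

# Query C: one complete day, all stations, returned as netCDF.
query_c = build_query(
    ["air_temperature"],
    accept="netcdf",
    time_start="2007-11-24T00:00:00Z",
    time_end="2007-11-25T00:00:00Z",
)
```

Keeping the constraints as keyword arguments means the same helper covers station, bounding-box and time-range subsetting without a separate function per query type.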

  7. Raw METAR
     2007-11-24T13:49:52Z= LOWW 241350Z 33009KT 300V360 9999 SCT014 BKN016 07/04 Q1022 NOSIG
     2007-11-24T14:19:44Z= LOWW 241420Z 33008KT 9999 FEW011 BKN013 07/04 Q1022 NOSIG
     2007-11-24T09:50:56Z= LOWW 240950Z 31009KT 9999 BKN014 07/03 Q1022 BECMG BKN020

  8. XML
     <metar date="2007-11-24T14:19:44Z">
       <station name="LOWW" latitude="48.119" longitude="16.569" altitude="190">VIENNA/SCHWECHA, -- OS</station>
       <data name="air_pressure_at_sea_level" units="hectoPascal">-99999.0 </data>
       <data name="air_temperature" units="Celsius">7.0 </data>
       <data name="dew_point_temperature" units="Celsius">4.0 </data>
       <data name="hectoPascal_ALTIM" units="hectoPascal">1022.0 </data>
       <data name="precipitation_amount_hourly" units=".01 inches">-99999.0 </data>
       <data name="visibility_in_air" units="US_statute_mile">-99999.0 </data>
       <data name="weather"></data>
       <data name="wind_from_direction" units="degrees">330 </data>
       <data name="wind_peak_speed" units="m/s">-99999.0 </data>
       <data name="wind_speed" units="m/s">4.115552 </data>
     </metar>
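The XML record corresponds to the 14:20Z raw report on the previous slide: wind group 33008KT gives 330 degrees and 8 knots, which is 4.115552 m/s, and 07/04 and Q1022 give the temperature, dew point and altimeter values. A minimal sketch of that decoding follows. It is not the decoder used to build the dataset; the function name `decode_metar` and the regular expressions are illustrative and cover only the groups that appear in the sample reports.

```python
import re

KNOT_TO_MS = 0.514444  # metres per second in one knot


def decode_metar(report):
    """Extract station, wind, temperatures and QNH from a raw METAR body.

    Illustrative only: handles the plain dddffKT, TT/TT and Qpppp groups
    and raises AttributeError if one of them is missing.
    """
    station = report.split()[0]
    wind = re.search(r"\b(\d{3})(\d{2,3})KT\b", report)
    temps = re.search(r"\b(M?\d{2})/(M?\d{2})\b", report)
    qnh = re.search(r"\bQ(\d{4})\b", report)

    def celsius(group):
        # A leading "M" marks a negative temperature in METAR.
        return -float(group[1:]) if group.startswith("M") else float(group)

    return {
        "station": station,
        "wind_from_direction": int(wind.group(1)),
        "wind_speed": int(wind.group(2)) * KNOT_TO_MS,
        "air_temperature": celsius(temps.group(1)),
        "dew_point_temperature": celsius(temps.group(2)),
        "hectoPascal_ALTIM": float(qnh.group(1)),
    }


decoded = decode_metar("LOWW 241420Z 33008KT 9999 FEW011 BKN013 07/04 Q1022 NOSIG")
```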

  9. NetCDF [raw binary dump of the netCDF file; not human-readable]

  10. Fetch Time (secs)

  11. Fetch and Read (Query C)

  12. Use StAX, not DOM!
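StAX is a Java streaming API that hands the reader one parsing event at a time instead of building the whole document tree as DOM does. The sketch below shows the same idea in Python, with `xml.etree.ElementTree.iterparse` standing in for StAX. The wrapping `<metars>` root element and the function name `iter_observations` are assumptions; the slides show only a single `<metar>` record.

```python
import io
import xml.etree.ElementTree as ET


def iter_observations(stream):
    """Yield one dict per <metar> element without keeping the full tree."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "metar":
            continue
        obs = {
            "date": elem.get("date"),
            "station": elem.find("station").get("name"),
        }
        for data in elem.findall("data"):
            # Values are kept as stripped strings; conversion is left out.
            obs[data.get("name")] = (data.text or "").strip()
        yield obs
        elem.clear()  # drop the children of the record just processed


# Assumed document layout: <metar> records inside a hypothetical root.
SAMPLE = b"""<metars>
  <metar date="2007-11-24T14:19:44Z">
    <station name="LOWW" latitude="48.119" longitude="16.569" altitude="190">VIENNA/SCHWECHA, -- OS</station>
    <data name="air_temperature" units="Celsius">7.0 </data>
    <data name="wind_speed" units="m/s">4.115552 </data>
  </metar>
</metars>"""

observations = list(iter_observations(io.BytesIO(SAMPLE)))
```

Because each record is cleared once it has been yielded, memory use stays roughly constant per observation, which is the property that matters for a response like Query C.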

  13. File Size (Mb)

  14. Slow network (~100 kb/sec)

  15. XML vs NetCDF File Size

  16. Slow network - GZIP

  17. Streaming point obs data
     • XML can be written as a stream, as each observation is found
     • Harder for netCDF, which assumes random access capability
     • netCDF/CF* format can put station info at the front of the file, then the observations are appended along the unlimited dimension

  18. netCDF stream format
     • Only problem is the numrec = “number of unlimited records” field in the header
     • But numrec can be calculated by the netCDF library from the file size. netCDF-Java and netCDF-C (>3.6) will do this, but it is not backwards compatible!
     • We have added this to the netCDF spec
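The calculation mentioned above can be sketched as follows. In the netCDF classic format specification the record count is the 4-byte big-endian field after the magic number, and the value 0xFFFFFFFF marks a streaming file whose count is not known. The function names are illustrative, and `record_start` and `record_size` are passed in as assumed inputs; a real reader would derive them from the variable list in the header.

```python
import struct

STREAMING = 0xFFFFFFFF  # header value meaning "record count not known"


def read_numrecs(header_bytes):
    """Return the record-count field from the first 8 bytes of the file."""
    magic, numrecs = struct.unpack(">4sI", header_bytes[:8])
    if magic[:3] != b"CDF":
        raise ValueError("not a netCDF classic file")
    return numrecs


def infer_numrecs(file_size, record_start, record_size):
    """Derive the record count from the file length.

    record_start: byte offset of the first record (assumed known).
    record_size: bytes per record across all record variables (assumed known).
    """
    return (file_size - record_start) // record_size


def effective_numrecs(header_bytes, file_size, record_start, record_size):
    """Use the header count unless the file was written as a stream."""
    numrecs = read_numrecs(header_bytes)
    if numrecs == STREAMING:
        return infer_numrecs(file_size, record_start, record_size)
    return numrecs
```

For example, a streamed file whose records start at byte 1000 and take 64 bytes each, with a total length of 1000 + 353 * 64 bytes, yields 353 records. A reader that trusts the header field and does not recognise the sentinel would misread the count, which is the backwards-compatibility problem noted above.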

  19. Slow – GZIP “netCDF stream format”

  20. Conclusions
     • XML reasonable for streaming data
     • Must use StAX, not DOM, to parse
     • Should compress the content to overcome finite bandwidth
     • netCDF excels as archive format, multiple reads
     • “classic format” a bit slower than XML
     • “streaming format”: 25% speedup over XML, but not backwards compatible
     • netCDF-3 as binary “exchange format” for geospatial data
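The compression point can be illustrated with a small calculation. This is a sketch only: the payload is synthetic and far more repetitive than real observations, so its compression ratio says nothing about the measured results, and reading the slow-network rate as 100,000 bytes per second is an assumption.

```python
import gzip

# Assumed reading of "~100 kb/sec" as bytes per second.
LINK_BYTES_PER_SEC = 100_000


def transfer_seconds(n_bytes, rate=LINK_BYTES_PER_SEC):
    """Idealised transfer time: payload size divided by link rate."""
    return n_bytes / rate


# Synthetic payload: one XML element repeated, for illustration only.
record = b'<data name="air_temperature" units="Celsius">7.0 </data>\n'
payload = record * 10_000

compressed = gzip.compress(payload)

raw_time = transfer_seconds(len(payload))
gzip_time = transfer_seconds(len(compressed))
```

On a slow link the transfer time dominates, so any reduction in bytes sent translates almost directly into a shorter fetch, at the cost of compressing on the server and decompressing on the client.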

  21. Contact
     caron@unidata.ucar.edu
     Will have link to paper: http://unidata.ucar.edu/staff/caron/
