
The Live Access Server (Access to observational data)

Jonathan Callahan (University of Washington), Steve Hankin (NOAA/PMEL, PI), Roland Schweitzer, Kevin O’Brien, Ansley Manke, Steve Du, Xiaoping Wang, Joe Mclean, Joe Sirott, Jerry Davison

Presentation Transcript


  1. Jonathan Callahan (University of Washington), Steve Hankin (NOAA/PMEL, PI), Roland Schweitzer, Kevin O’Brien, Ansley Manke, Steve Du, Xiaoping Wang, Joe Mclean, Joe Sirott, Jerry Davison. The Live Access Server (Access to observational data)

  2. Gridded vs. Observational Data • Gridded data: clean, organized, labeled, voluminous, handled by machines • Observational data: dirty, messy, often un/mis-labeled, increasingly voluminous, previously handled by hand

  3. Live Access Server (LAS) • Web-based, common interface to diverse sources of climate data • Single interface for subsetting, download, visualization and comparison • Easy access to metadata and documentation • Unified access to distributed data holdings • Uniform user interface to existing back-end visualization packages

  4. LAS Data Model. For data access, users must specify: • Dataset • Variable • 4D Region • ‘Constraints’

  5. Dataset

  6. Dataset

  7. Variable

  8. 4D Region Constraints

  9. Output
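Slides 4 to 9 walk through the pieces a user assembles into one request. The sketch below models that request as plain Python data, assuming a (min, max) range per axis and a free-form dictionary for extra constraints; the class names, field names, the dataset ID "WOD01" and the output labels are hypothetical, not LAS's actual request format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Tuple

Range = Tuple[float, float]


@dataclass
class Region4D:
    # Each axis is an inclusive (min, max) pair; None means "use the full axis".
    x: Optional[Range] = None             # longitude, degrees east
    y: Optional[Range] = None             # latitude, degrees north
    z: Optional[Range] = None             # depth, metres
    t: Optional[Tuple[str, str]] = None   # ISO 8601 start/end dates


@dataclass
class LASRequest:
    dataset: str
    variable: str
    region: Region4D
    # Hypothetical extra constraints, e.g. a quality flag or platform ID.
    constraints: dict = field(default_factory=dict)
    output: str = "plot"  # hypothetical labels: "plot", "netcdf", "ascii"

    def summary(self) -> str:
        """One-line description listing only the axes that were restricted."""
        axes = ", ".join(
            f"{name}={rng[0]}..{rng[1]}"
            for name, rng in vars(self.region).items()
            if rng is not None
        )
        extra = ", ".join(f"{k}={v}" for k, v in self.constraints.items())
        return f"{self.dataset}/{self.variable} [{axes}] {{{extra}}} -> {self.output}"


req = LASRequest(
    dataset="WOD01",
    variable="temperature",
    region=Region4D(x=(-180.0, -170.0), y=(0.0, 10.0), z=(0.0, 500.0)),
    constraints={"qc": "0"},
    output="netcdf",
)
print(req.summary())
```

Keeping the 4D region separate from the other constraints mirrors the slide's split between coordinate subsetting and metadata-based filtering.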

  10. LAS Architecture. LAS is three-tiered.

  11. Access to Remote Data. The Ferret back end is linked with OPeNDAP.
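With OPeNDAP, a client subsets remote gridded data by appending a DAP2 constraint expression of the form `var[start:stride:stop]` (stop inclusive) to the dataset URL, so only the requested hyperslab crosses the network. The sketch below converts a coordinate box into index ranges on a regular grid and builds such a URL. The server URL, the variable `sst`, the (time, lat, lon) dimension order and the 1-degree grid are all hypothetical; the `.dods` suffix requests the DAP2 binary data response.

```python
import math


def index_range(lo, hi, origin, step):
    """Indices i of a regular axis (origin + i*step) whose values fall in [lo, hi]."""
    start = math.ceil((lo - origin) / step)
    stop = math.floor((hi - origin) / step)
    return start, stop


def hyperslab_url(base_url, var, index_ranges, stride=1):
    """Append a DAP2 hyperslab constraint var[start:stride:stop]... to base_url."""
    slabs = "".join(f"[{a}:{stride}:{b}]" for a, b in index_ranges)
    return f"{base_url}?{var}{slabs}"


# Hypothetical 1-degree global grid: lon centres 0.5..359.5, lat centres -89.5..89.5.
lon_idx = index_range(180.0, 190.0, origin=0.5, step=1.0)
lat_idx = index_range(-10.0, 10.0, origin=-89.5, step=1.0)
time_idx = (0, 11)  # first twelve time steps

url = hyperslab_url("http://example.org/opendap/sst.nc.dods", "sst",
                    [time_idx, lat_idx, lon_idx])
print(url)
```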

  12. Java Servlet Redesign: Data Server Details

  13. Server-Side Functionality. After parsing the user request, LAS must: • Access & subset the data • Perform analysis • Create visualization. For interactive results, each task should take < 5 sec.
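The three server-side tasks form a pipeline, and the < 5 sec target applies to each one. This sketch runs hypothetical stage functions in order, times each with `time.perf_counter`, and reports any stage that exceeds the budget; the stage names and toy lambdas are placeholders, not LAS code.

```python
import time

BUDGET_SECONDS = 5.0  # per-task target from the slide


def run_pipeline(stages, request):
    """Run (name, func) stages in order, feeding each result to the next.

    Returns the final result, per-stage timings, and the names of stages
    that took longer than BUDGET_SECONDS.
    """
    timings = {}
    data = request
    for name, func in stages:
        t0 = time.perf_counter()
        data = func(data)
        timings[name] = time.perf_counter() - t0
    slow = [name for name, dt in timings.items() if dt > BUDGET_SECONDS]
    return data, timings, slow


# Toy placeholder stages standing in for data access, analysis and plotting.
stages = [
    ("access_and_subset", lambda req: [v for v in range(1000) if req["lo"] <= v <= req["hi"]]),
    ("analysis", lambda vals: sum(vals) / len(vals)),
    ("visualization", lambda mean: f"mean={mean:.1f}"),
]

result, timings, slow = run_pipeline(stages, {"lo": 10, "hi": 20})
print(result, slow)
```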

  14. The Hard Part. After parsing the user request, LAS must: • Access & subset the data (the hard part) • Perform analysis • Create visualization

  15. Classes of Observational Climate Data: Station time series (Eulerian) • Oceanic: tide gauges (1D), moored thermistor chains (2D) • Atmospheric: surface weather stations (1D), profilers (2D)

  16. Classes of Observational Climate Data: Profile data • Oceanic: CTD casts and bottle data (ordered by cruise track, quasi-scattered), repeat stations (ordered by cruise track or station location) • Atmospheric: profilers (station based), balloons (2D, quasi-Lagrangian)

  17. Classes of Observational Climate Data: Tracks (Lagrangian) • Oceanic: ship underway data (surface), drifting buoys (surface), ARGO floats (surface tracks, scattered profiles), instrumented animals (depth) • Atmospheric: airplane underway data (altitude), balloons (altitude, quasi-stationary, quasi-profile)

  18. Classes of Observational Climate Data: Random scatter • Oceanic: surface ship observations, profile locations • Atmospheric: surface weather obs
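The four classes differ in which coordinates vary within a group of observations. The heuristic below is a hypothetical illustration of that distinction, not a rule from the talk: it assumes each observation is a `(platform_id, lon, lat, depth, time)` tuple, and treats reports without a platform ID as random scatter.

```python
def classify(obs):
    """Guess the observation class from which coordinates vary (heuristic only)."""
    positions = {(o[1], o[2]) for o in obs}
    depths = {o[3] for o in obs}
    times = {o[4] for o in obs}
    platforms = {o[0] for o in obs}

    if len(positions) == 1:
        if len(times) > 1:
            return "station time series"   # fixed X/Y, varying T (Eulerian)
        if len(depths) > 1:
            return "profile"               # fixed X/Y/T, varying Z
    if len(platforms) == 1 and None not in platforms and len(times) > 1:
        return "track"                     # one platform moving in X/Y over T (Lagrangian)
    return "random scatter"


tide_gauge = [("T1", -122.4, 37.8, 0.0, t) for t in range(24)]
ctd_cast = [("C7", -140.0, 10.0, z, 0) for z in (0.0, 10.0, 20.0, 50.0)]
drifter = [("D3", -140.0 + 0.1 * i, 10.0, 0.0, i) for i in range(5)]
ship_obs = [(None, -150.0, 20.0, 0.0, 0), (None, -30.0, 45.0, 0.0, 0)]

for name, group in [("tide", tide_gauge), ("ctd", ctd_cast), ("drifter", drifter), ("ships", ship_obs)]:
    print(name, classify(group))
```

The ordering of the tests matters: a moored thermistor chain varies in both depth and time, so checking time first keeps it in the station time series class.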

  19. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 • data collected from ocean cruises and moorings • scattered profiles, Lagrangian drifters • physical, chemical and biological data • dozens (hundreds?) of variables • > 7 million profiles (1792 to present, global) • > 10 Gigabytes of data (growing faster every year)

  20. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 Current access: • Choose either temporally or spatially sorted data • Choose year(s) or 10x10 degree box • Choose instrument • Retrieve data for all variables from that ‘file’ Problems: • Cannot subset data (1 year x 1 instrument ≈ 7 Mbytes) • Data returned in impenetrable compressed ASCII files • Associated metadata is lost

  21. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 Our attempt at synoptic/cross-instrument data access • Store data by variable • Plan for those getting data out, not putting data in. • What do scientific analysis and visualization packages need? • Store data for minimum # of disk seeks • Memory is fast (and cheap!), disk seeks are slow. • Multi-stage process for determining data blocks needed. • Read excess data into memory, then winnow.
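"Read excess data into memory, then winnow" trades some wasted bytes for fewer disk seeks. One way to sketch that idea, assuming fixed-size records at known byte offsets: merge wanted records into a single read whenever the gap between them is at most `max_gap` bytes, then cut the wanted records out of each over-read buffer. The function names, record size and gap threshold are hypothetical, and an in-memory `io.BytesIO` stands in for the data file.

```python
import io


def plan_reads(offsets, record_size, max_gap):
    """Coalesce record offsets into (start, length) block reads.

    Records separated by <= max_gap bytes share one read, so we read some
    unwanted bytes but perform fewer seeks.
    """
    blocks = []
    for off in sorted(offsets):
        if blocks:
            start, length = blocks[-1]
            if off - (start + length) <= max_gap:
                blocks[-1] = (start, off + record_size - start)
                continue
        blocks.append((off, record_size))
    return blocks


def winnow(buf, block_start, offsets, record_size):
    """Extract only the wanted records that lie entirely inside this buffer."""
    block_end = block_start + len(buf)
    return [buf[o - block_start:o - block_start + record_size]
            for o in sorted(offsets)
            if block_start <= o and o + record_size <= block_end]


disk = io.BytesIO(bytes(range(256)) * 4)  # stand-in for a 1024-byte data file
wanted = [0, 32, 64, 512]
records = []
for start, length in plan_reads(wanted, record_size=16, max_gap=64):
    disk.seek(start)                      # one seek per coalesced block
    buf = disk.read(length)
    records.extend(winnow(buf, start, wanted, 16))

print(plan_reads(wanted, 16, 64), len(records))
```

Here four wanted records cost two seeks instead of four, at the price of reading 32 unwanted bytes.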

  22. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 Step 1: synoptic meta-pointer file (0.3 MByte). A Latitude × Longitude × Time grid in which each cell holds the number of profiles and a pointer into the NetCDF metadata file. a) Load the synoptic meta-pointer file into memory b) Subset to extract metadata pointers. Grid size: 10deg x 10deg x 50 irregular timesteps = 260 Kbytes
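The 260 Kbyte figure is consistent with a global grid of 36 × 18 ten-degree cells × 50 timesteps if each cell stores two 4-byte integers (profile count and pointer); that per-cell layout is our assumption. The sketch checks the arithmetic and shows Step 1 as an in-memory subset that returns pointers only for non-empty cells. The nested-list grid and the toy cell values are hypothetical.

```python
# Assumed layout: global 10-degree cells, 50 irregular time steps,
# and two int32 values per cell (profile count, pointer into metadata file).
N_LON, N_LAT, N_T = 360 // 10, 180 // 10, 50
BYTES_PER_CELL = 2 * 4
grid_bytes = N_LON * N_LAT * N_T * BYTES_PER_CELL  # 259,200 bytes, i.e. ~260 Kbytes


def empty_grid():
    """grid[t][j][i] = (n_profiles, pointer); pointer -1 marks an empty cell."""
    return [[[(0, -1)] * N_LON for _ in range(N_LAT)] for _ in range(N_T)]


def subset_pointers(grid, i_range, j_range, t_range):
    """Collect (pointer, n_profiles) for non-empty cells in an inclusive index box."""
    hits = []
    for t in range(t_range[0], t_range[1] + 1):
        for j in range(j_range[0], j_range[1] + 1):
            for i in range(i_range[0], i_range[1] + 1):
                n, ptr = grid[t][j][i]
                if n > 0:
                    hits.append((ptr, n))
    return hits


grid = empty_grid()
grid[3][9][18] = (12, 4096)  # toy values for illustration
grid[3][9][19] = (5, 5120)
grid[40][0][0] = (7, 9000)   # outside the requested box below

hits = subset_pointers(grid, i_range=(17, 19), j_range=(9, 9), t_range=(0, 10))
print(grid_bytes, hits)
```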

  23. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 Step 2: metadata/data-pointer file (200 Mbyte). Each profile record (indexed by T, X, Y) holds Julian day, Lat, Lon, Cruise ID, # of levels, and Var_ptr, Var_QC for each of N variables. a) Read blocks of profile metadata into memory b) Subset by X/Y/T to obtain valid data pointers
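Step 2 filters one block of profile metadata down to data pointers. A minimal sketch using the record fields named on the slide; the meanings assigned to `var_ptr` (byte offset, -1 when the variable was not measured) and `var_qc` (0 = good profile) are assumptions, as are the toy records.

```python
from typing import Dict, NamedTuple


class ProfileMeta(NamedTuple):
    jday: float               # Julian day
    lat: float
    lon: float
    cruise_id: str
    n_levels: int
    var_ptr: Dict[str, int]   # assumed: byte offset per variable, -1 = not measured
    var_qc: Dict[str, int]    # assumed: profile-level QC flag per variable, 0 = good


def data_pointers(block, var, x, y, t):
    """Return (pointer, n_levels) for good profiles of `var` inside the X/Y/T box."""
    out = []
    for p in block:
        in_box = x[0] <= p.lon <= x[1] and y[0] <= p.lat <= y[1] and t[0] <= p.jday <= t[1]
        has_var = p.var_ptr.get(var, -1) >= 0 and p.var_qc.get(var, 1) == 0
        if in_box and has_var:
            out.append((p.var_ptr[var], p.n_levels))
    return out


block = [
    ProfileMeta(100.0, 5.0, -175.0, "CR01", 30, {"temp": 0, "sal": 240}, {"temp": 0, "sal": 0}),
    ProfileMeta(101.0, 6.0, -172.0, "CR01", 25, {"temp": 480, "sal": -1}, {"temp": 0, "sal": 1}),
    ProfileMeta(101.5, 25.0, -172.0, "CR02", 40, {"temp": 680, "sal": 1000}, {"temp": 0, "sal": 0}),
]

box = dict(x=(-180.0, -170.0), y=(0.0, 10.0), t=(100.0, 102.0))
print(data_pointers(block, "temp", **box))
print(data_pointers(block, "sal", **box))
```

Because only pointers leave this step, no profile data is read until Step 3.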

  24. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 Step 3: data files (10 - 2000 Mbyte). Each 1D profile at a given T, X, Y holds Depth (Z), Value and Quality flag for each of N depths. a) Read profile data b) Subset by depth/quality flag to obtain valid data
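Step 3 decodes the levels of one profile and keeps only those in the requested depth range with an acceptable quality flag. The on-disk layout here (little-endian float32 depth, float32 value, one signed byte for the flag) and the convention that flag 0 means good are hypothetical; the slides do not give the actual encoding.

```python
import struct

# Hypothetical level layout: float32 depth, float32 value, int8 quality flag.
LEVEL = struct.Struct("<ffb")


def read_profile(buf, offset, n_levels, z_range, good_flags=(0,)):
    """Decode n_levels records at offset; keep (depth, value) in z_range with a good flag."""
    out = []
    for k in range(n_levels):
        depth, value, flag = LEVEL.unpack_from(buf, offset + k * LEVEL.size)
        if z_range[0] <= depth <= z_range[1] and flag in good_flags:
            out.append((depth, value))
    return out


levels = [(0.0, 28.5, 0), (50.0, 27.25, 0), (100.0, 20.0, 3), (500.0, 8.5, 0)]
buf = b"".join(LEVEL.pack(*lv) for lv in levels)

print(read_profile(buf, offset=0, n_levels=len(levels), z_range=(0.0, 200.0)))
```

The pointer and level count come from Step 2, so the read touches only this profile's bytes; the depth and flag filter is the final winnowing pass.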

  25. Example Dataset NOAA/NODC/OCL World Ocean Database 2001 Our attempt at synoptic/cross-instrument data access Successes: • Able to subset without accessing (much) unwanted data • Access to (<1 Mbyte) subsets in seconds • Access to metadata (“What profiles exist?”) even faster Problems: • Only set up for most important variables • Data cannot be updated, must be rewritten • Must reinvent logic for relational queries • Funky, home-built solution

  26. Other data streams • METAR obs (station time series) • 1700 US weather stations report hourly data • 25 variables = 120 Mbytes/month • ARGO floats (profiles) • 4000 floats reporting profiles every 10 days • 50 levels x 10 variables = 24 Mbytes/month • Tagging Of Pacific Pelagics (TOPP) (lagrangian tracks) • 50 animals per year tagged with 1 min data recorders • 5 variables = 0.8 Mbytes/month • Voluntary Observing Ships (random scatter) • 3000 surface ship reports per day • 25 variables = 9 Mbytes/month
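The monthly volumes for METAR, ARGO and VOS are consistent with storing every value as a 4-byte number in a 30-day month, with no metadata overhead; those two assumptions are ours. The TOPP figure is omitted because it depends on how many tags are recording at once.

```python
BYTES_PER_VALUE = 4   # assumed: 32-bit values, no metadata overhead
DAYS_PER_MONTH = 30   # assumed month length


def monthly_mbytes(records_per_month, values_per_record):
    """Raw data volume in Mbytes (1e6 bytes) per month."""
    return records_per_month * values_per_record * BYTES_PER_VALUE / 1e6


# METAR: 1700 stations x 24 hourly reports x 30 days, 25 variables per report.
metar = monthly_mbytes(1700 * 24 * DAYS_PER_MONTH, 25)
# ARGO: 4000 floats, one profile every 10 days, 50 levels x 10 variables per profile.
argo = monthly_mbytes(4000 * DAYS_PER_MONTH // 10, 50 * 10)
# VOS: 3000 ship reports per day, 25 variables per report.
vos = monthly_mbytes(3000 * DAYS_PER_MONTH, 25)

print(f"METAR {metar:.1f}, ARGO {argo:.1f}, VOS {vos:.1f} Mbytes/month")
```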

  27. Observational Data Access Requirements • Subset based on X, Y, Z, T or metadata (e.g. quality flag or station/ship/platform/animal_ID). • Only return requested data. (Reduced volume for remote data access.) • For near-real-time, daily updates are acceptable. (Can recreate static files on a daily basis if necessary.) • Use standards wherever possible. • Make the creation of the database as simple as possible. (Non-experts can follow cookbook examples.)
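Slide 25 noted that the home-built WOD01 layout had to reinvent relational query logic. For comparison, the sketch below expresses the first requirement (subset by X, Y, Z, T plus a quality flag) as a single SQL query against an in-memory SQLite table. This is a swapped-in technique for illustration, not the authors' design; the table schema, index, platform IDs and values are hypothetical, and ISO 8601 date strings are used because they sort correctly as text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE obs (
    platform_id TEXT, lon REAL, lat REAL, depth REAL, time TEXT,
    variable TEXT, value REAL, qc INTEGER)""")
con.execute("CREATE INDEX obs_txy ON obs (time, lat, lon)")

rows = [
    ("ARGO-1", -150.0, 10.0, 5.0, "2004-01-05", "temp", 27.9, 0),
    ("ARGO-1", -150.0, 10.0, 100.0, "2004-01-05", "temp", 24.1, 0),
    ("ARGO-2", -140.0, 12.0, 5.0, "2004-01-07", "temp", 28.3, 4),  # flagged bad
    ("SHIP-9", -30.0, 45.0, 0.0, "2004-01-06", "temp", 14.2, 0),   # outside the box
]
con.executemany("INSERT INTO obs VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

query = """SELECT platform_id, depth, value FROM obs
WHERE variable = ?
  AND lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?
  AND depth BETWEEN ? AND ? AND time BETWEEN ? AND ?
  AND qc = 0
ORDER BY platform_id, depth"""
params = ("temp", -160.0, -130.0, 0.0, 20.0, 0.0, 200.0, "2004-01-01", "2004-01-31")
result = con.execute(query, params).fetchall()
print(result)
```

Only the matching rows are returned, which also serves the "only return requested data" requirement; daily updates would be plain `INSERT`s rather than rewriting static files.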

  28. Conclusion • Efficient access to observational data is an unsolved problem. • Data volumes are increasing exponentially. • Data access problems hinder the development of interactive visualization tools.
