scd research data
Download
Skip this Video
Download Presentation
SCD Research Data

Loading in 2 Seconds...

play fullscreen
1 / 27

SCD Research Data - PowerPoint PPT Presentation


  • 269 Views
  • Uploaded on

SCD Research Data. For UCAR Data Management Working Group January 10, 2001 Steven Worley Scientific Computing Division Data Support Section. Four Categories of Data Service User Profile Data Content Data Access. Four Categories of Data Service. Archives directly from the MSS

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'SCD Research Data' - Jims


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
scd research data
SCD Research Data

For

UCAR Data Management Working Group

January 10, 2001

Steven Worley

Scientific Computing Division

Data Support Section

slide2
Four Categories of Data Service
  • User Profile
  • Data Content
  • Data Access
four categories of data service
Four Categories of Data Service
  • Archives directly from the MSS
    • Accessible to all with NCAR computing accounts
  • Web accessible online data server
    • Information interface for all data
  • Individual requests
    • Customized on per request basis
  • Data preparation for large projects
    • E.g. Reanalyses at ECMWF and NCEP
user profile online data server
User profile, online data server
  • Users based on network address domain, data for 1995-1998
  • ~ 20K unique addresses per year
user profile individual request
User profile, individual request
  • Requests excluding CD-ROMS
  • Based on 1998-1999 data
    • 28% U.S. Univ. (179 of 638)
    • 11% Foreign Univ. (69)
    • 27% Foreign Non-Univ. (171)
    • 34% U.S. Gov. and Commercial (219)

(remarkably, some foreign and government sources find it desirable to acquire their own data from SCD/DSS)

user profile all users
User profile, all users

All users by year, excluding online category

user profile finding the data
User profile, finding the data
  • Peer and colleague recommendations
  • Acknowledgements in publications
  • WWW searches and perusing
quick look at dss information interface
Quick look at DSS Information Interface
  • Website, dss.ucar.edu
  • Top level information and dataset groupings
    • Oceanographic datasets by Category
important improvements for the information interface
Important improvements for the Information Interface
  • More top level documents to guide users to the “best” datasets
  • For improved searches
    • Carefully worded .html ..
    • Pages with introductory text that clearly defines the dataset with keywords that promote discovery.
    • .html .., note, not all search engines boost ranking based on these.
user profile compliments
User profile, compliments
  • Fast service, requests receive prompt action.
  • Staff with scientific knowledge to offer assistance and guidance.
  • Flexible system – can adapt to meet users requirements.
what makes this system work
What makes this system work
  • The data records and files remain in simple structures
      • This way the archive should always be accessible to programs written with low level languages
      • The data can survive evolutions in OS systems and software, 50-years is not too much.
      • Programs can be written that allow fast and efficient manipulation of large collections.
      • Internal checksum keys can be strategically placed to insure data integrity – at any level.
user profile complaints
User profile, complaints
  • All the data is not online – even though this quite impractical – 12+ TB
  • All the data is not in their favorite format, IDL, HDF, netCDF, GrIB, ASCII, GIS, Binary, .xls , Matlab, etc.
  • “Can I just get the piece I need?”
  • “Do you mean I need to know some FORTRAN or C Language?”
user profile skill set
User Profile, skill set
  • Best skill set for our users includes knowing some FORTRAN and/or C.
  • Trend; more and more people are requesting data in application environment specific formats
    • Will the next generation scientist know a basic computing language?
data content size and characteristic
Data Content, size and characteristic
  • Veritable smorgasbord of data.
    • Overall size, 12+ TB
    • 500+ distinct datasets
      • Many historical observations from the atmosphere, and ocean
      • Many operational analyses and reanalyses
    • Dataset sizes, < 1 MB to several TB
    • Many original formats. GrIB is dominate in our analyses and reanalyses datasets
data content metadata management
Data Content, metadata management
  • Primarily, metadata is managed on our online information server. Each dataset has a WWW page.
  • All dataset WWW pages are automatically formed.
    • Corrections, addition, and changes are made to text files manipulated under a Unix change and control system.
    • Advantage: history of all changes and data files associated with the dataset, and the WWW pages are always current.
data content metadata management17
Data Content, metadata management
  • Have considerable amounts of hard copy references and metadata.

- We are making scanned images of these now.

data content long term archive and security
Data Content, long term archive and security
  • Small datasets and irreplaceable observations and analyses have two copies on the MSS
    • Although we cannot guarantee they reside on separate cartridges
  • Files are write password protected – prevents accidental overwrites.
  • We have been fortunate to have a very reliable MSS and our success will continue to rely on it in the future.
data content long term archive and security19
Data Content, long term archive and security
  • Areas of concern
    • We don’t have adequate offsite backups
      • At least critical observations should be protected from catastrophe at the Mesa Lab
    • In the event of loss of single copy large datasets we rely on other centers for replacement
      • This needs to be discussed more nationally
      • Redistribution may have restrictions or be costly
data content long term archive and security20
Data Content, long term archive and security
  • Areas of concern, continued
    • Must always remain on guard so important data are not lost due to short sighted policy decisions.
    • Must participate in national and international projects so that the archive content is continually refreshed with the most scientifically important data, at low cost.
data access aids to access
Data Access, aids to access
  • Maintain FORTRAN code to read all data files
    • Sometimes for many platforms (Unix, PC)
  • The MSS file location is defined for all datasets, and is available online.
  • Staff specialist are assigned and identified for each dataset
data access most frequent
Data Access, most frequent
  • NCEP/NCAR Global Atmospheric Reanalysis, 2.6 TB
  • How?
    • MSS
    • WWW (monthly means)
    • CDROMS
    • FTP
    • Various Tape Media (large capacity)
data access largest barrier
Data Access, largest barrier
  • Discovering what is available
  • Gaining access to the MSS collection

(when they don’t have a computing account)

  • Not having experience with low level languages, e.g. FORTRAN and C/C++
data access product development
Data Access, product development
  • Yes we do, and we feel it is very important!
  • Why?
    • Can QC the data and identify problems early
    • Can reorganize into logical collection, or create popular subsets.
    • Reduce the volume of large collections to manageable size for users
    • Saves many users extra work
data access improvements for scientific advancement
Data Access, improvements for scientific advancement
  • Minimize the barriers that inhibit discovery – metadata problem.
  • Supply the data in the users favorite format or provide tools that can convert the data where it is practical and efficient.
  • Place more data, and valuable higher level data products on line
ad