Scd research data
1 / 27

SCD Research Data - PowerPoint PPT Presentation

  • Uploaded on

SCD Research Data. For UCAR Data Management Working Group January 10, 2001 Steven Worley Scientific Computing Division Data Support Section. Four Categories of Data Service User Profile Data Content Data Access. Four Categories of Data Service. Archives directly from the MSS

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'SCD Research Data' - Jims

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Scd research data l.jpg
SCD Research Data


UCAR Data Management Working Group

January 10, 2001

Steven Worley

Scientific Computing Division

Data Support Section

Slide2 l.jpg

Four categories of data service l.jpg
Four Categories of Data Service

  • Archives directly from the MSS

    • Accessible to all with NCAR computing accounts

  • Web accessible online data server

    • Information interface for all data

  • Individual requests

    • Customized on per request basis

  • Data preparation for large projects

    • E.g. Reanalyses at ECMWF and NCEP

User profile online data server l.jpg
User profile, online data server

  • Users based on network address domain, data for 1995-1998

  • ~ 20K unique addresses per year

User profile individual request l.jpg
User profile, individual request

  • Requests excluding CD-ROMS

  • Based on 1998-1999 data

    • 28% U.S. Univ. (179 of 638)

    • 11% Foreign Univ. (69)

    • 27% Foreign Non-Univ. (171)

    • 34% U.S. Gov. and Commercial (219)

      (remarkably, some foreign and government sources find it desirable to acquire their own data from SCD/DSS)

User profile all users l.jpg
User profile, all users

All users by year, excluding online category

User profile finding the data l.jpg
User profile, finding the data

  • Peer and colleague recommendations

  • Acknowledgements in publications

  • WWW searches and perusing

Quick look at dss information interface l.jpg
Quick look at DSS Information Interface

  • Website,

  • Top level information and dataset groupings

    • Oceanographic datasets by Category

Important improvements for the information interface l.jpg
Important improvements for the Information Interface

  • More top level documents to guide users to the “best” datasets

  • For improved searches

    • Carefully worded .html <title> .. </title>

    • Pages with introductory text that clearly defines the dataset with keywords that promote discovery.

    • .html <meta tag>..</meta tag>, note, not all search engines boost ranking based on these.

User profile compliments l.jpg
User profile, compliments

  • Fast service, requests receive prompt action.

  • Staff with scientific knowledge to offer assistance and guidance.

  • Flexible system – can adapt to meet users requirements.

What makes this system work l.jpg
What makes this system work

  • The data records and files remain in simple structures

    • This way the archive should always be accessible to programs written with low level languages

    • The data can survive evolutions in OS systems and software, 50-years is not too much.

    • Programs can be written that allow fast and efficient manipulation of large collections.

    • Internal checksum keys can be strategically placed to insure data integrity – at any level.

User profile complaints l.jpg
User profile, complaints

  • All the data is not online – even though this quite impractical – 12+ TB

  • All the data is not in their favorite format, IDL, HDF, netCDF, GrIB, ASCII, GIS, Binary, .xls , Matlab, etc.

  • “Can I just get the piece I need?”

  • “Do you mean I need to know some FORTRAN or C Language?”

User profile skill set l.jpg
User Profile, skill set

  • Best skill set for our users includes knowing some FORTRAN and/or C.

  • Trend; more and more people are requesting data in application environment specific formats

    • Will the next generation scientist know a basic computing language?

Data content size and characteristic l.jpg
Data Content, size and characteristic

  • Veritable smorgasbord of data.

    • Overall size, 12+ TB

    • 500+ distinct datasets

      • Many historical observations from the atmosphere, and ocean

      • Many operational analyses and reanalyses

    • Dataset sizes, < 1 MB to several TB

    • Many original formats. GrIB is dominate in our analyses and reanalyses datasets

Data content metadata management l.jpg
Data Content, metadata management

  • Primarily, metadata is managed on our online information server. Each dataset has a WWW page.

  • All dataset WWW pages are automatically formed.

    • Corrections, addition, and changes are made to text files manipulated under a Unix change and control system.

    • Advantage: history of all changes and data files associated with the dataset, and the WWW pages are always current.

Data content metadata management17 l.jpg
Data Content, metadata management

  • Have considerable amounts of hard copy references and metadata.

    - We are making scanned images of these now.

Data content long term archive and security l.jpg
Data Content, long term archive and security

  • Small datasets and irreplaceable observations and analyses have two copies on the MSS

    • Although we cannot guarantee they reside on separate cartridges

  • Files are write password protected – prevents accidental overwrites.

  • We have been fortunate to have a very reliable MSS and our success will continue to rely on it in the future.

Data content long term archive and security19 l.jpg
Data Content, long term archive and security

  • Areas of concern

    • We don’t have adequate offsite backups

      • At least critical observations should be protected from catastrophe at the Mesa Lab

    • In the event of loss of single copy large datasets we rely on other centers for replacement

      • This needs to be discussed more nationally

      • Redistribution may have restrictions or be costly

Data content long term archive and security20 l.jpg
Data Content, long term archive and security

  • Areas of concern, continued

    • Must always remain on guard so important data are not lost due to short sighted policy decisions.

    • Must participate in national and international projects so that the archive content is continually refreshed with the most scientifically important data, at low cost.

Data access aids to access l.jpg
Data Access, aids to access

  • Maintain FORTRAN code to read all data files

    • Sometimes for many platforms (Unix, PC)

  • The MSS file location is defined for all datasets, and is available online.

  • Staff specialist are assigned and identified for each dataset

Data access most frequent l.jpg
Data Access, most frequent

  • NCEP/NCAR Global Atmospheric Reanalysis, 2.6 TB

  • How?

    • MSS

    • WWW (monthly means)

    • CDROMS

    • FTP

    • Various Tape Media (large capacity)

Data access largest barrier l.jpg
Data Access, largest barrier

  • Discovering what is available

  • Gaining access to the MSS collection

    (when they don’t have a computing account)

  • Not having experience with low level languages, e.g. FORTRAN and C/C++

Data access product development l.jpg
Data Access, product development

  • Yes we do, and we feel it is very important!

  • Why?

    • Can QC the data and identify problems early

    • Can reorganize into logical collection, or create popular subsets.

    • Reduce the volume of large collections to manageable size for users

    • Saves many users extra work

Data access improvements for scientific advancement l.jpg
Data Access, improvements for scientific advancement

  • Minimize the barriers that inhibit discovery – metadata problem.

  • Supply the data in the users favorite format or provide tools that can convert the data where it is practical and efficient.

  • Place more data, and valuable higher level data products on line