
RDA


pearly

Presentation Transcript


  1. RDA Data Support Section

  2. Topics
     • What is it?
     • Who cares?
     • Why does the RDA need CISL?
     • What is on the horizon?

  3. 1. What is it?
     • Research Data Archive (RDA)
     • 600+ datasets that are significant to many NCAR and university scientists
     • Archive work began over 40 years ago
     • Branded as RDA in 2003
     • Generally focused on atmospheric and oceanic environmental measurements, or analyzed products derived from them
     • Critical data for weather and climate studies

  4. Who cares? Over 6000 Unique Users in 2008
     • Growth in user access via the web, 2001-08
        • Promoted with more online data and better interfaces
     • Consistent user access from the MSS
        • Represents provision to NCAR computers
     • 26-year record for filling one-off data requests
        • Decreasing as web access increases in recent years

  5. Why does RDA need CISL?
     • Rely heavily on CISL infrastructure and experts:
        • Secure and reliable MSS/HPSS storage
        • Disk to support web services
        • Networks to bring data in and distribute it out to users
        • Computing platforms to prepare and serve the RDA
        • DSS staff are geoscience-educated and need technical advice/support
     • Current metrics
        • Storage:
           • Primary: 400+ TB, 4+M files
           • All: 800+ TB (backup/working/etc.)
        • Disk: 40 TB on SAN
        • Servers and laptops
           • Servers (8): mix of SunOS & Linux
           • About 12 laptops/desktops
        • Data movement and growth

  6. Complete User Community
     • Advantages:
        • Fast access to online data (a limited part of the RDA)
        • Access to all RDA content metadata
        • Access to RDA data processing services
     • Disadvantages:
        • Slow access to MSS data (delayed mode)
        • Have to create a separate RDA account and log in
        • Data processing requests take a long time to finish
        • Slow download speeds for some users
     HPC User Community
     • Advantages:
        • Access to full RDA
        • Fast computing
        • No login required
     • Disadvantages:
        • No access to online data
        • Use MSS as a file server
        • No direct access to RDA metadata
        • No direct access to RDA data processing services
        • Require separate account to access RDA web server

  7. Complete User Community
     • Improvements:
        • Fast access to full RDA
        • Expanded data processing services available
        • Single CISL account, no separate RDA account
        • Faster download speeds with grid-based tools, e.g. GridFTP
        • Single “first point of contact” for user support
     HPC User Community
     • Improvements:
        • Fast access to full RDA
        • Access to all RDA content metadata
        • Access to RDA data processing services
        • Single CISL account
        • Single “first point of contact”
     Resolved all the disadvantages
     • New Challenges:
        • GPFS and HPSS don’t have generic file use logging
           • Need for metrics & services
        • HPSS doesn’t have sophisticated file access control
           • Some RDA assets have limited access policies
        • Abandon a functional RDA registration system and retool a 20K+ user DB
        • Build command line tools to integrate RDA services into HPC environment
        • Of course, there will be more!
        • Big transition while maintaining normal RDA content growth and services

  8. What is on the horizon?
     • Transition all SunOS systems to Linux
     • Move SAN storage to GPFS GLADE
     • Put more data online in GLADE (on the order of 130 TB)
        • Fast access path, internal and external
     • Transition ALL RDA from MSS to HPSS
     • Implement more on-demand products
        • Data extraction and computing across TB datasets
        • Must be successful in GLADE, with HPSS, and using a scalable DA compute environment

  9. Questions
     • What is it?
     • Who cares?
     • Why does the RDA need CISL?
     • What is on the horizon?
