
Yellowstone and HPSS






Presentation Transcript


  1. Yellowstone and HPSS OR What you’re doing on Bluefire that you should stop. David Hart, CISL User Services Section. August 7, 2012

  2. Think different • Yellowstone is not Bluefire • Yellowstone delivers 29x the computing capacity • GLADE is not /ptmp • New /glade/scratch (~5 PB) is • 37x larger than /ptmp • 25x larger than old /glade/scratch • New GLADE is 7x larger, 15x faster than old GLADE • HPSS tape capacity is not infinite

  3. Tape ≠ slow disk • “Temporary” HPSS files use tape space that is not easily reclaimed • Deleting files from tapes leaves gaps that are not refilled (unlike disk) • “Repacking” not practical • Time consuming, may recover only 10% of tape space, occupies tape drives • Space is recovered only* when entire archive is migrated to new media • Wasted tape = smaller future HPC systems
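The "gaps that are not refilled" point is easier to see in a toy model. The sketch below is plain Python, not HPSS code; the Tape class and all of the numbers are made up for illustration. It treats a tape as an append-only log: deleting a file only marks its segment dead, and the space comes back only when the surviving files are repacked onto new media.

# Minimal sketch (not HPSS code): a toy model of why deleting files from
# tape does not reclaim usable space the way deleting from disk does.
class Tape:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.segments = []          # [name, size_gb, live] in write order
        self.write_pos_gb = 0.0     # data is only ever appended

    def write(self, name, size_gb):
        if self.write_pos_gb + size_gb > self.capacity_gb:
            raise RuntimeError("tape full; a new tape would be mounted")
        self.segments.append([name, size_gb, True])
        self.write_pos_gb += size_gb

    def delete(self, name):
        # Marks the segment dead; the space is NOT returned to write_pos_gb.
        for seg in self.segments:
            if seg[0] == name:
                seg[2] = False

    def dead_space_gb(self):
        return sum(size for _, size, live in self.segments if not live)

    def repack(self):
        # Copy only live segments to a "new tape": the only way to reclaim
        # the dead space, and it occupies tape drives while it runs.
        live = [seg for seg in self.segments if seg[2]]
        self.segments = live
        self.write_pos_gb = sum(size for _, size, _ in live)

tape = Tape(capacity_gb=1000)
tape.write("temp_run_001.tar", 300)
tape.write("final_results.tar", 200)
tape.delete("temp_run_001.tar")
print(tape.write_pos_gb, tape.dead_space_gb())  # 500.0 300 -> the gap remains
tape.repack()
print(tape.write_pos_gb, tape.dead_space_gb())  # 200.0 0 -> reclaimed only after repack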

  4. HPSS today

  5. HPSS, May 2012: 15.75 PB [Pie chart of holdings by entity; labels show entity, PB, and % of total. NCAR Labs 47%, CSL 28%.]

  6. HPSS Growth May 2011-May 2012 • 12.8 PB (May 2011) → 15.75 PB (May 2012) • +3 PB in one year, ~23% growth overall • ~70 TB added every week • Largest increases • CGD: 741 TB • CESM: 624 TB • RDA: 374 TB • University: 302 TB • RAL: 295 TB • NCAR Lab holdings grew 1.6 PB • Excluding CESM and other CSL activity • From 5.96 PB to 7.58 PB (+27%)

  7. Hit the wall running… HPSS by 2014

  8. Potential HPSS growth by Jan 2014 (~15 months away!)

  9. HPSS allocations • CISL’s new accounting system • Lets us set HPSS allocations • Helps you more easily monitor your holdings • We reduced allocations for CSL awardees and CHAP awardees • We will set a “budget” for NCAR labs, too

  10. HPSS holdings, Jan 2014 (projected) • 30+ PB data • 200M+ files

  11. Action items

  12. 1. Cleaning house • USERS: Opportune time to delete old files • CISL will eventually migrate current holdings to new media • Help us avoid migrating unnecessary data • Closing old projects will provide you with details about files associated with those projects • CISL: Convert dual-copy files to single-copy • Recovers ~3 PB of space, mostly older MSS files (where dual-copy was the default). • Since moving to HPSS, net amount of dual-copy data has decreased by 44 TB
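As a starting point for the "delete old files" item above, here is a minimal sketch (Python, not an official CISL tool) that totals holdings by year from a saved listing of your HPSS files. The input file name and its simple "size_bytes year path" format are assumptions for illustration; adapt the parser to whatever listing your HPSS client actually produces.

# Minimal sketch: summarize a saved listing of HPSS holdings to see how
# much old data is a candidate for deletion. File name and format are
# hypothetical.
import os
from collections import defaultdict

def summarize(listing_path, cutoff_year=2009):
    """Print TB per year and the total that predates cutoff_year."""
    by_year = defaultdict(int)
    old_bytes = 0
    with open(listing_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue                      # skip lines that don't parse
            size_bytes, year = int(parts[0]), int(parts[1])
            by_year[year] += size_bytes
            if year <= cutoff_year:
                old_bytes += size_bytes
    for year in sorted(by_year):
        print(f"{year}: {by_year[year] / 1e12:.2f} TB")
    print(f"Deletion candidates from {cutoff_year} or earlier: {old_bytes / 1e12:.2f} TB")

if os.path.exists("my_hpss_listing.txt"):     # hypothetical listing file
    summarize("my_hpss_listing.txt")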

  13. Limits to HPSS deletion [Chart: HPSS holdings by year, in PB; category amounts estimated.] The chart represents an upper limit on possible deletions, since some files may already have been removed.

  14. 2. Manual second copies • Eliminating the dual-copy class of service in favor of a backup area for user-managed second copies (see the sketch below) • Currently the approach used by the Research Data Archive (RDA) • Advantages: • Guarantees the second copy is on different media • Reduces confusion about dual-copy limitations • Protects against user error (not true of the 2-copy CoS!) • Removals or overwrites of the original won’t clobber the second copy • Changing your mind consumes less tape this way • And costs less!
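A user-managed second copy can be as simple as a naming convention plus a recorded checksum so the two copies can be verified independently. The sketch below is illustrative only: the paths, the "backup" prefix, and the username are hypothetical, and the actual HPSS transfer (hsi, htar, or whatever client you use) is left as a placeholder comment rather than guessed at.

# Minimal sketch of a user-managed second copy under an assumed layout.
import hashlib
import os

def backup_path(primary_path, user="username"):
    # e.g. /home/username/runs/case1.tar -> /home/username/backup/runs/case1.tar
    prefix = f"/home/{user}/"
    return prefix + "backup/" + primary_path[len(prefix):]

def md5sum(local_file, chunk=1 << 20):
    # Record a checksum alongside each copy so either can be verified later.
    h = hashlib.md5()
    with open(local_file, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

local = "case1.tar"                        # hypothetical local file
primary = "/home/username/runs/case1.tar"  # hypothetical primary HPSS path
print("second copy goes to:", backup_path(primary))
if os.path.exists(local):
    print("checksum to record with both copies:", md5sum(local))
# Transfer step (placeholder): store `local` to both paths with your usual HPSS client.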

  15. 3. Think “archive” • “A long-term storage area, often on magnetic tape, for backup copies of files or for files that are no longer in active use.” • American Heritage Dictionary • “Records or documents with historical value, or the place where such records and documents are kept.” • “To transfer files to slower, cheaper media (usually magnetic tape) to free the hard disk space they occupied. … [I]n the 1960s, when disk was much more expensive, files were often shuffled regularly between disk and tape.” • Free On-Line Dictionary of Computing

  16. Updated GLADE policies for Yellowstone • /glade/scratch (5 PB total) • 90-day file retention from last access • 10 TB quota default • If you need more, ask. • Use it! • Use responsibly! Don’t let large piles of data sit, untouched, for 88 days • We will rein in the 90 days, if needed. • /glade/work (1 PB total) • 500 GB quota default for everyone • No purging or scrubbing!
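One way to "use responsibly" is to check periodically for scratch files you have not touched in a while, before the 90-day scrubber does it for you. A minimal sketch, assuming your scratch directory is /glade/scratch/<username>; the path and the 80-day threshold are placeholders:

# Minimal sketch: report files in scratch not accessed in ~80+ days.
import os
import time

def stale_files(root, days=80):
    """Yield (path, size) for files not accessed in `days` days."""
    cutoff = time.time() - days * 86400
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue                  # file vanished or unreadable; skip
            if st.st_atime < cutoff:
                yield path, st.st_size

total = 0
for path, size in stale_files("/glade/scratch/username"):   # hypothetical path
    total += size
    print(f"{size / 1e6:10.1f} MB  {path}")
print(f"~{total / 1e9:.1f} GB not accessed in 80+ days: use it, move it, or delete it")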

  17. Optimize your workflows • Don’t use tape for data/files you know are temporary or interim • Plan ahead • Leave temporary data in /glade/scratch • Post-process to final form before archiving (see the sketch below) • Take advantage of LSF-controlled Geyser and Caldera to automate post-processing tasks
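For the "post-process to final form before archiving" step, the simplest improvement is often just bundling a run's many small interim files into one compressed archive in scratch and storing only that. A minimal sketch; the paths and run name are hypothetical, and in practice a step like this could run as an LSF job on Geyser or Caldera after the simulation finishes:

# Minimal sketch: pack a run directory into one compressed tar file before archiving.
import os
import tarfile

def bundle_run(run_dir, staging_dir):
    """Pack an entire run directory into one compressed tar file."""
    name = os.path.basename(run_dir.rstrip("/"))
    out_path = os.path.join(staging_dir, f"{name}.tar.gz")
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(run_dir, arcname=name)    # one bundle instead of thousands of files
    return out_path

run_dir = "/glade/scratch/username/run042"           # hypothetical
staging = "/glade/scratch/username/archive_staging"  # hypothetical
if os.path.isdir(run_dir):
    os.makedirs(staging, exist_ok=True)
    print("archive this single file instead of the raw run:", bundle_run(run_dir, staging))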

  18. 4. Monitor off-site data • HPSS sizing and plans are estimated based on size of CISL’s production HPC resources • Not for NCAR data production on Hopper, Intrepid, Jaguar, Kraken, Pleiades, Blue Waters, Stampede … • HPC sites are pushing the data problem around • Projects, labs need to be aware of their users’ data migration to NCAR from off-site • Factor this into local data management plans

  19. 5. Plan ahead • CISL working with B&P on whether to formalize tape storage needs and costs associated with proposal activity • Most important for projects with plans to store “significant” amounts of data • How much is “significant” is TBD, but amounts that can be described in tenths of petabytes or more probably qualify. • If this applies to you, CISL can provide a cost for tape storage to include in your co-sponsorship budget.

  20. Looking ahead…1 year • GLADE will expand by ~5 PB in Q1 2014 • How would you like to take advantage of the new disk? • Near-term, online backup • E.g., an area for 6-month “insurance” copies • Longer scratch retention • Larger permanent “work” space • Other ideas? • HPSS procurement for next-generation archive in the planning stages

  21. Looking ahead…3-4 years • Recap: Yellowstone may lead to 25+ PB per year stored in HPSS • The successor to Yellowstone may be 10+ times more powerful • Anywhere from 15 to 40 Pflops, likely with GPU, MIC, or other many-core accelerators • Can we afford to maintain and manage 10x the HPSS storage? • 250 PB (0.25 exabytes) per year?

  22. Questions?
