Maximizing Data Efficiency: Solutions for Volume, Processing, Storage, Security, and Transfer
Computing Facilities & Capabilities
Julian Borrill
Computational Research Division, Berkeley Lab & Space Sciences Laboratory, UC Berkeley
Computing Issues
• Data Volume
• Data Processing
• Data Storage
• Data Security
• Data Transfer
It's all about the data
Data Volume
• Planck data volume drives (almost) everything
• LFI:
  • 22 detectors with 32.5, 45 & 76.8 Hz sampling
  • 4 x 10^10 samples per year
  • 0.2 TB time-ordered data + 1.0 TB full detector pointing data
• HFI:
  • 52 detectors with 200 Hz sampling
  • 3 x 10^11 samples per year
  • 1.3 TB time-ordered data + 0.2 TB full boresight pointing data
• LevelS (e.g. CTP "Trieste" simulations):
  • 4 LFI detectors with 32.5 Hz sampling
  • 4 x 10^9 samples per year
  • 2 scans x 2 beams x 2 samplings x 7 components + 2 noises
  • 1.0 TB time-ordered data + 0.2 TB full detector pointing data
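The sample counts above can be checked with a few lines of arithmetic. This is a minimal sketch, not the project's own accounting: the function and variable names are made up for illustration, the 4 bytes per sample (single precision) is an assumption that happens to reproduce the quoted TB figures, and "2 x 2 x 2 x 7 + 2" is read with ordinary operator precedence. Because the slide does not say how the 22 LFI detectors divide among the three sampling rates, the LFI count is only bracketed between the lowest and highest rate.

```python
# Rough check of the samples-per-year figures quoted on the slide.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 s

def samples_per_year(n_detectors, rate_hz):
    """Samples collected in one year by n_detectors sampled at rate_hz."""
    return n_detectors * rate_hz * SECONDS_PER_YEAR

hfi_samples = samples_per_year(52, 200.0)     # slide quotes 3 x 10^11
levels_samples = samples_per_year(4, 32.5)    # slide quotes 4 x 10^9

# LFI: the per-rate detector split is not given, so bracket the total.
lfi_low = samples_per_year(22, 32.5)
lfi_high = samples_per_year(22, 76.8)         # slide's 4 x 10^10 lies between

# LevelS time streams: 2 scans x 2 beams x 2 samplings x 7 components + 2 noises.
levels_streams = 2 * 2 * 2 * 7 + 2

# ASSUMPTION: 4 bytes per sample and 1 TB = 10^12 bytes.
BYTES_PER_SAMPLE = 4
levels_tb = levels_streams * levels_samples * BYTES_PER_SAMPLE / 1e12

print(f"HFI samples/yr:    {hfi_samples:.2e}")
print(f"LevelS samples/yr: {levels_samples:.2e}")
print(f"LFI samples/yr:    between {lfi_low:.2e} and {lfi_high:.2e}")
print(f"LevelS streams:    {levels_streams}, about {levels_tb:.2f} TB")
```

Under these assumptions the LevelS time-ordered volume comes out just under 1 TB, in line with the 1.0 TB quoted.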
Data Processing
• Operation count scales linearly (& inefficiently) with # analyses, # realizations, # iterations, # samples:
  100 x 100 x 100 x 100 x 10^11 ~ O(10) Eflop
• NERSC
  • Seaborg: 6080 CPU, 9 Tf/s
  • Jacquard: 712 CPU, 3 Tf/s (21 x Magique-II)
  • Bassi: 888 CPU, 7 Tf/s
  • NERSC-5: O(100) Tf/s, first-byte in 2007
  • O(2 x 10^6) CPU-hours/year => O(4) Eflop/yr (10GHz/5%)
• USPDC cluster
  • Specification & location TBD, first-byte in 2007
  • O(100) CPU x 7000 hours/year => O(0.4) Eflop/yr (5GHz/3%)
• IPAC small cluster dedicated to ERCSC
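The Eflop estimates above follow from simple multiplication, sketched here. The names are hypothetical, and the reading of "(10GHz/5%)" and "(5GHz/3%)" as a peak rate of 10 or 5 Gflop/s per CPU sustained at 5% or 3% efficiency is an assumption; it is adopted because it reproduces the quoted O(4) and O(0.4) Eflop/yr. The required operation count simply multiplies the five factors as they appear on the slide.

```python
import math

EFLOP = 1e18  # floating-point operations in one Eflop

# Required work: the factors exactly as written on the slide.
required = math.prod([100, 100, 100, 100, 1e11])

def delivered_flop_per_year(cpu_hours, peak_flop_per_s, efficiency):
    """Sustained operations per year for an allocation of cpu_hours.

    ASSUMPTION: peak_flop_per_s is the per-CPU peak rate and efficiency
    is the fraction of peak the analysis codes actually sustain.
    """
    return cpu_hours * 3600 * peak_flop_per_s * efficiency

nersc = delivered_flop_per_year(2e6, 10e9, 0.05)        # 3.6e18, i.e. O(4) Eflop/yr
uspdc = delivered_flop_per_year(100 * 7000, 5e9, 0.03)  # 3.78e17, i.e. O(0.4) Eflop/yr

print(f"required: {required / EFLOP:.1f} Eflop")
print(f"NERSC:    {nersc / EFLOP:.2f} Eflop/yr")
print(f"USPDC:    {uspdc / EFLOP:.2f} Eflop/yr")
print(f"years at combined rate: {required / (nersc + uspdc):.1f}")
```

On this reading, the O(10) Eflop requirement is a few times larger than one year of the combined allocation.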
Processing
• NERSC Seaborg: 9 Tf/s
• NERSC Jacquard: 3 Tf/s
• NERSC Bassi: 7 Tf/s
• ERCSC Cluster: 0.1 Tf/s
• USPDC Cluster: 0.5 Tf/s
• NERSC 5: 100 Tf/s
Data Storage
• Archive at IPAC
  • mission data
  • O(10) TB
• Long-term at NERSC using HPSS
  • mission + simulation data & derivatives
  • O(2) PB
• Spinning disk at USPDC cluster & at NERSC using NGF
  • current active data subset
  • O(2 - 20) TB
• Processor memory at USPDC cluster & at NERSC
  • running job(s)
  • O(1 - 10+) GB/CPU & O(0.1 - 10) TB total
Processing + Storage
• NERSC Seaborg: 9 Tf/s, 6 TB
• NERSC HPSS: 2/20 PB
• NERSC Jacquard: 3 Tf/s, 2 TB
• IPAC Archive: 10 TB
• NERSC NGF: 20/200 TB
• NERSC Bassi: 7 Tf/s, 4 TB
• ERCSC Cluster: 0.1 Tf/s, 50 GB
• USPDC Cluster: 2 TB
• USPDC Cluster: 0.5 Tf/s, 200 GB
• NERSC 5: 100 Tf/s, 50 TB
Data Security
• UNIX filegroups
  • special account: user planck
  • permissions _r__/___/___
• Personal keyfob to access planck account
  • real-time grid-certification of individuals
  • fobs issued & managed by IPAC
  • single system for IPAC, NERSC & USPDC cluster
• Allows securing of selected data
  • e.g. mission vs simulation
• Differentiates access to facilities and to data
  • standard personal account & special planck account
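As an illustration of the permission pattern above, the sketch below sets a file to owner-read-only using Python's standard library on a POSIX system. It assumes that "_r__/___/___" means read access for the owning account and no access for group or others; the helper name is hypothetical, and a temporary file stands in for real data. It does not model the keyfob or grid-certification step.

```python
import os
import stat
import tempfile

def make_owner_read_only(path):
    """Set mode 0400: the owner may read; group and others get nothing.

    ASSUMPTION: this is the intended meaning of the slide's
    "_r__/___/___" permission string.
    """
    os.chmod(path, stat.S_IRUSR)

# Demonstrate on a throwaway file rather than on real mission data.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    make_owner_read_only(demo_path)
    mode = os.stat(demo_path).st_mode
    mode_bits = stat.S_IMODE(mode)    # numeric permission bits
    mode_text = stat.filemode(mode)   # ls-style string
finally:
    os.remove(demo_path)

print(oct(mode_bits), mode_text)
```

With files owned by the special planck account and set this way, a user's standard personal account would not be able to read them, which is one way to realize the facility-versus-data distinction described above.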
Processing + Storage + Security
IPAC KEYFOB REQUIRED
• NERSC Seaborg: 9 Tf/s, 7 TB
• NERSC HPSS: 2/20 PB
• NERSC Jacquard: 3 Tf/s, 2 TB
• IPAC Archive: 10 TB
• NERSC NGF: 20/200 TB
• NERSC Bassi: 7 Tf/s, 4 TB
• ERCSC Cluster: 0.1 Tf/s, 50 GB
• USPDC Cluster: 2 TB
• USPDC Cluster: 0.5 Tf/s, 200 GB
• NERSC 5: 100 Tf/s, 50 TB
Data Transfer
• From DPCs to IPAC
  • transatlantic tests being planned
• From IPAC to NERSC
  • (check networks/bandwidth with Bill Johnston)
• From NGF to/from HPSS
  • (check bandwidth with David Skinner)
• From NGF to memory (most real-time critical)
  • within NERSC
    • (check bandwidths with David Skinner)
  • offsite depends on location
    • 10 Gb/s to LBL over dedicated data link on Bay Area MAN
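To give a feel for what a 10 Gb/s link means for data sets of this size, the sketch below converts a volume and a link rate into a transfer time. The function name and the `efficiency` parameter are illustrative, decimal units are assumed (1 TB = 10^12 bytes), and real throughput depends on protocol overhead, disk speed and link sharing, so these are lower bounds rather than measurements.

```python
import math

def transfer_seconds(n_bytes, link_bits_per_s, efficiency=1.0):
    """Ideal time to move n_bytes over a link of link_bits_per_s.

    efficiency is the fraction of the nominal link rate actually
    achieved (a hypothetical tuning knob, 1.0 = full line rate).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return n_bytes * 8 / (link_bits_per_s * efficiency)

TB = 1e12           # ASSUMPTION: decimal terabyte
LINK = 10e9         # the 10 Gb/s link quoted for the Bay Area MAN

hfi_tod = transfer_seconds(1.3 * TB, LINK)         # HFI time-ordered data
archive = transfer_seconds(10 * TB, LINK)          # O(10) TB IPAC archive
half_rate = transfer_seconds(1.3 * TB, LINK, 0.5)  # same data at 50% of line rate

print(f"1.3 TB at line rate: {hfi_tod / 60:.1f} min")
print(f"10 TB at line rate:  {archive / 3600:.1f} h")
print(f"1.3 TB at half rate: {half_rate / 60:.1f} min")
```

At full line rate a year of HFI time-ordered data moves in well under an hour, so the unresolved bandwidths flagged in the slide matter most for repeated NGF-to-memory loads during analysis.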
Processing + Storage + Security + Networks
IPAC KEYFOB REQUIRED
• DPCs
• NERSC Seaborg: 9 Tf/s, 7 TB
• NERSC HPSS: 2/20 PB
• NERSC Jacquard: 3 Tf/s, 2 TB
• IPAC Archive: 10 TB
• NERSC NGF: 20/200 TB
• NERSC Bassi: 7 Tf/s, 4 TB
• ERCSC Cluster: 0.1 Tf/s, 50 GB
• USPDC Cluster: 2 TB
• USPDC Cluster: 0.5 Tf/s, 200 GB
• NERSC 5: 100 Tf/s, 50 TB
Link labels in the diagram: 10 Gb/s (five links), 30 Gb/s (one link), ? (five links)