Nuclear Physics Data Management Needs Bruce G. Gibbard

Nuclear Physics Data Management Needs, Bruce G. Gibbard. SLAC DMW2004 Workshop, 16-18 March 2004. Overview: addressing a class of Nuclear Physics (NP) experiments that use large particle detector systems to study accelerator-produced reactions. Examples at: BNL (RHIC), JLab, CERN (LHC).

Presentation Transcript


  1. Nuclear Physics Data Management Needs
     Bruce G. Gibbard
     SLAC DMW2004 Workshop, 16-18 March 2004

  2. Overview
     • Addressing a class of Nuclear Physics (NP) experiments that use large particle detector systems to study accelerator-produced reactions
        • Examples at: BNL (RHIC), JLab, CERN (LHC)
     • Technologies & data management needs of this branch of NP are quite similar to those of HEP
     • Integrating across its four experiments, the Relativistic Heavy Ion Collider (RHIC) at BNL is currently the most prolific producer of data
        • Study of very high energy collisions of heavy ions (up to Au on Au)
        • High nucleon count, high energy => high multiplicity
        • High multiplicity, high luminosity and fine detector granularity => very high data rates
        • Raw data recording at up to ~250 MBytes/sec
     B. Gibbard

  3. Digitized Event In STAR at RHIC

  4. IT Activities of Such NP Experiments
     • Support the basic computing infrastructure for the experimental collaboration
        • Typically large (100s of physicists) and internationally distributed
        • Manage & distribute code, design, cost, & schedule databases
        • Facilitate communication, documentation and decision making
     • Store, process, support analysis of, and serve data (the Data Intensive Activities)
        • Online recording of Raw data
        • Generation and recording of Simulated data
        • Construction of Summary data from Raw and Simulated data
        • Iterative generation of Distilled Data Subsets from Summary data
        • Serve Distilled Data Subsets and analysis capability to widely distributed individual physicists
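
The data-intensive chain above (Raw and Simulated data, then Summary data, then Distilled subsets) can be pictured as a lineage graph in which each product remembers its inputs. The following is a minimal Python sketch under that assumption; the `Dataset` class, the `derive` helper, the tier labels and the dataset names are hypothetical illustrations, not part of any RHIC software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record of one data product and the products it was built from."""
    name: str
    tier: str  # "raw", "simulated", "summary" or "distilled" (illustrative labels)
    parents: list[Dataset] = field(default_factory=list)

    def lineage(self) -> list[str]:
        """Names of all upstream products, depth-first, nearest parent first."""
        names: list[str] = []
        for parent in self.parents:
            names.append(parent.name)
            names.extend(parent.lineage())
        return names


def derive(name: str, tier: str, *parents: Dataset) -> Dataset:
    """Create a derived product that records its inputs."""
    return Dataset(name, tier, list(parents))


raw = Dataset("run04_raw", "raw")
sim = Dataset("run04_sim", "simulated")
summary = derive("run04_summary", "summary", raw, sim)

# Distilled subsets are regenerated iteratively from the Summary data,
# e.g. one pass per refinement of the selection criteria (hypothetical).
pass1 = derive("subset_pass1", "distilled", summary)
pass2 = derive("subset_pass2", "distilled", summary)

print(pass2.lineage())
```

In this sketch every distilled pass points back to the same Summary product, which illustrates why the Summary tier, rather than the Raw data, is the one read over and over during analysis.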

  5. Data Handling Limited

  6. Data Volumes in Current RHIC Run
     • Raw Data (PHENIX)
        • Peak rates to 120 MBytes/sec
        • First 2 months of '04 (Jan & Feb): 10^9 events, 160 TBytes
        • Project ~225 TBytes of Raw data for the current run
     • Derived Data (PHENIX)
        • Construction of Summary Data from Raw Data, then production of distilled subsets from that Summary Data
        • Project ~270 TBytes of Derived data
     • Total (all of RHIC) = 1.2 PBytes for the current run
        • STAR ≈ PHENIX
        • BRAHMS + PHOBOS ≈ 40% of PHENIX
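
The volumes on this slide are mutually consistent, and a few derived numbers help put them in perspective. The short Python sketch below computes the mean raw event size and mean recording rate for January-February 2004, and the run total implied by the STAR and BRAHMS + PHOBOS scaling. It assumes decimal units (1 TByte = 10^6 MBytes) and a 60-day Jan-Feb period (2004 is a leap year); the variable names are illustrative.

```python
# Figures quoted on the slide for the current (2004) RHIC run
PHENIX_RAW_TB_JAN_FEB = 160        # TBytes recorded in Jan & Feb '04
PHENIX_EVENTS_JAN_FEB = 1e9        # ~10^9 events
PHENIX_RAW_TB_PROJECTED = 225      # projected Raw data for the run
PHENIX_DERIVED_TB_PROJECTED = 270  # projected Derived data for the run

DAYS_JAN_FEB = 31 + 29             # assumption: 2004 is a leap year
SECONDS_JAN_FEB = DAYS_JAN_FEB * 86_400

# Decimal units: 1 TByte = 10^9 kBytes = 10^6 MBytes
mean_event_kb = PHENIX_RAW_TB_JAN_FEB * 1e9 / PHENIX_EVENTS_JAN_FEB
mean_rate_mb_s = PHENIX_RAW_TB_JAN_FEB * 1e6 / SECONDS_JAN_FEB

phenix_total_tb = PHENIX_RAW_TB_PROJECTED + PHENIX_DERIVED_TB_PROJECTED
# STAR ~ PHENIX, BRAHMS + PHOBOS ~ 40% of PHENIX
rhic_total_tb = phenix_total_tb * (1 + 1 + 0.4)

print(f"{mean_event_kb:.0f} kB/event, {mean_rate_mb_s:.1f} MB/s mean")
print(f"PHENIX {phenix_total_tb} TB, RHIC ~{rhic_total_tb / 1000:.2f} PB")
```

This gives roughly 160 kBytes per event and a mean rate of about 31 MBytes/sec (about a quarter of the 120 MBytes/sec peak), and 495 TBytes × 2.4 ≈ 1.19 PBytes, matching the quoted 1.2 PBytes.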

  7. RHIC Raw Data Recording Rate (chart annotated: PHENIX 120 MBytes/sec, STAR 120 MBytes/sec)
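
Summing the two peak rates reproduces the overall figure on the Overview slide (~250 MBytes/sec). The Python sketch below also converts the combined peak into a daily volume; treating the peak as sustained for a full day is an assumption, so the result is only an upper bound.

```python
SECONDS_PER_DAY = 86_400

peak_mb_s = {"PHENIX": 120, "STAR": 120}  # peak raw recording rates from the slide

combined_peak_mb_s = sum(peak_mb_s.values())  # 240 MBytes/sec
# Upper bound: both experiments at peak for 24 hours (1 TByte = 10^6 MBytes)
tb_per_day_at_peak = combined_peak_mb_s * SECONDS_PER_DAY / 1e6

print(f"{combined_peak_mb_s} MB/s combined, <= {tb_per_day_at_peak:.1f} TB/day")
```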

  8. Current RHIC Technology
     • Tertiary Storage
        • StorageTek / HPSS
        • 4 Silos: 4.5 PBytes (1.5 PBytes currently filled)
        • 1000 MBytes/sec theoretical native I/O bandwidth
     • Online Storage
        • Central NFS-served disk
           • ~170 TBytes of FibreChannel-connected RAID 5
           • ~1200 MBytes/sec served by 32 Sun SMPs
        • Distributed disk
           • ~300 TBytes of SCSI/IDE
           • Locally mounted on Intel/Linux farm nodes
     • Compute
        • ~1300 dual-processor Red Hat Linux / Intel nodes
        • ~2600 CPUs => ~1,400 kSPECint2K (3-4 TFLOPS)
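
Ratios of the facility figures help show why analysis is data-handling limited. The Python sketch below uses only the slide's numbers, plus the assumptions of decimal units and the theoretical (not achieved) HPSS bandwidth. It shows, for example, that if all ~2600 CPUs read from the central NFS disk at once, each would see well under 1 MByte/sec.

```python
SECONDS_PER_DAY = 86_400

hpss_capacity_pb = 4.5
hpss_filled_pb = 1.5
hpss_native_mb_s = 1000   # theoretical native I/O bandwidth
nfs_served_mb_s = 1200    # central NFS disk, all servers together
nfs_servers = 32
cpus = 2600
total_ksi2k = 1400

fill_fraction = hpss_filled_pb / hpss_capacity_pb
# One full pass over the filled tape volume at the theoretical rate (1 PByte = 10^9 MBytes)
days_to_read_filled = hpss_filled_pb * 1e9 / hpss_native_mb_s / SECONDS_PER_DAY
mb_s_per_server = nfs_served_mb_s / nfs_servers
nfs_mb_s_per_cpu = nfs_served_mb_s / cpus  # if every CPU read central disk simultaneously
ksi2k_per_cpu = total_ksi2k / cpus

print(f"tape {fill_fraction:.0%} full, {days_to_read_filled:.1f} days per full read")
print(f"{mb_s_per_server:.1f} MB/s per NFS server, {nfs_mb_s_per_cpu:.2f} MB/s per CPU")
print(f"{ksi2k_per_cpu:.2f} kSI2K per CPU")
```

Under these assumptions one pass over the tape contents takes about 17 days, each NFS server delivers 37.5 MBytes/sec, and each CPU would get under 0.5 MBytes/sec from central disk, which suggests why the locally mounted distributed disk matters.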

  9. Projected Growth in Capacity Scale
     • Moore's Law effect of component replacement in experiment DAQs & in computing facilities => ~x6 increase in 5 years
     • The not yet fully specified requirements of the RHIC II and eRHIC upgrades are likely to accelerate growth
     (Chart: Disk Volume at RHIC)
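
A ~x6 increase over 5 years corresponds to constant compound growth of about 43% per year, or a doubling roughly every 23 months. The Python sketch below derives those figures under the assumption of smooth exponential growth, which the slide does not claim; `projected` is a hypothetical helper, and applying it to the current ~170 + ~300 TBytes of disk is only an illustration.

```python
import math

GROWTH = 6.0  # ~x6 capacity increase ...
YEARS = 5.0   # ... over 5 years

annual_factor = GROWTH ** (1 / YEARS)  # ~1.43, i.e. ~43% per year
doubling_years = YEARS * math.log(2) / math.log(GROWTH)


def projected(capacity: float, years: float) -> float:
    """Capacity after `years` of constant compound growth at `annual_factor`."""
    return capacity * annual_factor ** years


current_disk_tb = 170 + 300  # central NFS + distributed disk from the technology slide
print(f"{annual_factor:.3f}/year, doubling every {doubling_years * 12:.0f} months")
print(f"{projected(current_disk_tb, YEARS):.0f} TB of disk after {YEARS:.0f} years")
```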

  10. NP Analysis Limitations (1)
     • Underlying the Data Management issue
        • Events (interactions) of interest are rare relative to minimum bias events
           • Threshold / phase space effect for each new energy domain
        • Combinatorics of large multiplicity events of all kinds confound selection of interesting events
        • Combinatorics also create backgrounds to signals of interest
     • Two analysis approaches
        • Topological: typically with
           • Many qualitative &/or quantitative constraints on the data sample
           • Relatively low background-to-signal ratio
           • Modest number of events in the final analysis data sample
        • Statistical: frequently with
           • More poorly constrained sample
           • Large background (signal is a small difference between large numbers)
           • Large number of events in the final analysis data sample
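
Why statistical analyses need large final samples can be seen from a simple counting-statistics model (an assumption: it ignores systematic uncertainties and treats the background as perfectly known). With N events in the final sample, a signal fraction f = S/N and Poisson fluctuations of √N, the significance is roughly S/√N = f√N, so reaching a significance Z needs N ≈ (Z/f)². A small f, the "small difference between large numbers" case, therefore drives N up quadratically. A minimal Python sketch, with hypothetical signal fractions:

```python
import math


def significance(n_events: float, signal_fraction: float) -> float:
    """Approximate S / sqrt(N) with S = f * N (pure Poisson counting model)."""
    return signal_fraction * math.sqrt(n_events)


def events_needed(signal_fraction: float, target_significance: float) -> float:
    """Sample size N at which f * sqrt(N) reaches the target significance."""
    if not 0.0 < signal_fraction <= 1.0:
        raise ValueError("signal_fraction must be in (0, 1]")
    return (target_significance / signal_fraction) ** 2


# Illustrative signal fractions (hypothetical, not from the slide)
for label, f in [("topological-like", 0.5), ("statistical-like", 0.001)]:
    print(f"{label}: f={f}, N for 5 sigma ~ {events_needed(f, 5.0):,.0f}")
```

In this model a 500-fold smaller signal fraction needs 250,000 times more events (100 versus 25,000,000 for 5 sigma).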

  11. NP Analysis Limitations (2)
     • It seems that Topological Analyses are less frequently possible in NP than in HEP, so Statistical Analyses are more often required
        • Evidence for this is rather anecdotal; not all would agree
     • To the extent that it is true, final analysis data sets tend to be large
        • These are the data sets accessed very frequently by large numbers of users, thus exacerbating the data management problem
     • In any case, the extraction and delivery of distilled data subsets to physicists for analysis is what currently most limits NP analyses

  12. Grid / Data Management Issues
     • Major RHIC experiments are moving (or have moved) complete copies of Summary Data to regional analysis centers
        • STAR: to LBNL via Grid tools
        • PHENIX: to RIKEN via tape/airfreight
        • Evolution toward more sites and full dependence on the Grid
     • RHIC, JLab, and NP at the LHC are all very interested and active in Grid development
        • Including high-performance, reliable Wide Area data movement / replication / access services
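
The choice between network transfer and shipping tapes comes down to sustained throughput. The Python sketch below converts a volume and a sustained rate into transfer days, and a shipment into an equivalent rate. The 270 TBytes is the projected PHENIX derived-data volume from the data-volume slide, used only as an upper bound on a Summary Data copy; the 100 MBytes/sec WAN rate and the 7-day shipment are hypothetical round numbers, not measurements.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(volume_tb: float, sustained_mb_s: float) -> float:
    """Days to move volume_tb (1 TByte = 10^6 MBytes) at a sustained rate in MB/s."""
    return volume_tb * 1e6 / sustained_mb_s / SECONDS_PER_DAY


def effective_mb_s(volume_tb: float, days: float) -> float:
    """Equivalent sustained rate (MB/s) of delivering volume_tb in `days`."""
    return volume_tb * 1e6 / (days * SECONDS_PER_DAY)


summary_copy_tb = 270  # upper bound: projected PHENIX derived data
wan_mb_s = 100         # hypothetical sustained wide-area rate
shipment_days = 7      # hypothetical door-to-door airfreight time

print(f"network: {transfer_days(summary_copy_tb, wan_mb_s):.1f} days at {wan_mb_s} MB/s")
print(f"airfreight: ~{effective_mb_s(summary_copy_tb, shipment_days):.0f} MB/s equivalent")
```

Under these assumptions the network copy takes about a month, while the shipment behaves like a link of roughly 446 MBytes/sec, although the sketch ignores the time needed to write and read the tapes.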

  13. Conclusions
     • NP and HEP accelerator/detector experiments have very similar Data Management requirements
     • NP analyses of this type currently tend to be more Data limited than CPU limited
     • "Mining" of Summary Data and affording end users adequate access (both Local and Wide Area) to the resulting distillate currently most limits NP analysis
     • It is expected that this will remain the case for the next 4-6 years through
        • Upgrades of RHIC and JLab
        • Start-up of LHC, with Wide Area access growing in importance relative to Local access
