HENP Computing at BNL


Presentation Transcript


  1. HENP Computing at BNL Torre Wenaus STAR Software and Computing Leader BNL RHIC & AGS Users Meeting Asilomar, CA October 21, 1999

  2. Content • Bruce’s talk • ATLAS • Linux • Mock Data Challenges • D0 • focus on areas really changing the scale of HENP computing at BNL • Mount’s APOGEE talk • Security • Software ‘attracting good people’ • ROOT; PHENIX’s online threaded • Objectivity, MySQL • RIKEN computing center • ESnet • Open Science RHIC/AGS Users Meeting 10/99

  3. Historical Perspective • Prior to RHIC, BNL hosted many small- to modest-scale AGS experiments • With RHIC, BNL moves into the realm of large collider detectors • computing task at a scale similar to SLAC, Fermilab, CERN, etc. • Has required a dramatic change in the scale of HENP computing at BNL • RHIC Computing Facility (RCF) established Feb 1997 to supply primary (non-simulation) RHIC computing needs • Successful operations in two ‘Mock Data Challenge’ production stress tests and in the summer 1999 engineering run • First physics run in early 2000 • Presence of RCF a strong factor in the selection of BNL as the principal US computing site for the CERN LHC ATLAS experiment • Requirements and computing plan similar to RCF • Will operate in close coordination with RCF • LHC and ATLAS operations begin in 2005 RHIC/AGS Users Meeting 10/99

  4. This Talk • Will focus on the major growth of HENP computing as a BNL activity brought by these new programs • RHIC computing at BNL • ATLAS computing at BNL • Brief mention of some other programs • Conclusions • Thanks to Bruce Gibbard, RHIC computing facility head, and others (indicated on slides) for materials RHIC/AGS Users Meeting 10/99

  5. RHIC Computing at RCF • Four experiments: PHENIX, STAR, PHOBOS, BRAHMS • 4:4:2:1 relative scales of computing task • Aggregate raw data recording rate of ~60 MBytes/sec • Annual raw data volume ~600 TBytes • N.B. size of global WWW content estimated at 7 TBytes • Event reconstruction: 13,000 SPECint95 (a 450 MHz PC ≈ 18 SPECint95; see the rough farm-size estimate below) • Event filtering (data mining) and physics analysis: 7,000 SPECint95 • ‘mining’ interesting data off tape for physics analyses • aggregate access rates of ~200 MBytes/sec • iterative, interactive analysis of disk-based data by hundreds of users • aggregate access rates of ~1000 MBytes/sec • Software development and distribution • 100s of developers; many 100k lines of code per experiment • RCF is the primary development and distribution (AFS) site RHIC/AGS Users Meeting 10/99
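
A rough sanity check of what these SPECint95 numbers mean in commodity-PC terms; a minimal sketch using only the figures quoted on the slide above:

```cpp
#include <cstdio>

// Rough farm sizing from the slide's numbers: 13,000 SPECint95 for
// reconstruction, 7,000 SPECint95 for mining/analysis, and ~18 SPECint95
// per 450 MHz Intel/Linux box.
int main() {
    const double recoSi95     = 13000.0;  // event reconstruction
    const double analysisSi95 =  7000.0;  // mining + physics analysis
    const double si95PerPC    =    18.0;  // one 450 MHz PC

    std::printf("reconstruction farm : ~%.0f PCs\n", recoSi95 / si95PerPC);
    std::printf("analysis farm       : ~%.0f PCs\n", analysisSi95 / si95PerPC);
    std::printf("total               : ~%.0f PCs\n",
                (recoSi95 + analysisSi95) / si95PerPC);
    return 0;
}
```

That works out to roughly 700 reconstruction boxes and 400 analysis boxes, i.e. on the order of a thousand commodity PCs for the RHIC program as a whole.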

  6. Computing Strategies • Extensive use of community/commercial/commodity products • hardware and software • increasing use of open software (eg. Linux, MySQL database) • Exploit ‘embarrassingly parallel’ nature of HENP computing • farms of loosely coupled processors (Linux PCs on Ethernet) • limited use of Sun machines for I/O intensive analysis • Hierarchical storage management (disk + tape robot/shelf) and flexible partitioning of event data based on access characteristics • optimize storage cost and access latencies to interesting data • Extensive use of OO software technologies • adopted by all four RHIC experiments, ATLAS, other BNL HENP software efforts (eg. D0), and virtually all other forthcoming expts • primarily C++; some Java • Object I/O: Objectivity commercial OO database and ROOT community (CERN) developed tool RHIC/AGS Users Meeting 10/99
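
To illustrate the ‘embarrassingly parallel’ point above: events and event files are independent, so a production run is simply partitioned across farm nodes with no communication between them. A minimal sketch, with invented file names and an arbitrary node count:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Illustration of "embarrassingly parallel" HENP processing: event files are
// independent, so a run is split across farm nodes (e.g. Linux PCs on
// Ethernet) with no inter-node communication. Names and counts are made up.
int main() {
    const int nNodes = 4;
    std::vector<std::string> files;
    for (int i = 0; i < 10; ++i)
        files.push_back("run99_file" + std::to_string(i) + ".daq");

    // Round-robin assignment; each node runs the same executable on its share.
    for (int node = 0; node < nNodes; ++node) {
        std::printf("node %d:", node);
        for (std::size_t i = node; i < files.size(); i += nNodes)
            std::printf(" %s", files[i].c_str());
        std::printf("\n");
    }
    return 0;
}
```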

  7. Event Data Storage and Management • Major software challenge: event data storage and management • ROOT: HENP community tool (from CERN) • used by all RHIC experiments for event data storage (see the sketch below) • Objectivity: Commercial object database • Used by PHENIX for its conditions database • RCF did the Linux port • Relational databases (MySQL, ORACLE) • Many cataloguing applications in the experiments and RCF • MySQL adopted by STAR as a complement to ROOT for the event store, replacing Objectivity • Grand Challenge Architecture • Managed access to HPSS-resident data, particularly for data mining • LBNL-led with ANL, BNL participation; deployment at RCF • Particle Physics Data Grid: transparent wide-area data processing • US HENP ‘Next Generation Internet’ project, primarily LHC directed • RCF/RHIC will act as an early testbed RHIC/AGS Users Meeting 10/99
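
As a flavor of the ROOT-based event storage mentioned above, here is a minimal sketch that writes per-event summary data into a ROOT TTree. The branch names and values are invented for illustration; this is not the actual STAR or PHENIX event model:

```cpp
// Minimal ROOT event-store sketch: one TTree with a few per-event branches.
// Compile and link against ROOT; variable and branch names are illustrative.
#include "TFile.h"
#include "TTree.h"

int main() {
    TFile file("events.root", "RECREATE");
    TTree tree("Event", "per-event summary data");

    Int_t   nTracks = 0;   // track multiplicity
    Float_t zVertex = 0.f; // primary vertex z (cm)
    tree.Branch("nTracks", &nTracks, "nTracks/I");
    tree.Branch("zVertex", &zVertex, "zVertex/F");

    for (Int_t i = 0; i < 1000; ++i) {   // stand-in for a real event loop
        nTracks = 4000 + (i % 100);      // fake values
        zVertex = 0.1f * (i % 50) - 2.5f;
        tree.Fill();
    }

    tree.Write();
    file.Close();
    return 0;
}
```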

  8. ATLAS Computing at BNL • A Toroidal LHC ApparatuS • One of 4 experiments at the LHC • 14 TeV pp collider • ATLAS computing at CERN estimated to be >10 times that of RHIC • Augmented by regional centers outside CERN • Total scale similar to the CERN installation • US ATLAS will have one primary ‘Tier 1’ regional center, at BNL • ~20% of the CERN facility; ~2x RCF • BNL also manages the US ATLAS construction project; ~20% of the full ATLAS detector • Simulation, data mining, physics analysis, and software development will be the primary missions of the BNL Tier 1 center RHIC/AGS Users Meeting 10/99

  9. ATLAS: Commonality and Synergy with RHIC • Qualitative requirements and Tier 1 quantitative requirements similar to RCF • Exploit economies of scale in hardware and software • Share technical expertise • Learn from and build on RHIC computing as a ‘real world testbed’ • Commonality: • Complete coincidence of supported platforms • Intel/Linux processor farms, Sun/Solaris • Objectivity -- and shared concerns over Objectivity! • HPSS -- and shared concerns over HPSS! • Data mining, Grand Challenge • ROOT as an interim analysis tool • Particle Physics Data Grid RHIC/AGS Users Meeting 10/99

  10. Current Status • RHIC RCF • Hardware for first year physics in place, except for some tape store hardware (5 drives; IBM server upgrades) • Extensive testing and tuning to be done • performance, reliability, robustness • All year 1 requirements satisfied except for disk capacity (later augmentation an option; not critically needed now) • In production use by experiments • Positive review by Technical Advisory Committee just concluded • US ATLAS Tier 1 center • Initial facility in place, usage by US ATLAS ramping up • Operating out of RCF • ATLAS software installed and operating • More hardware on the way; further increases at proposal stage • Dedicated manpower ramping up RHIC/AGS Users Meeting 10/99

  11. Conclusions • RHIC and RCF have brought BNL to the forefront of HENP computing • Computing scale, imminent operation, mainstream approaches and community involvement make RHIC computing an important testbed for today’s technologies and a stepping stone to the next generation • Performance to date gives confidence for RHIC operations • Strong software efforts at BNL in the experiments • BNL as host of US ATLAS Tier 1 center will be a leading HENP computing center in the years to come • Leveraging the facilities, expertise and experience of RCF and the RHIC program • Facility installation to be complemented by a software development effort integrated with the local US ATLAS group • Programs well supported by Brookhaven as part of an increased attention to scientific computing at the lab • Lots of potential for involvement! RHIC/AGS Users Meeting 10/99

  12. RIKEN QCDSP Parallel Computer • Special purpose massively parallel machine based on DSPs for quantum field theory calculations • 4D mesh with nearest-neighbor connections (see the sketch below) • 12,288 nodes, 600 Gflops • Custom designed and built • Collaboration centered at Columbia; machine sited at the RIKEN BNL Research Center • 192 mother boards, 64 processors each RHIC/AGS Users Meeting 10/99
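
The 4D nearest-neighbor mesh can be pictured as a periodic four-dimensional lattice in which each node talks only to its eight neighbors. A small sketch of the neighbor bookkeeping; the 8x8x8x24 geometry chosen here simply multiplies out to 12,288 and is not necessarily the real machine layout:

```cpp
#include <cstdio>

// Toy illustration of a 4D mesh with periodic nearest-neighbor connections,
// as in QCDSP-style machines. Dimensions are chosen only so the node count
// matches 12,288; they are not the actual wiring.
const int D[4] = {8, 8, 8, 24};  // 8*8*8*24 = 12,288 nodes

int nodeId(const int c[4]) {      // lexicographic node index
    return ((c[0] * D[1] + c[1]) * D[2] + c[2]) * D[3] + c[3];
}

int main() {
    int c[4] = {3, 7, 0, 11};     // some node's mesh coordinates
    std::printf("node %d neighbors:", nodeId(c));
    for (int d = 0; d < 4; ++d) {
        for (int step = -1; step <= 1; step += 2) {
            int n[4] = {c[0], c[1], c[2], c[3]};
            n[d] = (n[d] + step + D[d]) % D[d];   // periodic wrap-around
            std::printf(" %d", nodeId(n));
        }
    }
    std::printf("\n");
    return 0;
}
```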

  13. CDIC - Center for Data Intensive Computing • Newly established BNL Center developing collaborative projects • Close ties to SUNY at Stony Brook • Some of the HENP projects proposed or begun: • RHIC Visualization • Newly established collaboration with Stony Brook to develop dynamic 3D visualization tools for RHIC interactions and a ‘beam’s eye’ view • RHIC Computing • Proposed collaboration with IBM to use idle PC cycles for RHIC physics simulation (generator level) • Data Mining • New project studying application of ‘rough sets’ data mining concepts to RHIC event classification and feature extraction • Accelerator Design • Proposed parallel simulation of beam dynamics for accelerator design and optimization RHIC/AGS Users Meeting 10/99

  14. Visualization • RHIC Au-Au collision animation • (Quicktime movie available on web) • PHENIX event simulation RHIC/AGS Users Meeting 10/99

  15. ESnet Utilization RHIC/AGS Users Meeting 10/99

  16. Open Software/Open Science Conference • BNL Oct 2, 1999 • Educate scientists on open source projects • Stimulate open source applications in science • Present science applications to open source developers RHIC/AGS Users Meeting 10/99

  17. HENP Computing Challenges Craig Tull, LBNL RHIC/AGS Users Meeting 10/99

  18. STAR at RHIC • RHIC: Relativistic Heavy Ion Collider at Brookhaven National Laboratory • Colliding Au-Au nuclei at 200 GeV/nucleon • Principal objective: Discovery and characterization of the Quark Gluon Plasma • Additional spin physics program in polarized p-p • Engineering run 6-8/99; first year physics run 1/00 • STAR experiment • One of two large ‘HEP-scale’ experiments at RHIC, >400 collaborators each (PHENIX is the other) • Heart of the experiment is a Time Projection Chamber (TPC) drift chamber (operational), together with a Si tracker (year 2) and an electromagnetic calorimeter (staged over years 1-3) • Hadrons, jets, electrons and photons over a large solid angle RHIC/AGS Users Meeting 10/99

  19. The STAR Computing Task • Data recording rate of 20 MB/sec; ~12 MB raw data per event (~1 Hz) • ~4000+ tracks/event recorded in the tracking detectors (factor of 2 uncertainty in physics generators) • High statistics per event permit event-by-event measurement and correlation of QGP signals such as strangeness enhancement, J/psi attenuation, high-pT parton energy loss modifications in jets, and global thermodynamic variables (e.g. pT slope correlated with temperature) • 17M Au-Au events (equivalent) recorded in a nominal year • Relatively few but highly complex events requiring large processing power • Wide range of physics studies: ~100 concurrent analyses in ~7 physics working groups RHIC/AGS Users Meeting 10/99

  20. RHIC/STAR Computing Facilities • Dedicated RHIC computing center at BNL, the RHIC Computing Facility • Data archiving and processing for reconstruction and analysis • Three production components: Reconstruction (CRS) and analysis (CAS) services and managed data store (MDS) • 10,000 (CRS) + 7,500 (CAS) SPECint95 CPU • ~50 TB disk, 270 TB robotic tape, 200 MB/s I/O bandwidth, managed by the High Performance Storage System (HPSS) developed by a DOE/commercial consortium (IBM et al.) • Current scale: ~2500 Si95 CPU, 3 TB disk for STAR • Limited resources require the most cost-effective computing possible • Commodity Intel farms (running Linux) for all but I/O intensive analysis (Sun SMPs) • Smaller outside resources: • Simulation, analysis facilities at outside computing centers • Limited physics analysis computing at home institutions RHIC/AGS Users Meeting 10/99

  21. Implementation of RHIC Computing Model: Incorporation of Offsite Facilities • [Diagram: offsite facilities — Berkeley SP2, T3E, HPSS tape store; Japan; MIT; many universities, etc.] • Doug Olson, LBNL RHIC/AGS Users Meeting 10/99

  22. HENP Computing: Today’s Realities • Very Large Data Volumes • Large, Globally Distributed Collaborations • Long Lived Projects (>15 years) • Large (1-2M LOC), Complex Analyses • Distributed, Heterogeneous Systems • Very Limited Computing Manpower • Most of the Computing Manpower Are Not Computing Professionals • Not necessarily a bad thing! Good understanding of, and direct interest in, the problem among developers • Reliance on Open and Commercial Software & Standards • Evolving Computer Industry & Technology RHIC/AGS Users Meeting 10/99

  23. Event Data Storage • Management of Petabyte data volumes arguably the most difficult task in HENP computing today • Solutions must map effectively onto OO software technology • Intensive community effort in Object Database technology in last 5 years • Focus on Objectivity, the only commercial product that scales to PBytes • Great early promise; strong potential to minimize in-house development and match well the OO architecture of experiments • Reality has been more difficult: development effort much greater than expected, and mixed results on scalability • In parallel with Objectivity, community solutions have also been developed • Particularly, ROOT system from CERN supporting I/O of C++ based object models • When complemented by a relational database, provides a robust and scalable solution that integrates well with experiment software • The jury is still out • STAR and some other experiments have dropped Objectivity in favor of ROOT+RDBMS • BaBar at SLAC is in production with Objectivity, and is working through the problems RHIC/AGS Users Meeting 10/99
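
As an illustration of the ‘ROOT files plus relational database’ combination described above, here is a minimal sketch that registers a newly produced event file in a MySQL catalog via the standard MySQL C API. The host, database, table, and column names are invented for the example; this is not the actual STAR catalog schema:

```cpp
// Sketch of the "ROOT files + relational catalog" pattern: event data live in
// ROOT files, while a MySQL table records which file holds which events.
// Host, database, table and column names below are invented for illustration.
#include <cstdio>
#include <mysql/mysql.h>

int main() {
    MYSQL *db = mysql_init(NULL);
    if (!mysql_real_connect(db, "dbhost", "reader", "secret",
                            "filecatalog", 0, NULL, 0)) {
        std::fprintf(stderr, "connect failed: %s\n", mysql_error(db));
        return 1;
    }

    // Register a newly produced ROOT file and its event range in the catalog.
    const char *sql =
        "INSERT INTO event_files (path, first_event, last_event) "
        "VALUES ('/star/data/run99_file0.root', 1, 1000)";
    if (mysql_query(db, sql) != 0)
        std::fprintf(stderr, "insert failed: %s\n", mysql_error(db));

    mysql_close(db);
    return 0;
}
```

The design point is that the bulky event objects stay in ROOT files while the database holds only lightweight metadata, which is what keeps the combination scalable.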

  24. Data Management • Coupled to the event data storage problem, but distinct, is the problem of managing effective archiving and retrieval of the data • Hierarchical storage management system required, capable of managing • Terabytes of disk-resident rapid-access data • Petabytes of tape-resident data with medium latency access • Industry offers very few solutions today • One (only) has been identified: HPSS • Deployed at RCF (and many other sites), successfully but with caveats • Demands high manpower levels for development and 24x7 support • Still under development, particularly in HENP applications, with stability and robustness issues • Community HENP solutions under development in this area as well (Fermilab, DESY) RHIC/AGS Users Meeting 10/99
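
The disk/tape hierarchy can be caricatured as a small fast cache in front of a large high-latency store. The toy class below is purely illustrative and has nothing to do with the real HPSS client interface; it only shows why keeping ‘interesting’ data disk-resident matters for access latency:

```cpp
#include <cstdio>
#include <set>
#include <string>

// Toy illustration of hierarchical storage: a small disk cache in front of a
// much larger tape store. NOT the HPSS API; every name here is hypothetical.
class HierarchicalStore {
public:
    std::string fetch(const std::string &file) {
        if (diskCache_.count(file))
            return "served from disk (fast)";
        // Cache miss: the file must be staged from tape, a high-latency step.
        diskCache_.insert(file);          // pretend the stage completed
        return "staged from tape (minutes of latency), now cached on disk";
    }
private:
    std::set<std::string> diskCache_;
};

int main() {
    HierarchicalStore store;
    std::printf("%s\n", store.fetch("dst_run99_007.root").c_str());  // miss
    std::printf("%s\n", store.fetch("dst_run99_007.root").c_str());  // hit
    return 0;
}
```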

  25. Distributed Computing • In current generation experiments such as RHIC, and to a much greater degree in the next generation such as LHC, distributed computing is essential • Fully empowering physicists not at the experimental site to participate in development and analysis, with effective access to the data • Distributing the computing and data management task among several large sites • The central site can no longer afford to support computing on its own • Near and long term efforts underway to address the need • eg. NOVA project at BNL (Networked Object-based enVironment for Analysis): small project to address immediate and near term needs (STAR/RHIC, ATLAS, possibly others) • Large, LHC directed projects such as the Particle Physics Data Grid project and the MONARC regional center modelling project RHIC/AGS Users Meeting 10/99

  26. Computing Requirements • Nominal year processing and data volume requirements (tallied in the sketch below): • Raw data volume: 200 TB • Reconstruction: 2800 Si95 total CPU, 30 TB DST data • 10x event size reduction from raw to reco • 1.5 reconstruction passes/event assumed • Analysis: 4000 Si95 total analysis CPU, 15 TB micro-DST data • 1-1000 Si95-sec/event per MB of DST depending on analysis • Wide range, from CPU-limited to I/O-limited • ~100 active analyses, 5 passes per analysis • micro-DST volumes from 0.1 to several TB • Simulation: 3300 Si95 total including reconstruction, 24 TB • Total nominal year data volume: 270 TB • Total nominal year CPU: 10,000 Si95 RHIC/AGS Users Meeting 10/99
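
The totals on this slide follow directly from the per-task numbers; a quick cross-check of the arithmetic, with all inputs taken from the slide:

```cpp
#include <cstdio>

// Cross-check of the nominal-year totals quoted on the slide.
int main() {
    // Data volumes (TB)
    const double raw      = 200.0;
    const double dst      = (raw / 10.0) * 1.5;  // 10x size reduction, 1.5 passes -> 30 TB
    const double microDst = 15.0;
    const double sim      = 24.0;

    // CPU (SPECint95)
    const double recoCpu = 2800.0, anaCpu = 4000.0, simCpu = 3300.0;

    std::printf("data volume: %.0f TB  (slide quotes ~270 TB)\n",
                raw + dst + microDst + sim);            // 269 TB
    std::printf("CPU: %.0f Si95  (slide quotes ~10,000 Si95)\n",
                recoCpu + anaCpu + simCpu);             // 10,100 Si95
    return 0;
}
```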

  27. STAR Computing Facilities: RCF • Data archiving and processing for reconstruction and analysis (not simulation; done offsite) • General user services (email, web browsing, etc.) • Three production components: Reconstruction and analysis services (CRS, CAS) and managed data store (MDS) • Nominal year scale: • 10,000 (CRS) + 7,500 (CAS) SPECint95 CPU • Intel farms running Linux for almost all processing; limited use of Sun SMPs for I/O intensive analysis • Cost-effective, productive, well-aligned with the HENP community • ~50 TB disk, 270 TB robotic tape, 200 MB/s, managed by HPSS • Current scale (when new procurements are in place): • ~2500 Si95 CPU, 3 TB disk for STAR • ~8 TB of data currently in HPSS RHIC/AGS Users Meeting 10/99

  28. Computing Facilities • Dedicated RHIC computing center at BNL, the RHIC Computing Facility • Data archiving and processing for reconstruction and analysis • Simulation done offsite • 10,000 (reco) + 7,500 (analysis) Si95 CPU • Primarily Linux; some Sun for I/O intensive analysis • ~50 TB disk, 270 TB robotic tape, 200 MB/s, managed by HPSS • Current scale (STAR allocation, ~40% of total): • ~2500 Si95 CPU • 3 TB disk • Support for (a subset of) physics analysis computing at home institutions RHIC/AGS Users Meeting 10/99

  29. Mock Data Challenges • MDC1: Sep/Oct ‘98 • >200k (2 TB) events simulated offsite; 170k reconstructed at RCF (goal was 100k) • Storage technologies exercised (Objectivity, ROOT) • Data management architecture of the Grand Challenge project demonstrated • Concerns identified: HPSS, AFS, farm management software • MDC2: Feb/Mar ‘99 • New ROOT-based infrastructure in production • AFS improved, HPSS improved but still a concern • Storage technology finalized (ROOT) • New problem area, STAR program size, addressed in new procurements and OS updates (more memory, swap) • Both data challenges: • Effective demonstration of productive, cooperative, concurrent (in MDC1) production operations among the four experiments • Bottom line verdict: the facility works, and should perform in physics data taking and analysis RHIC/AGS Users Meeting 10/99

  30. Offline Software Environment • Current software base a mix of Fortran (55%) and C++ (45%) • from ~80%/20% (~95%/5% in non-infrastructure code) in 9/98 • New development, and all post-reco analysis, in C++ • Framework built over ROOT adopted 11/98 • Origins in the ‘Makers’ of ATLFAST (see the sketch below) • Supports legacy Fortran codes and table (IDL) based data structures developed in the previous StAF framework without change • Deployed in offline production and analysis in our ‘Mock Data Challenge 2’, 2-3/99 • Post-reconstruction analysis: C++/OO data model ‘StEvent’ • StEvent interface is ‘generic C++’; analysis codes are unconstrained by ROOT and need not (but may) use it • Next step: migrate the OO data model upstream to reco RHIC/AGS Users Meeting 10/99
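
The ‘Maker’ pattern referred to above amounts to framework modules with per-job and per-event hooks driven by a chain. A minimal sketch of the idea; the class and method names are illustrative and are not the actual STAR or ATLFAST interfaces:

```cpp
#include <cstdio>
#include <vector>

// Sketch of a "Maker"-style framework module, in the spirit of the ATLFAST
// Makers the slide mentions. Names are illustrative, not the real interface.
class Maker {
public:
    virtual ~Maker() {}
    virtual int Init()   { return 0; }   // called once, before the event loop
    virtual int Make()   = 0;            // called once per event
    virtual int Finish() { return 0; }   // called once, after the event loop
};

class TrackingMaker : public Maker {     // hypothetical reconstruction step
public:
    int Make() { std::printf("  tracking this event\n"); return 0; }
};

int main() {
    std::vector<Maker*> chain;
    TrackingMaker tracking;
    chain.push_back(&tracking);          // more Makers would be added here

    for (std::size_t i = 0; i < chain.size(); ++i) chain[i]->Init();
    for (int event = 0; event < 3; ++event) {        // stand-in event loop
        std::printf("event %d\n", event);
        for (std::size_t i = 0; i < chain.size(); ++i) chain[i]->Make();
    }
    for (std::size_t i = 0; i < chain.size(); ++i) chain[i]->Finish();
    return 0;
}
```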

  31. Initial RHIC DB Technology Choices • A RHIC-wide Event Store Task Force in Fall ‘97 addressed data management alternatives • Requirements formulated by the four experiments • Objectivity and ROOT were the ‘contenders’ put forward • STAR and PHENIX selected Objectivity as the basis for data management • Concluded that only Objectivity met the requirements of their event stores • ROOT selected by the smaller experiments and seen by all as analysis tool with great potential • Issue for the two larger experiments: • Where to draw a dividing line between Objectivity and ROOT in the data model and data processing RHIC/AGS Users Meeting 10/99

  32. Event Store Requirements -- And Fall ‘97 View RHIC/AGS Users Meeting 10/99

  33. Requirements: STAR 8/99 View (My Version) RHIC/AGS Users Meeting 10/99

  34. RHIC Data Management: Factors For Evaluation • My perception of changes in the STAR view from ‘97 to now is shown • [Table comparing Objectivity and ROOT+MySQL across the factors below] • Cost • Performance and capability as a data access solution • Quality of technical support • Ease of use, quality of documentation • Ease of integration with analysis • Ease of maintenance, risk • Commonality among experiments • Extent, leverage of outside usage • Affordable/manageable outside RCF • Quality of data distribution mechanisms • Integrity of replica copies • Availability of browser tools • Flexibility in controlling permanent storage location • Level of relevant standards compliance, e.g. ODMG • Java access • Partitioning DB and resources among groups RHIC/AGS Users Meeting 10/99

  35. Object Database: Storage Hierarchy vs. User View • The user deals only with an ‘object model’ of his own design; storage details are hidden (sketched below) RHIC/AGS Users Meeting 10/99
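
The point of this slide, echoed in the STAR conclusions below (‘isolate implementation choices behind standard interfaces’), can be sketched as an abstract event store that hides the storage technology from analysis code. Everything here is hypothetical:

```cpp
#include <cstdio>

// Sketch of isolating storage technology behind an interface: analysis code
// sees only the object model and an abstract store, so the backend (ROOT
// files, an object database, ...) can be swapped. All names hypothetical.
struct Event {                 // the user's object model
    int   nTracks;
    float zVertex;
};

class EventStore {             // what analysis code programs against
public:
    virtual ~EventStore() {}
    virtual bool read(int eventId, Event &ev) = 0;
};

class RootFileStore : public EventStore {   // one possible backend
public:
    bool read(int eventId, Event &ev) {
        ev.nTracks = 4000 + eventId;        // stand-in for real ROOT I/O
        ev.zVertex = 0.0f;
        return true;
    }
};

int main() {
    RootFileStore backend;
    EventStore &store = backend;            // analysis sees only the interface
    Event ev;
    if (store.read(42, ev))
        std::printf("event 42: %d tracks\n", ev.nTracks);
    return 0;
}
```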

  36. ATLAS and US ATLAS • One of two large HEP experiments at CERN’s Large Hadron Collider (LHC) • Proton-proton collider; 14 TeV in the center of mass • 1 billion events/year • Principal objective: Discovery and characterization of physics ‘beyond the Standard Model’: Higgs, Supersymmetry, … • Startup 2005+ • Brookhaven hosts the US Project Office for US contributions to ATLAS (~$170M; about 20% of the project) • Brookhaven recently selected as host lab for US ATLAS Computing and site of the US Regional Center • Extension of the RHIC Computing Facility • US ATLAS Computing projected to grow to ~$15M/yr RHIC/AGS Users Meeting 10/99

  37. Conclusions • HENP is (unfortunately!) still pushing the envelope in the scale of the data processing and management tasks of present and next generation experiments • The HENP community has looked to the commercial and open software worlds for tools and approaches, with strong successes in some areas (OO programming), qualified successes in others (HPSS), and the jury is still out on some (Object Databases) • Moore’s Law and the rise of Linux have made provisioning CPU cycles less of an issue • The community has converged on OO as the principal tool to make software development tractable • But solutions to data storage and management are much less clear • A need on the rise is distributed computing, but internet-driven growth in capacities and technologies will be a strong lever • Developments within the HENP community continue to be important, either as fully capable solutions or interim solutions pending further commercial/open software developments RHIC/AGS Users Meeting 10/99

  38. Conclusions • The circumstances of STAR • Startup this year • Slow start in addressing event store implementation, C++ migration • Large base of legacy software • Extremely limited manpower and computing resources • drive us to very practical and pragmatic data management choices • Beg, steal and borrow from the community • Deploy community and industry standard technologies • Isolate implementation choices behind standard interfaces, to revisit and re-optimize in the future • which leverage existing STAR strengths • Component and standards-based software greatly eases integration of new technologies • preserving compatibility with existing tools for selective and fall-back use • while efficiently migrating legacy software and legacy physicists • After some course corrections, we have a capable data management architecture for startup that scales to STAR’s data volumes • … but Objectivity is no longer in the picture. RHIC/AGS Users Meeting 10/99
