  1. PHENIX Offline Computing David Morrison Brookhaven National Laboratory What we’re doing Why we’re doing it What we’ve learned by doing it

  2. a word from our sponsors ... • large collaboration (>400 physicists) • large, complex detector • ~300,000 channels • 11 different detector subsystems • large volume of data, large number of events • 20 MB/sec for 9 months each year • 10^9 Au+Au events each year • broad physics program • partly because RHIC itself is very flexible • Au+Au at 100+100 GeV/A, spin polarized p+p, and everything in-between • muons, electrons, hadrons, photons
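(Back-of-the-envelope check of that scale: 20 MB/sec sustained over roughly nine months is about 2.3 × 10^7 s × 20 MB/s ≈ 4.7 × 10^8 MB, i.e. several hundred TB of raw data per year, the "100's of TB" referred to on later slides.)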

  3. from the PHENIX photo album DPM, in hardhat

  4. the eightfold way of PHENIX offline computing • know your physics program: for PHENIX, event processing rather than event selection • know your constraints: money, manpower ... and tape mounts • avoid “not invented here” syndrome: beg, borrow, collaborate; doesn’t automatically imply use of commercial products • focus on modularity, interfaces, abstract base classes • viciously curtail variety of architecture/OS: Linux, Solaris • data management and data access are really hard problems: don’t rely on fine-grained random access to 100’s of TB of data • everyone has their favorite reference works ... Design Patterns (Gamma et al.): run-time aggregation, shallow inheritance trees; The Mythical Man-Month (Brooks): avoid implementation by committee
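As an illustration of the "run-time aggregation, shallow inheritance trees" guideline, here is a minimal C++ sketch with hypothetical class names (TrackModel, TrackReconstructor and friends are not actual PHENIX classes): behaviour is assembled by handing a strategy object to its user at run time, and the only inheritance is a single interface level.

```cpp
// Hypothetical illustration (not actual PHENIX code) of "run-time aggregation,
// shallow inheritance trees": behaviour is assembled by handing a strategy
// object to its user at run time; the only inheritance is one interface level.
#include <iostream>
#include <memory>

class TrackModel {
public:
  virtual ~TrackModel() = default;
  virtual double momentum(double bend_angle) const = 0;
};

class StraightLineModel : public TrackModel {
public:
  double momentum(double) const override { return 0.0; }  // e.g. field-off running
};

class BendPlaneModel : public TrackModel {
public:
  explicit BendPlaneModel(double field) : field_(field) {}
  double momentum(double bend_angle) const override {
    return field_ / bend_angle;  // toy formula, for illustration only
  }
private:
  double field_;
};

// The reconstructor aggregates a model at run time; using a different model
// means constructing a different object, not deepening the class hierarchy.
class TrackReconstructor {
public:
  explicit TrackReconstructor(std::unique_ptr<TrackModel> model)
      : model_(std::move(model)) {}
  void reconstruct(double bend_angle) const {
    std::cout << "p = " << model_->momentum(bend_angle) << "\n";
  }
private:
  std::unique_ptr<TrackModel> model_;
};

int main() {
  TrackReconstructor reco(std::make_unique<BendPlaneModel>(0.9));
  reco.reconstruct(0.03);  // prints p = 30
}
```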

  5. building blocks • small group of “core” offline developers: M. Messer, K. Pope, M. Velkovsky, M. Purschke, D. Morrison, (M. Pollack) • large number of computer-savvy subsystem physicists; recruitment via “help wanted” list of projects that need people • PHENIX object-oriented library, PHOOL (see talk by M. Messer) • object-oriented analysis framework: analysis modules all share common interface • type-safe, flexible data manager: extensive use of RTTI, avoids (void *) casts by users • ROOT I/O used for persistency • “STL” operations on collection of modules or data nodes • varied OO views on analysis framework design, ranging from passive data to “event, reconstruct thyself”; PHOOL follows a hybrid approach • migrated to PHOOL from STAF in early 1999; no user code modified (~120,000 LOC)
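A minimal sketch of two of the framework ideas on this slide: a common module interface and a type-safe data manager. The names below (DataManager, AnalysisModule, HitList) are hypothetical, not the actual PHOOL API; the sketch only shows how dynamic_cast/RTTI spares user code from (void *) casts and how a shared interface lets the framework drive modules from an STL container.

```cpp
// Sketch of a PHOOL-like common module interface and type-safe data manager;
// hypothetical names, not the real PHOOL classes.
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Every object placed on the data "node tree" derives from a common base.
struct DataObject {
  virtual ~DataObject() = default;
};

struct HitList : public DataObject {
  std::vector<double> energies;
};

// The data manager hands back typed pointers via dynamic_cast (RTTI),
// so user code never performs a raw (void *) cast.
class DataManager {
public:
  void put(const std::string& name, std::unique_ptr<DataObject> obj) {
    nodes_[name] = std::move(obj);
  }
  template <typename T>
  T* get(const std::string& name) {
    auto it = nodes_.find(name);
    return it == nodes_.end() ? nullptr : dynamic_cast<T*>(it->second.get());
  }
private:
  std::map<std::string, std::unique_ptr<DataObject>> nodes_;
};

// All analysis modules share one interface, so the framework can keep them
// in an STL container and drive them uniformly event by event.
class AnalysisModule {
public:
  virtual ~AnalysisModule() = default;
  virtual int process_event(DataManager& dm) = 0;
};

class EnergySum : public AnalysisModule {
public:
  int process_event(DataManager& dm) override {
    if (auto* hits = dm.get<HitList>("EMCAL_HITS")) {  // typed lookup, no casts
      double sum = 0;
      for (double e : hits->energies) sum += e;
      std::cout << "energy sum: " << sum << "\n";
    }
    return 0;
  }
};

int main() {
  DataManager dm;
  auto hits = std::make_unique<HitList>();
  hits->energies = {1.2, 0.7, 3.4};
  dm.put("EMCAL_HITS", std::move(hits));

  std::vector<std::unique_ptr<AnalysisModule>> modules;
  modules.push_back(std::make_unique<EnergySum>());
  for (auto& m : modules) m->process_event(dm);  // "STL" operations on modules
}
```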

  6. more blocks • lots of physics-oriented objects in PHENIX code • geometry, address/index objects, track models, reconstruction • file catalog • metadata management, tracks related files, tied in with run info DB • “data carousel” for retrieving files from HPSS • retrieval seen as group-level activity (subsystems, physics working groups) • carousel optimizes file retrieval, mediates resource usage between groups • scripts on top of IBM-written batch system • event display(s) • very much subsystem-centered efforts; all are ROOT-based • clearly valuable for algorithm development and debugging • value for PHENIX physics analysis much less clear • GNU build system, Mozilla-derived recompilation (poster M. Velkovsky) • autoconf, automake, libtool, Bonsai, Tinderbox, etc. • capable, robust, widely used by large audience on variety of platforms • feedback loop for code development

  7. databases in PHENIX • Objectivity used for “archival” database needs • Objy used in fairly “mainstream” manner: all Objy DBs are resident online (not storing event data) • autonomous partitions, data replicated between counting house, RCF • RCF (D. Stampf) ported Objy to Linux • PdbCal class library aimed at calibration DB application: insulates typical user from Objectivity; objects stored with validity period, versioning; usable interactively from within ROOT • mySQL used for other database applications: Bonsai, Tinderbox system uses mySQL heavily; used in “data carousel”
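A conceptual sketch of the calibration-interface idea: calibration objects carry a validity period, newer commits supersede older ones, and the caller never touches the underlying storage technology. Class and method names (CalibrationStore, CalibBank, fetch) are hypothetical, not the real PdbCal classes, and the in-memory vector merely stands in for the Objectivity back end.

```cpp
// Sketch of a calibration store with validity periods and versioning;
// hypothetical names, the real PdbCal library and its Objectivity back end differ.
#include <iostream>
#include <vector>

using RunTime = long;  // stand-in for a real timestamp class

struct CalibBank {
  RunTime valid_from = 0;
  RunTime valid_to = 0;
  std::vector<double> gains;  // example payload: per-channel gains
};

class CalibrationStore {
public:
  void commit(const CalibBank& bank) { banks_.push_back(bank); }

  // Return the most recently committed bank whose validity interval
  // contains 'when'; callers never see the storage technology.
  const CalibBank* fetch(RunTime when) const {
    const CalibBank* best = nullptr;
    for (const auto& b : banks_)
      if (b.valid_from <= when && when < b.valid_to) best = &b;  // last commit wins
    return best;
  }
private:
  std::vector<CalibBank> banks_;  // a real store would live in a database
};

int main() {
  CalibrationStore store;
  store.commit({1000, 2000, {1.00, 1.02, 0.98}});
  store.commit({1500, 2000, {1.01, 1.03, 0.97}});  // newer version, overlapping validity
  if (const CalibBank* b = store.fetch(1600))
    std::cout << "gain[0] = " << b->gains[0] << "\n";  // picks the newer version
}
```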

  8. simplified data flow [diagram: counting house, buffer disk, HPSS, NFS disk, analysis farm; calibrations & conditions held in the Objectivity federated DB]

  9. OO ubiquitous, mainstream in PHENIX • subclasses of abstract “Eventiterator” class used to read raw data • from online pool, file, or fake test events - user code unchanged • online control architecture based on CORBA “publish-subscribe” • Java used in counting house for GUIs, CORBA • subsystem reconstruction code uses STL, design patterns • not unusual to hear “singleton”, “iterator” at computing meetings • OO emerging out of subsystems faster than from core offline crew
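A sketch of the Eventiterator idea with hypothetical class and method signatures (the real PHENIX interfaces differ): analysis code pulls events from an abstract iterator and is unchanged whether the concrete source is a raw-data file, the online pool, or fake test events, as the slide describes.

```cpp
// Sketch of the abstract Eventiterator pattern; hypothetical signatures,
// not the actual PHENIX event library.
#include <iostream>
#include <optional>

struct Event {
  int number = 0;
};

class Eventiterator {
public:
  virtual ~Eventiterator() = default;
  virtual std::optional<Event> next_event() = 0;  // empty when the source is exhausted
};

// Fake test-event source: hands out a fixed number of synthetic events.
class TestEventiterator : public Eventiterator {
public:
  explicit TestEventiterator(int n) : remaining_(n) {}
  std::optional<Event> next_event() override {
    if (remaining_-- <= 0) return std::nullopt;
    return Event{++count_};
  }
private:
  int remaining_;
  int count_ = 0;
};

// A file- or pool-based source would subclass the same interface;
// the user code below stays identical for any of them.
void analyze(Eventiterator& it) {
  while (auto evt = it.next_event())
    std::cout << "processing event " << evt->number << "\n";
}

int main() {
  TestEventiterator source(3);
  analyze(source);  // swap in a file/pool iterator without touching analyze()
}
```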

  10. OO experiences • no Fortran in new post-simulation code: sidestepped many awkward F77/C++ issues, allowed OO to permeate • loosely coupled, short hierarchy design working well • information localization on top of information encapsulation allows decoupled, independent development • no formal design tools, but lots of cloudy chalkboard diagrams; usually just a few interacting classes • social engineering as important as software engineering • OO not science-fiction, not difficult ... and it’s here to stay • lots of hands-on examples, people are usually pleasantly surprised

  11. more OO experiences • OO was oversold (not by us!) as a computing panacea • does make big computing problem tractable, not trivial • occasional need for internal “public-relations” • cognizance of “distance” between concepts advocated by developers and those held by users • e.g., CORBA IDL a great thing; tough to sell to collaboration at-large • takes time and effort to “get it”, to move beyond “F77++” • general audience OO and C++ tutorials have helped • also work closely with someone from each subsystem - helps the OO “meme” take hold

  12. summary • PHENIX computing is essentially ready for physics data • use of PHOOL proven very successful during “mock data challenge” • ObjectivityDB is primary database technology used throughout PHENIX • reasonably conventional file-oriented data processing model • loosely coupled, shallow hierarchy OO design • common approach across online and offline computing • several approaches to recruiting, stretching scarce manpower • deliberate, explicit choice by collaboration to move to OO • recruit manpower from detector subsystems • loosely coupled OO design aids loosely coupled development • OO has slowed implementation, but has been indispensable for design • PHENIX will analyze physics data because of OO, not in spite of it
