DOE/NSF Review – January 2003, LBNL
ATLAS Software & Computing: Status and Plans
Dario Barberis, University of Genoa (Italy)
Foreword
• I have been designated by the ATLAS Management as the next Computing Coordinator, and the ATLAS Collaboration Board has been asked to endorse this proposal (an e-mail vote by the Collaboration Board is in progress).
• The main parts of this talk were prepared with contributions from the outgoing Computing Coordinator, N. McCubbin, and several other members of the Computing Steering Group.
• The organizational changes outlined at the end of this talk are still proposals under discussion within the ATLAS Collaboration.
Outline
• Data Challenges
• GRID
• Geant4
• LCG
• Computing Organization
• Software development plans
DC0: readiness & continuity tests (December 2001 – June 2002)
• "3 lines" for "full" simulation:
  1) Full chain with new geometry (as of January 2002):
     Generator -> (Objy) -> Geant3 -> (Zebra -> Objy) -> Athena reconstruction -> (Objy) -> Analysis
  2) Reconstruction of 'Physics TDR' data within Athena:
     (Zebra -> Objy) -> Athena reconstruction -> (Objy) -> Simple analysis
  3) Geant4 robustness test:
     Generator -> (Objy) -> Geant4 -> (Objy)
• "1 line" for "fast" simulation:
     Generator -> (Objy) -> Atlfast -> (Objy)
• Continuity test: everything in the full chain taken from the same release (3.0.2); a staged-chain sketch follows this slide
  • we learnt a lot
  • we had underestimated the implications of that statement
• Completed in June 2002
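As a minimal sketch (not ATLAS code; all class names are hypothetical), the "continuity test" idea above can be pictured as a chain of stages that only hand data to each other through a shared event store, with every stage built from the same release:

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the transient/persistent event store (Objy/Zebra on the slide).
struct EventStore {
    std::map<std::string, std::string> records;  // key -> payload placeholder
};

// Each stage of the chain only talks to the store, never to another stage directly.
class ProcessingStage {
public:
    virtual ~ProcessingStage() = default;
    virtual std::string name() const = 0;
    virtual void execute(EventStore& store) = 0;
};

class GeneratorStage : public ProcessingStage {
public:
    std::string name() const override { return "Generator"; }
    void execute(EventStore& store) override { store.records["GenEvent"] = "generated particles"; }
};

class SimulationStage : public ProcessingStage {
public:
    std::string name() const override { return "Simulation"; }
    void execute(EventStore& store) override {
        store.records["Hits"] = "detector hits from " + store.records.at("GenEvent");
    }
};

class ReconstructionStage : public ProcessingStage {
public:
    std::string name() const override { return "Reconstruction"; }
    void execute(EventStore& store) override {
        store.records["Tracks"] = "tracks reconstructed from " + store.records.at("Hits");
    }
};

int main() {
    EventStore store;
    std::vector<std::unique_ptr<ProcessingStage>> chain;
    chain.push_back(std::make_unique<GeneratorStage>());
    chain.push_back(std::make_unique<SimulationStage>());
    chain.push_back(std::make_unique<ReconstructionStage>());

    // "Continuity test": every stage runs against the same store,
    // all built from the same software release.
    for (auto& stage : chain) {
        stage->execute(store);
        std::cout << stage->name() << " done\n";
    }
    return 0;
}
```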
ATLAS Computing: DC1
• The 'Phase 1' (Geant3) simulation (July-August 2002) was a highly successful world-wide exercise from which we learned a lot, e.g. about software distribution, the importance of validation, etc.
• Grid tools were used in Scandinavia ('NorduGrid') for their full share of DC1, and in the USA for a significant fraction of theirs. Grid tools have also been used for an extensive ATLAS-EDG test involving 6 sites, aimed at repeating ~1% of the 'European' DC1 share.
• At the end of November 2002 we launched 'Phase 2', i.e. the "pile-up" exercise (at 2x10^33 and 10^34 cm^-2 s^-1; see the estimate below), following 'site validation' (55 sites) and 'physics validation'. The HLT community specifies which samples are to be piled up. Most sites completed by mid-December; the last few jobs are running right now.
  • About the same CPU as needed for Phase 1
  • 70 TB, 100 000 files
  • Additional countries/institutes joined in
  • Large-scale Grid test running since the end of November, in preparation for reconstruction
• Reconstruction: February-March 2003, using Athena. The CPU needed is <10% of that for simulation, but 30 TB of data were collected at 7 simulation production sites.
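For orientation (a standard estimate, not a number from the slide, assuming an inelastic cross-section of roughly 70 mb and 25 ns bunch spacing), the pile-up level to be simulated follows from the mean number of inelastic interactions per bunch crossing:

```latex
\langle n_{\text{pile-up}} \rangle \;=\; \mathcal{L}\,\sigma_{\text{inel}}\,\Delta t_{\text{bunch}}
\;\approx\; 10^{34}\,\mathrm{cm^{-2}s^{-1}} \times 70\,\mathrm{mb} \times 25\,\mathrm{ns} \;\approx\; 17.5
```

so the 10^34 sample carries of order 20 overlapping minimum-bias interactions per crossing, and the 2x10^33 sample roughly 3-4.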
ATLAS DC1 Phase 1: July-August 2002
• 3200 CPUs, 110 kSI95, 71 000 CPU-days
• 39 institutes in 18 countries: Australia, Austria, Canada, CERN, Czech Republic, France, Germany, Israel, Italy, Japan, Nordic, Russia, Spain, Taiwan, UK, USA
• Grid tools used at 11 sites
• 5x10^7 events generated, 1x10^7 events simulated, 3x10^7 single particles
• 30 TB, 35 000 files
ATLAS Computing: DC1 work packages & people (under the responsibility of the Data Challenge Coordinator, G. Poulard)
• A-Wp1: Event Generator (I. Hinchliffe + 8 physicists)
• A-Wp2: Geant3 Simulation (P. Nevski)
• A-Wp3: Geant4 Simulation (A. Dell'Acqua)
• A-Wp4: Pile-up (M. Wielers); "Atlsim" framework (P. Nevski); "Athena" framework (P. Calafiura)
• A-Wp5: Detector response (not active for DC1)
• A-Wp6: Data Conversion (RD Schaffer + DataBase group); additional people were active for DC0, plus the people involved in the AthenaRoot I/O conversion
• A-Wp7: Event Filtering (M. Wielers)
• A-Wp8: Reconstruction (D. Rousseau)
• A-Wp9: Analysis (F. Gianotti)
• A-Wp10: Data Management (D. Malon)
• A-Wp11: Tools
  • Bookkeeping & cataloguing (S. Albrand, L. Goossens + 7 other physicists/engineers)
  • Production WG: L. Goossens, P. Nevski, S. Vaniachine
  • Virtual Data catalog (S. Vaniachine, P. Nevski)
  • Grid tools providers (NorduGrid & US)
  • Organisation & Documentation WG: A. Nairz, N. Benekos + AMI and Magda people (in close connection with the bookkeeping & cataloguing WG)
• A-Wp12: Teams
  • "Site" validation (J-F. Laporte): all local managers from collaborating institutes
  • Physics validation (J-F. Laporte, F. Gianotti + representatives of the HLT and Physics WGs)
  • Production WG: P. Nevski, S. O'Neale, L. Goossens, Y. Smirnov, S. Vaniachine + local production managers (39 sites for DC1/1 and 56 sites for DC1/2) + ATLAS-Grid people
• A-Wp13: Tier centres (A. Putzer); WG: responsibles of the production centres + a contact person in each country
• A-Wp14: Fast simulation (P. Sherwood); WG: E. Richter-Was, J. Couchman
The success of DC1 is due to the effort and commitment of many world-wide sites, actively organized by A. Putzer.
ATLAS Computing: DC1
• We are currently preparing (validating) the Athena-based reconstruction step: Software Release 6 (end of January). The aim is to launch wide-scale reconstruction as soon as possible after Release 6, possibly with wide use of some Grid tools. [The actual reconstruction, which will probably be (re-)done on various sub-samples over the first few months of next year, is not strictly part of DC1.]
• Note that our present scheduling of software releases is driven entirely by HLT (High Level Trigger) requirements and schedule. For example, when Release 5 slipped by ~1 month in Fall 2002 compared to the original schedule, we issued two intermediate releases (adding 'ByteStream' [raw data format] capability) to minimise the effect of the delay on the HLT schedule.
ATLAS Computing: DC1/HLT/EDM
• In fact, one of the most important benefits of DC1 has been the much enhanced collaboration between the HLT and 'offline' communities, most prominently in the development of the raw-data part of the Event Data Model ('ByteStream', Raw Data Objects, etc.).
• We have not yet focussed on the reconstruction part of the Event Data Model to the same extent, but an assessment of what we have today, and a (re-)design where appropriate, is ongoing.
DC2-3-4-…
• DC2: Q4/2003 – Q2/2004
  • Goals:
    • Full deployment of the Event Data Model & Detector Description
    • Geant4 becomes the main simulation engine
    • Pile-up in Athena
    • Test the calibration and alignment procedures
    • Use LCG common software
    • Make wide use of Grid middleware
    • Perform large-scale physics analysis
    • Further tests of the computing model
  • Scale: as for DC1, ~10^7 fully simulated events (with pile-up too)
• DC3, DC4, …
  • Yearly increase in scale and scope
  • Increasing use of the Grid
  • Testing rate capability
  • Testing the physics analysis strategy
ATLAS and GRID
• ATLAS has already used Grid tools for producing DC1 simulations.
• Production was distributed over 39 sites, with Grids used for ~5% of the total amount of data, by:
  • NorduGrid (8 sites), who produced all their data using the Grid;
  • the US Grid Testbed (Arlington, LBNL, Oklahoma), where the Grid was used for ~10% of their DC1 share (10% = 30k CPU-hours);
  • EU-DataGrid, which re-ran 350 DC1 jobs (~10k CPU-hours) at some Tier-1 prototype sites: CERN, CNAF (Italy), Lyon, RAL, NIKHEF and Karlsruhe (CrossGrid site). This last production was done in the first half of September and was made possible by the work of the ATLAS-EDG task force.
ATLAS GRID plans for the near future
• In preparation for the reconstruction phase (spring 2003) we performed further Grid tests in November/December:
  • extend the EDG to more ATLAS sites, not only in Europe;
  • test a basic implementation of a worldwide Grid;
  • test the inter-operability between the different Grid flavours.
    • Inter-operation = a job is submitted in region A; it is run in region B if the input data are in B; the produced data are stored; the job log is made available to the submitter (see the sketch below).
  • The EU project DataTAG has a Work Package devoted specifically to interoperation, in collaboration with the US iVDGL project; the results of these projects are expected to be taken up by LCG (GLUE framework).
  • ATLAS collaborated with DataTAG-iVDGL on interoperability demonstrations in November-December 2002.
• The DC1 data will be reconstructed (using Athena) in early 2003; the scope and the way of using Grids for distributed reconstruction will depend on the results of the tests started in November/December and still ongoing.
• ATLAS is fully committed to LCG and to its Grid middleware selection process; our "early tester" role has been recognized as very useful for EDG, and we are confident that it will be the same for LCG products.
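A minimal sketch of the data-driven brokering idea described above (hypothetical names and file identifiers, not EDG/DataTAG code): the submitter in region A consults a replica catalogue, the job is dispatched to whichever region holds the input data, and the output location and log are reported back.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical replica catalogue: logical file name -> regions holding a copy.
using ReplicaCatalogue = std::map<std::string, std::vector<std::string>>;

struct JobResult {
    std::string executedIn;  // region where the job actually ran
    std::string outputFile;  // where the produced data were stored
    std::string logUrl;      // log made available to the submitter
};

// Pick an execution region that already hosts the input data; fall back to the
// submission region if no replica is registered.
std::string chooseRegion(const ReplicaCatalogue& catalogue,
                         const std::string& inputLfn,
                         const std::string& submissionRegion) {
    auto it = catalogue.find(inputLfn);
    if (it != catalogue.end() && !it->second.empty()) {
        return it->second.front();  // run where the data already are
    }
    return submissionRegion;        // no replica known: run locally
}

JobResult submitJob(const ReplicaCatalogue& catalogue,
                    const std::string& submissionRegion,
                    const std::string& inputLfn) {
    const std::string target = chooseRegion(catalogue, inputLfn, submissionRegion);
    // In a real Grid this would hand the job to the remote resource broker;
    // here we only report the decision (all strings are illustrative).
    return JobResult{target,
                     "lfn:" + inputLfn + ".recon",
                     "https://logs.example.org/" + target + "/job-0001.log"};
}

int main() {
    ReplicaCatalogue catalogue{{"dc1.simul.00001.zebra", {"RegionB"}}};
    JobResult r = submitJob(catalogue, "RegionA", "dc1.simul.00001.zebra");
    std::cout << "Submitted from RegionA, executed in " << r.executedIn
              << ", output " << r.outputFile << ", log " << r.logUrl << "\n";
    return 0;
}
```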
ATLAS Long-Term GRID Planning
• Worldwide Grid tests are essential to define in detail the ATLAS distributed computing model.
• The principles of cost and resource sharing are described in a paper, were presented at the last ATLAS week (October 2002), and were endorsed by the ATLAS Collaboration Board:
  "Principles of Cost Sharing for the ATLAS Offline Computing Resources", prepared by R. Jones, N. McCubbin, M. Nordberg, L. Perini, G. Poulard, and A. Putzer
• The main implementation of cost sharing is foreseen through in-kind contributions of resources in regional centres, made available for the common ATLAS computing infrastructure.
ATLAS Computing: Geant4 evaluation and integration programme
• ATLAS has invested, and is investing, substantial effort in the evaluation of Geant4, in close collaboration with the Geant4 team.
  • It involves essentially all ATLAS sub-detectors.
  • It provides a reference against which any future simulation will have to be compared.
  • It provides (sufficiently well-) tested code that should, in principle, integrate with no difficulty into a complete detector simulation suite.
• We are striving for:
  • minimal inter-detector coupling;
  • minimal coupling between the framework and user code.
• With this approach we are finding no problems in interfacing the different detectors (see the sketch below).
• Further integration issues (framework, detector clashes, memory, performance) are being checked.
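A minimal sketch of what "minimal inter-detector coupling" can look like against the generic Geant4 user-geometry interface (G4VUserDetectorConstruction). This is not ATLAS code: the envelope classes, materials and dimensions are purely illustrative, and the headers assume a recent Geant4 installation. Each sub-detector builds itself into the mother volume it is handed and knows nothing about any other sub-detector.

```cpp
#include "G4Box.hh"
#include "G4LogicalVolume.hh"
#include "G4NistManager.hh"
#include "G4PVPlacement.hh"
#include "G4ThreeVector.hh"
#include "G4VPhysicalVolume.hh"
#include "G4VUserDetectorConstruction.hh"
#include <memory>
#include <vector>

// Hypothetical interface: each sub-detector only sees the mother volume it is given.
class SubDetectorEnvelope {
public:
    virtual ~SubDetectorEnvelope() = default;
    virtual void BuildEnvelope(G4LogicalVolume* mother) = 0;
};

// Stand-ins for individual sub-detector geometry code (sizes/positions illustrative;
// real envelopes are nested shells, not displaced boxes).
class InnerDetectorEnvelope : public SubDetectorEnvelope {
public:
    void BuildEnvelope(G4LogicalVolume* mother) override {
        auto* air = G4NistManager::Instance()->FindOrBuildMaterial("G4_AIR");
        auto* solid = new G4Box("InDetEnvelope", 1150., 1150., 3500.);  // mm half-lengths
        auto* logical = new G4LogicalVolume(solid, air, "InDetEnvelope");
        new G4PVPlacement(nullptr, G4ThreeVector(), logical, "InDetEnvelope",
                          mother, false, 0);
    }
};

class CalorimeterEnvelope : public SubDetectorEnvelope {
public:
    void BuildEnvelope(G4LogicalVolume* mother) override {
        auto* air = G4NistManager::Instance()->FindOrBuildMaterial("G4_AIR");
        auto* solid = new G4Box("CaloEnvelope", 4250., 4250., 6500.);   // mm half-lengths
        auto* logical = new G4LogicalVolume(solid, air, "CaloEnvelope");
        new G4PVPlacement(nullptr, G4ThreeVector(0., 0., 10000.), logical, "CaloEnvelope",
                          mother, false, 0);
    }
};

// The top-level construction simply owns a list of independent envelopes.
class FullDetectorConstruction : public G4VUserDetectorConstruction {
public:
    FullDetectorConstruction() {
        fEnvelopes.push_back(std::make_unique<InnerDetectorEnvelope>());
        fEnvelopes.push_back(std::make_unique<CalorimeterEnvelope>());
    }

    G4VPhysicalVolume* Construct() override {
        auto* air = G4NistManager::Instance()->FindOrBuildMaterial("G4_AIR");
        auto* worldSolid = new G4Box("World", 15000., 15000., 30000.);  // mm half-lengths
        auto* worldLogical = new G4LogicalVolume(worldSolid, air, "World");
        auto* worldPhysical = new G4PVPlacement(nullptr, G4ThreeVector(),
                                                worldLogical, "World",
                                                nullptr, false, 0);
        for (auto& envelope : fEnvelopes) {
            envelope->BuildEnvelope(worldLogical);  // no cross-talk between sub-detectors
        }
        return worldPhysical;
    }

private:
    std::vector<std::unique_ptr<SubDetectorEnvelope>> fEnvelopes;
};
```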
Example: Geant4 Electron Response in ATLAS Calorimetry
[Figure panels: "FCal Electron Response" and "EMB Electron Energy Resolution", comparing GEANT4, GEANT3 and data; vertical axis: ΔE_rec MC-Data [%]]
• Overall signal characteristics: Geant4 reproduces the average electron signal as a function of the incident energy in all ATLAS calorimeters very well (testbeam-setup or analysis-induced non-linearities typically within ±1%)…
• …but the average signal can be smaller than in G3 and data (1-3% for a 20-700 μm range cut in the HEC); the signal fluctuations in the EMB are very well simulated.
• Electromagnetic FCal: high-energy limit of the resolution function ~5% in G4, ~4% in data and G3.
• TileCal: stochastic term 22%·GeV^1/2 in G4/G3, 26%·GeV^1/2 in data; the high-energy limit is very comparable (resolution parameterization recalled below).
(thanks to P. Loch)
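For reference (standard calorimetry parameterization, not taken from the slide), the stochastic terms and "high-energy limits" quoted above refer to the usual energy-resolution formula

```latex
\frac{\sigma_E}{E} \;=\; \frac{a}{\sqrt{E}} \;\oplus\; \frac{b}{E} \;\oplus\; c ,
```

where a is the stochastic (sampling) term, e.g. the 22-26% GeV^1/2 values for the TileCal, b the noise term, c the constant term that sets the high-energy limit of the resolution, and ⊕ denotes addition in quadrature.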
Conclusions on ATLAS Geant4 physics validation
• Geant4 can simulate the relevant features of muon, electron and pion signals in the various ATLAS detectors, often better than Geant3.
• The remaining discrepancies, especially for hadrons, are being addressed, and progress can be expected in the near future.
• ATLAS has a huge amount of the right testbeam data for the calorimeters, the inner-detector modules and the muon detectors to evaluate the Geant4 physics models in detail.
• Feedback loops to the Geant4 team have been in place for most systems for quite some time; communication is not a problem.
Geant4 simulation of the full ATLAS detector
• DC0 (end of 2001): robustness test with the complete Muon system, simplified Inner Detector and Calorimeters
  • 10^5 events, no crash!
• Now basically all detectors are available:
  • some parts of the detector (dead material, toroids) are not there yet and are being worked on;
  • combined simulation is starting now;
  • the full geometry will be usable in early February.
• A beta version of the full simulation program is to be ready by the end of January, to be tested in realistic production.
ATLAS Computing: Interactions with the LCG Project
• The LCG project is completely central to ATLAS computing. We are committed to it and, in our planning, we rely on it:
  • participation in RTAGs: ATLAS has provided the convenors of two major RTAGs (Persistency and Simulation);
  • commitment of ATLAS effort to the POOL ('persistency') project:
    • the POOL project is the ATLAS data persistency project!
  • LCG products, and the release and deployment of the first LCG Grid infrastructure ('LCG-1'), are now in our baseline planning:
    • LCG-1 must be used for our DC2 production at the end of 2003 – early 2004.
ATLAS Computing organization (1999-2002)
[Organigram: Computing Oversight Board; National Computing Board; Computing Steering Group; Physics and Technical Groups; Event Filter; QA group; Architecture team; simulation, reconstruction and database coordinators; detector systems]
Key ATLAS Computing bodies
• Computing Oversight Board (COB): ATLAS Spokesperson and Deputy, Computing Coordinator, Physics Coordinator, T-DAQ Project Leader. Role: oversight, not executive. Meets ~monthly.
• Computing Steering Group (CSG): membership is the first row and first column of the Detector/Task matrix, plus the Data Challenge Coordinator, Software Controller, Chief Architect, NCB Chair and GRID Coordinator. The top executive body for ATLAS computing. Meets ~monthly.
• National Computing Board (NCB): representatives of all regions and/or funding agencies, with the GRID Coordinator and ATLAS Management ex officio. Responsible for all issues which bear on national resources, notably the provision of resources for world-wide computing. Meets every two to three months.
ATLAS Detector/Task matrix (CSG members)
[Matrix table shown on slide]
Other ATLAS key post-holders
• Computing Steering Group:
  • Chief Architect: D. Quarrie (LBNL)
  • Physics Coordinator: F. Gianotti (CERN)
  • Planning Officer: T. Wenaus (BNL/CERN)
  • NCB Chair: A. Putzer (Heidelberg)
  • GRID Coordinator: L. Perini (Milan)
  • Data Challenge Coordinator: G. Poulard (CERN)
  • Software 'Controller': J-F. Laporte (Saclay)
• Software Infrastructure Team:
  • Software Librarians: S. O'Neale (Birmingham), A. Undrus (BNL)
  • Release Coordinator (rotating): D. Barberis (Genoa)
  • Release tools: Ch. Arnault (Orsay), J. Fulachier (Grenoble)
  • Quality Assurance: S. Albrand (Grenoble), P. Sherwood (UCL)
• LCG ATLAS representatives:
  • POB (Project Oversight Board): T. Åkesson (Deputy Spokesperson), J. Huth (USA), P. Eerola (Nordic Cluster), H. Sakamoto (Japan)
  • SC2 (Software & Computing Committee): N. McCubbin (Computing Coordinator) and D. Froidevaux
  • PEB (Project Execution Board): G. Poulard (Data Challenge Coordinator)
  • GDB (Grid Deployment Board): N. McCubbin (Computing Coordinator), G. Poulard (Data Challenge Coordinator), L. Perini (Grid Coordinator, Deputy)
Proposed new computing organization (DRAFT FOR DISCUSSION)
[Organigram of the proposed structure shown on slide]
Main positions in the proposed new computing organization
• Computing Coordinator
  • Leads and coordinates the development of ATLAS computing in all its aspects: software, infrastructure, planning, resources.
  • Coordinates development activities with the TDAQ Project Leader(s), the Physics Coordinator and the Technical Coordinator through the Executive Board and the appropriate boards (COB and TTCC).
  • Represents ATLAS computing in the LCG management structure (SC2 and other committees) and at the LHC level (LHCC and LHC-4).
  • Chairs the Computing Management Board.
• Software Project Leader
  • Leads the development of ATLAS software, as the Chief Architect of the Software Project.
  • Is a member of the ATLAS Executive Board and COB.
  • Participates in the LCG Architects Forum and other LCG activities.
  • Chairs the Software Project Management Board and the Architecture Team.
Main boards in the proposed new computing organization (1)
• Computing Management Board (CMB):
  • Computing Coordinator (chair)
  • Software Project Leader
  • TDAQ Liaison
  • Physics Coordinator
  • NCB Chair
  • GRID & Operations Coordinator
  • Planning & Resources Coordinator
• Responsibilities: coordinate and manage computing activities; set priorities and take executive decisions.
• Meetings: bi-weekly.
Main boards in the proposed new computing organization (2)
• Software Project Management Board (SPMB):
  • Software Project Leader (chair)
  • Computing Coordinator (ex officio)
  • Simulation Coordinator
  • Reconstruction, HLT Algorithms & Analysis Tools Coordinator(s)
  • Core Services Coordinator
  • Software Infrastructure Team Coordinator
  • LCG Applications Liaison
  • Calibration/Alignment Coordinator
  • Sub-detector Software Coordinators
• Responsibilities: coordinate the coherent development of software (both infrastructure and applications).
• Meetings: bi-weekly.
Development plan (1)
• Early 2003:
  • Completion of the first development cycle of the OO/C++ software:
    • Framework
    • Fast Simulation
    • Event Data Model
    • Geometry
    • Reconstruction
  • Implementation of the complete simulation in Geant4, and integration of Geant4 with Athena
• Reminder: the first cycle of OO development had to prove that "the new software can do at least as well as the old one", and was based on a "translation" of algorithms and data structures from Fortran to C++.
Development plan (2)
• 2003 – 2005: second cycle of OO software development (a proper design is needed for several components):
  • Event Data Model and Geometry:
    • coherent design across all detectors and data types
    • optimization of data access in memory and on disk
  • Integrated development of alignment/calibration procedures
  • Development and integration of the Conditions Data Base
  • Simulation:
    • optimization of Geant4 (geometry and physics)
    • optimization of the detector response
  • On-line/off-line integration: Trigger and Event Filter software
  • Reconstruction: development of a global strategy, based on modular, interchangeable components (see the sketch below)
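A minimal sketch (hypothetical interfaces and class names, not the ATLAS design) of what "modular, interchangeable components" can mean in practice: reconstruction steps are coded against small abstract interfaces, so alternative implementations (e.g. different track-fitting strategies) can be swapped without touching the rest of the chain.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Placeholder event-data types (illustrative only).
struct Cluster { double energy; };
struct Track   { double momentum; };

// Small abstract interfaces: the rest of the chain depends only on these.
class IClusterBuilder {
public:
    virtual ~IClusterBuilder() = default;
    virtual std::vector<Cluster> build() const = 0;
};

class ITrackFitter {
public:
    virtual ~ITrackFitter() = default;
    virtual std::vector<Track> fit(const std::vector<Cluster>& clusters) const = 0;
    virtual std::string name() const = 0;
};

// Two interchangeable fitter implementations (the internal logic is a dummy placeholder).
class LeastSquaresFitter : public ITrackFitter {
public:
    std::vector<Track> fit(const std::vector<Cluster>& clusters) const override {
        return {Track{clusters.empty() ? 0.0 : clusters.front().energy}};
    }
    std::string name() const override { return "LeastSquaresFitter"; }
};

class KalmanFitter : public ITrackFitter {
public:
    std::vector<Track> fit(const std::vector<Cluster>& clusters) const override {
        double sum = 0.0;
        for (const auto& c : clusters) sum += c.energy;
        return {Track{sum}};
    }
    std::string name() const override { return "KalmanFitter"; }
};

class SimpleClusterBuilder : public IClusterBuilder {
public:
    std::vector<Cluster> build() const override { return {Cluster{10.0}, Cluster{25.0}}; }
};

// The reconstruction driver is assembled from components chosen at configuration time.
void runReconstruction(const IClusterBuilder& builder, const ITrackFitter& fitter) {
    const auto clusters = builder.build();
    const auto tracks = fitter.fit(clusters);
    std::cout << fitter.name() << " produced " << tracks.size() << " track(s)\n";
}

int main() {
    SimpleClusterBuilder builder;
    runReconstruction(builder, LeastSquaresFitter{});  // swap fitters without changing the driver
    runReconstruction(builder, KalmanFitter{});
    return 0;
}
```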
Major Milestones
[Milestone table shown on slide]
Major Milestones (continued)
[Milestone table shown on slide; colour code: green = done, gray = original date, blue = current date]
Perspectives
• This plan of action is realistic and can succeed if:
  • there are sufficient human resources;
  • there is a "critical mass" of people working together in a few key institutions, first of all at CERN;
  • there is general consensus on where we are heading, and by which means (not always true in the past).