

  1. Data Challenges in ATLAS Computing • ATLAS Collaboration • Invited talk at ACAT’2002, Moscow, Russia, June 25, 2002 • Alexandre Vaniachine (ANL) • vaniachine@anl.gov

  2. Outline & Acknowledgements • World Wide computing model • Data persistency • Application framework • Data Challenges: Physics + Grid • Grid integration in Data Challenges • Data QA and Grid validation • Thanks to all ATLAS collaborators whose contributions I used in my talk

  3. Core Domains in ATLAS Computing • ATLAS Computing is right in the middle of its first period of Data Challenges • A Data Challenge (DC) is for the software what a Test Beam is for the detector: many components have to be brought together to work • The separation of the data and the algorithms in the ATLAS software architecture determines our core domains: • Persistency solutions for event data storage • Software framework for data processing algorithms • Grid computing for the data processing flow

  4. World Wide Computing Model • The focus of my presentation is on the integration of these three core software domains in the ATLAS Data Challenges towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS collaborators equal access, of equal quality, to ATLAS data

  5. ATLAS Computing Challenge • The emerging World Wide computing model is an answer to the LHC computing challenge: • For ATLAS the raw data alone constitute 1.3 PB/year; adding “reconstructed” events and Monte Carlo data results in ~10 PB/year (~3 PB on disk) • The required CPU estimates, including analysis, are ~1.6M SpecInt95 • CERN alone can handle only a fraction of these resources • The computing infrastructure, which was centralized in the past, will now be distributed (in contrast to the reverse trend for experiments that were more distributed in the past) • Validation of the new Grid computing paradigm in the period before the LHC requires Data Challenges of increasing scope and complexity • These Data Challenges will use as much as possible the Grid middleware being developed in Grid projects around the world
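A quick back-of-the-envelope check of the raw-data figure, assuming the canonical LHC running parameters of that period (roughly 10^7 s of data taking per year, an event recording rate of order 100 Hz, and a raw event size of about 1.3 MB; none of these numbers appear on the slide):

$$ 100\ \mathrm{Hz} \times 10^{7}\ \mathrm{s/year} \times 1.3\ \mathrm{MB/event} \approx 1.3\ \mathrm{PB/year}. $$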

  6. Technology Independence • Ensuring that the ‘application’ software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the “transient/persistent” split) • Integrated operation of the framework & database domains demonstrated the capability of • switching between persistency technologies • reading the same data from different frameworks • Implementation: the data description (persistent dictionary) is stored together with the data; the application framework uses the transient data dictionary for transient/persistent conversion • The Grid integration problem is very similar to the transient/persistent issue, since all objects become just a byte stream, whether on disk or on the net
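A minimal sketch of the transient/persistent split described above, using hypothetical class names (IPersistencyConverter, RootConverter, ObjectivityConverter) rather than the actual Athena/Gaudi converter API: algorithm code only sees the technology-neutral interface, so switching the storage technology means swapping the concrete converter and nothing else.

```cpp
// Illustrative sketch only: names are hypothetical, not the real Athena/Gaudi classes.
#include <memory>
#include <string>
#include <vector>
#include <iostream>

// Transient event object seen by the algorithms.
struct TrackCollection {
    std::vector<double> pt;   // transverse momenta, GeV
};

// Technology-neutral converter interface (the "transient/persistent" boundary).
class IPersistencyConverter {
public:
    virtual ~IPersistencyConverter() = default;
    virtual void write(const TrackCollection& tracks, const std::string& key) = 0;
    virtual TrackCollection read(const std::string& key) = 0;
};

// One concrete backend per storage technology; algorithms never see these.
class RootConverter : public IPersistencyConverter {
public:
    void write(const TrackCollection& tracks, const std::string& key) override {
        std::cout << "ROOT backend: streaming " << tracks.pt.size()
                  << " tracks under key " << key << "\n";
    }
    TrackCollection read(const std::string& key) override {
        std::cout << "ROOT backend: reading key " << key << "\n";
        return {};
    }
};

class ObjectivityConverter : public IPersistencyConverter {
public:
    void write(const TrackCollection& tracks, const std::string& key) override {
        std::cout << "Objectivity backend: storing " << tracks.pt.size()
                  << " tracks under key " << key << "\n";
    }
    TrackCollection read(const std::string& key) override {
        std::cout << "Objectivity backend: reading key " << key << "\n";
        return {};
    }
};

// Swapping the persistency technology means swapping the converter, nothing else.
int main() {
    std::unique_ptr<IPersistencyConverter> converter =
        std::make_unique<RootConverter>();   // or std::make_unique<ObjectivityConverter>()
    converter->write({{25.0, 40.5}}, "MyTracks");
    converter->read("MyTracks");
}
```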

  7. ATLAS Database Architecture Independent of underlying persistency technology Data description stored together with the data Ready for Grid integration

  8. Change of Persistency Baseline • For some time ATLAS has had both a ‘baseline’ technology (Objectivity) and a baseline evaluation strategy • We implemented persistency in Objectivity for DC0 • A ROOT-based conversion service (AthenaROOT) provides the persistence technology for Data Challenge 1 • The technology strategy is to adopt the LHC-wide LHC Computing Grid (LCG) common persistence infrastructure (a hybrid of a relational layer and a ROOT-based streaming layer) as soon as this is feasible • ATLAS is committed to ‘common solutions’ and looks forward to LCG being the vehicle for providing these in an effective way • Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of “converter”, but nothing else • The ‘ease’ of the baseline change demonstrates the benefits of decoupling transient and persistent representations • Our architecture is, in principle, capable of providing language independence (in the long term)

  9. Athena Software Framework • ATLAS Computing is steadily progressing towards a highly functional software suite and is implementing the World Wide model • (Note that a legacy software suite was produced, still exists and is used: so it can be done for the ATLAS detector!) • The Athena Software Framework is used in Data Challenges for: • generator event production • fast simulation • data conversion • production QA • reconstruction (off-line and High Level Trigger) • Work in progress: integrating detector simulations • Future directions: Grid integration

  10. Athena Architecture Features • Separation of data and algorithms • Memory management • Transient/Persistent separation • Athena has a common code base with the GAUDI framework (LHCb)

  11. ATLAS Detector Simulations • Scale of the problem: • 25.5 million distinct volume copies • 23 thousand different volume objects • 4,673 different volume types • managing up to a few hundred pile-up events • one million hits per event on average

  12. Universal Simulation Box • [Diagram: detector description and MC events (HepMC) flow into the detector simulation program, which produces hits and MC truth feeding the digitisation step] • With all interfaces clearly defined, simulations become “Geant-neutral”: you can in principle run G3, G4, Fluka or a parameterized simulation with no effect on the end users • G4 robustness test completed in DC0
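A minimal sketch of what "Geant-neutral" means in practice, under assumed interfaces (ISimulationEngine, Geant4Engine, ParameterizedEngine are illustrative names, not the ATLAS simulation classes): the simulation box takes a generator event in and produces hits, so the choice of engine is invisible to the downstream steps.

```cpp
// Illustrative sketch only: interface and engine names are hypothetical.
#include <memory>
#include <string>
#include <vector>
#include <iostream>

struct McEvent { std::vector<double> particlePz; };  // stand-in for a HepMC event
struct HitList { std::size_t nHits = 0; };           // detector hits + MC truth

// The "universal simulation box": generator event in, hits out.
class ISimulationEngine {
public:
    virtual ~ISimulationEngine() = default;
    virtual HitList simulate(const McEvent& event) = 0;
    virtual std::string name() const = 0;
};

class Geant4Engine : public ISimulationEngine {
public:
    HitList simulate(const McEvent& event) override {
        return {event.particlePz.size() * 1000};      // pretend full simulation
    }
    std::string name() const override { return "Geant4"; }
};

class ParameterizedEngine : public ISimulationEngine {
public:
    HitList simulate(const McEvent& event) override {
        return {event.particlePz.size() * 10};        // pretend fast simulation
    }
    std::string name() const override { return "Parameterized"; }
};

int main() {
    // Choosing G3, G4, Fluka or a parameterization is invisible to the user code.
    std::unique_ptr<ISimulationEngine> engine = std::make_unique<Geant4Engine>();
    HitList hits = engine->simulate({{1.0, 2.0, 3.0}});
    std::cout << engine->name() << " produced " << hits.nHits << " hits\n";
}
```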

  13. Data Challenges • Data Challenges prompted increasing integration of Grid components in ATLAS software • DC0 was used to test the software readiness and the continuity/robustness of the production pipeline • Scale was limited to < 1M events • Physics oriented: output for leptonic-channel analyses and legacy Physics TDR data • Despite the centralized production in DC0, we started deployment of our DC infrastructure (organized in 13 work packages) covering in particular Grid-related areas such as: • production tools • Grid tools for metadata bookkeeping and replica management • We started distributed production on the Grid in DC1

  14. DC0 Data Flow • Multiple production pipelines • Independent data transformation steps • Quality Assurance procedures

  15. Data Challenge 1 • Reconstruction & analysis on a large scale: exercise the data model, study ROOT I/O performance, identify bottlenecks, exercise distributed analysis, … • Produce data for the High Level Trigger (HLT) TDR & Physics groups • Study performance of Athena and of algorithms for use in the High Level Trigger • Test of ‘data flow’ through the HLT: byte-stream -> HLT algorithms -> recorded data • High statistics needed (background rejection study) • Scale: ~10M simulated events in 10-20 days, O(1000) PCs • Exercising the LHC Computing model: involvement of CERN & outside-CERN sites • Deployment of the ATLAS Grid infrastructure: outside sites are essential for this event scale • Phase 1 (started in June) • ~10M generator-level particle events (all data produced at CERN) • ~10M simulated detector-response events (June - July) • ~10M events with reconstructed objects • Phase 2 (September - December) • Introduction and use of the new Event Data Model and Detector Description • More countries/sites/processors • Distributed reconstruction • Additional samples including pile-up • Distributed analyses • Further tests of GEANT4

  16. DC1 Phase 1 Resources • Organization & infrastructure are in place, led by the CERN ATLAS group • 2000 processors, 1.5×10¹¹ SI95·sec • adequate for ~4×10⁷ simulated events • 2/3 of the data produced outside of CERN • production on a global scale: Asia, Australia, Europe and North America • 17 countries, 26 production sites: Australia (Melbourne); Canada (Alberta, TRIUMF); Czech Republic (Prague); Denmark (Copenhagen); France (CCIN2P3 Lyon); Germany (Karlsruhe); Italy: INFN (CNAF, Milan, Roma1, Naples); Japan (Tokyo); Norway (Oslo); Portugal (FCUL Lisboa); Russia: RIVK BAK (JINR Dubna, ITEP Moscow, SINP MSU Moscow, IHEP Protvino); Spain (IFIC Valencia); Sweden (Stockholm); Switzerland (CERN); Taiwan (Academia Sinica); UK (RAL, Lancaster, Liverpool (MAP)); USA (BNL), ...
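As a rough consistency check using only the figures quoted on this slide, the CPU budget corresponds to a per-event simulation cost of a few thousand SI95 seconds:

$$ \frac{1.5\times 10^{11}\ \mathrm{SI95\cdot s}}{4\times 10^{7}\ \mathrm{events}} \approx 3.8\times 10^{3}\ \mathrm{SI95\cdot s\ per\ event}. $$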

  17. Data Challenge 2 • Schedule: Spring - Autumn 2003 • Major physics goals: • Physics samples have ‘hidden’ new physics • Geant4 will play a major role • Testing calibration and alignment procedures • Scope increased compared to what was achieved in DC0 & DC1 • Scale: a sample of 10⁸ events • System at a complexity of ~50% of the 2006-2007 system • Distributed production, simulation, reconstruction and analysis: • Use of Grid testbeds built in the context of Phase 1 of the LHC Computing Grid Project • Automatic ‘splitting’ and ‘gathering’ of long jobs, best available sites for each job • Monitoring on a ‘gridified’ logging and bookkeeping system, interface to a full ‘replica catalog’ system, transparent access to the data for different MSS systems • Grid certificates

  18. Grid Integration in Data Challenges • Grid and Data Challenge communities have overlapping objectives: • Grid middleware • testbed deployment, packaging, basic sequential services, user portals • Data management • replicas, reliable file transfers, catalogs • Resource management • job submission, scheduling, fault tolerance • Quality Assurance • data reproducibility, application and data signatures, Grid QA

  19. Grid Middleware ?

  20. Grid Middleware !

  21. ATLAS Grid Testbeds US-ATLAS Grid Testbed NorduGrid EUDataGrid For more information see presentations by Roger Jones and Aleksandr Konstantinov

  22. Interfacing Athena to the GRID • Making the Athena framework work in the GRID environment requires: • architectural design & components making use of the Grid services • [Diagram: Athena/GAUDI application connected through the GANGA/Grappa GUI to GRID services: histograms, monitoring, results, virtual data, algorithms] • Areas of work: • data access (persistency), event selection • GANGA (job configuration & monitoring, resource estimation & booking, job scheduling, etc.) • Grappa - Grid user interface for Athena

  23. Data Management Architecture • AMI - ATLAS Metadata Interface • MAGDA - MAnager for Grid-based DAta • VDC - Virtual Data Catalog

  24. AMI Architecture Data warehousing principle (star architecture)

  25. MAGDA Architecture Component-based architecture emphasizing fault-tolerance

  26. VDC Architecture • Two-layer architecture

  27. Introducing Virtual Data • Recipes for producing the data (jobOptions, kumacs) have to be fully tested, and the produced data have to be validated through a QA step • Preparing production recipes takes time and effort and encapsulates considerable knowledge. In DC0 more time was spent assembling the proper recipes than running the production jobs • Once you have the proper recipes, producing the data is straightforward • After the data have been produced, what do we do with the developed recipes? Do we really need to save them? • Data are primary, recipes are secondary

  28. Virtual Data Perspective • GriPhyN project (www.griphyn.org) provides a different perspective: • recipes are as valuable as the data • production recipes are the Virtual Data • If you have the recipes you do not need the data (you can reproduce them) • recipes are primary, data are secondary • Do not throw away the recipes, • save them (in VDC) • From the OO perspective: • Methods (recipes) are encapsulated together with the data in Virtual Data Objects

  29. VDC-based Production System • High-throughput features: • scatter-gather data processing architecture • Fault-tolerance features: • independent agents • pull model for agent task assignment (vs. push) • local caching of output and input data (except Objectivity input) • ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog database using normalized components: parameter collections structured “orthogonally” • Data reproducibility • Application complexity • Grid location • Automatic “garbage collection” by the job scheduler: • agents pull the next derivation from the VDC • after the data have been materialized, agents register “success” in the VDC • when a previous invocation has not completed within the specified timeout period, it is invoked again
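A minimal sketch of the pull-model agent loop described above, under assumed interfaces (VirtualDataCatalog and Derivation are hypothetical stand-ins for the VDC database and its records, not the ATLAS production code), showing the claim-with-timeout and register-success pattern that provides fault tolerance:

```cpp
// Illustrative sketch only: VirtualDataCatalog and Derivation are hypothetical.
#include <chrono>
#include <optional>
#include <string>
#include <iostream>

struct Derivation {
    std::string id;      // which parameter collection / transformation to run
    std::string recipe;  // e.g. a jobOptions or kumac identifying the step
};

class VirtualDataCatalog {
public:
    // Hand out the next derivation that is unclaimed, or whose previous
    // invocation exceeded the timeout (so a crashed agent's work is retried).
    std::optional<Derivation> claimNext(std::chrono::seconds /*timeout*/) {
        if (served_) return std::nullopt;             // toy catalog: one derivation
        served_ = true;
        return Derivation{"dc1.simul.example", "atlsim-recipe-v1"};
    }
    void registerSuccess(const Derivation& d) {
        std::cout << "VDC: derivation " << d.id << " materialized\n";
    }
private:
    bool served_ = false;
};

// Independent agent: pulls work, materializes the data locally, reports back.
void runAgent(VirtualDataCatalog& vdc) {
    while (auto work = vdc.claimNext(std::chrono::hours(24))) {
        std::cout << "Agent: running recipe " << work->recipe
                  << " for " << work->id << " (output cached locally)\n";
        vdc.registerSuccess(*work);                   // only on successful completion
    }
}

int main() {
    VirtualDataCatalog vdc;
    runAgent(vdc);   // in production, many such agents pull from the same catalog
}
```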

  30. Tree-like Data Flow • Exercising rich possibilities for data processing comprised of multiple independent data transformation steps • [Diagram: tree of transformation steps connecting Athena generators, Athena conversion (HepMC.root, digis.root, geometry.root), atlsim simulation (digis.zebra, geometry.zebra), Athena reconstruction (recon.root), Athena Atlfast (Atlfast.root, Atlfast recon.root, filtering ntuple) and Athena QA ntuples]

  31. Data Reproducibility • The goal is to validate DC sample production by ensuring the reproducibility of simulations run at different sites • We need a tool capable of establishing the similarity or the identity of two samples produced under different conditions, e.g. at different sites • A very important (and sometimes overlooked) component of Grid computing deployment • It is complementary to the software and/or data digital-signature approaches, which are still in the R&D phase

  32. Grid Production Validation • Simulations are run under different conditions, for instance with the same generation input but at different production sites • For each sample, reconstruction (i.e. Atrecon) is run to produce standard CBNT ntuples • The validation application launches specialized independent analyses for the ATLAS subsystems • For each sample standard histograms are produced

  33. Comparison Procedure • [Plots: test and reference samples superimposed, with the per-bin contributions to χ²]

  34. Summary of Comparison • The comparison procedure ends with a χ² bar-chart summary • It gives a pretty nice overview of how the samples compare:
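A minimal sketch of such a bin-by-bin comparison, assuming plain arrays of histogram bin counts rather than the actual ROOT histogram classes used in the validation tool: each bin contributes one term to the χ², and the per-histogram totals are what the summary bar chart would display.

```cpp
// Illustrative sketch only: plain bin-count vectors stand in for ROOT histograms.
#include <vector>
#include <iostream>

// Per-bin chi-square contributions between a test and a reference histogram,
// assuming approximately Poisson bin errors.
std::vector<double> chi2Contributions(const std::vector<double>& test,
                                      const std::vector<double>& reference) {
    std::vector<double> contrib(test.size(), 0.0);
    for (std::size_t i = 0; i < test.size() && i < reference.size(); ++i) {
        const double sigma2 = test[i] + reference[i];   // var(test - ref), Poisson
        if (sigma2 > 0.0) {
            const double diff = test[i] - reference[i];
            contrib[i] = diff * diff / sigma2;
        }
    }
    return contrib;
}

int main() {
    // Toy "energy in calorimeter" histograms from two production sites.
    std::vector<double> testSample      = {102, 250, 487, 260,  98};
    std::vector<double> referenceSample = { 95, 240, 510, 255, 110};

    double chi2 = 0.0;
    for (double c : chi2Contributions(testSample, referenceSample)) chi2 += c;

    const std::size_t ndf = testSample.size();
    std::cout << "chi2/ndf = " << chi2 / ndf
              << "  (one such entry per histogram feeds the summary bar chart)\n";
}
```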

  35. Example of Finding • Comparing the energy in the calorimeters for Z -> 2l samples from DC0 and DC1 • The difference is caused by a cut applied at generation level • It works!

  36. Summary • ATLAS computing is in the middle of its first period of Data Challenges of increasing scope and complexity, and is steadily progressing towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS collaborators equal access, of equal quality, to ATLAS data • These Data Challenges are executed at the prototype tier centers and use as much as possible the Grid middleware being developed in Grid projects around the world • In close collaboration between the Grid and Data Challenge communities, ATLAS is testing large-scale testbed prototypes, deploying prototype components to integrate and test Grid software in a production environment, and running Data Challenge 1 production at 26 prototype tier centers in 17 countries on four continents • Quite a promising start for the ATLAS Data Challenges!
