

  1. Status of the Software and of the Analysis Model: plans for commissioning and data taking. III Workshop Italiano sulla fisica di ATLAS e CMS, Bari, 21 October 2005. F. Ambroglini, S. Resconi

  2. Outline
  ATLAS and CMS:
  • Frameworks: Athena and the CMS Framework
  • Event Data Model
  • Analysis Model
  • Software validation and commissioning plans
  • ATLAS "Rome Production" experience
  • Thoughts for discussion

  3. Frameworks: Athena and the CMS Framework
  • ATLAS and CMS have both adopted an object-oriented approach based on C++ to develop a framework for the common processing steps: event generation, simulation, digitization, reconstruction, analysis and High Level Trigger.
  • Event generation will be included in the current CMS design by September 2006 and will be used as framework validation for GENSER (the official LCG package for generators).
  • Common basic principles:
    • abstract interfaces, which allow different implementations providing similar functionality, each optimized for a particular environment (e.g. HLT vs. offline)
    • clear separation between data and algorithms: fewer dependencies between the complex algorithms used to build, e.g., a track and the client's view of it (e.g. particle identification algorithms)
  • What is different:
    • ATLAS: separation between persistent (on file) and transient (in memory) data → algorithmic code is independent of the storage technology → more flexible
    • CMS: the same format is used for both → better performance
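
  As a minimal sketch of the abstract-interface principle (toy Python, not actual ATLAS/CMS code): the client depends only on the interface, so an HLT-optimized and an offline implementation can be swapped freely.

      # Toy illustration: clients see only the interface, implementations differ.
      from abc import ABC, abstractmethod

      class ITrackFinder(ABC):                      # abstract interface
          @abstractmethod
          def find_tracks(self, hits):
              """Return the tracks built from detector hits."""

      class FastTrackFinder(ITrackFinder):          # HLT-style: speed first
          def find_tracks(self, hits):
              return [h for h in hits if h["pt"] > 1.0]   # crude stand-in logic

      class OfflineTrackFinder(ITrackFinder):       # offline: full treatment
          def find_tracks(self, hits):
              return sorted(hits, key=lambda h: h["pt"], reverse=True)

      def particle_id(finder, hits):                # client code never changes
          return finder.find_tracks(hits)

      hits = [{"pt": 2.5}, {"pt": 0.3}]
      print(particle_id(FastTrackFinder(), hits))     # HLT environment
      print(particle_id(OfflineTrackFinder(), hits))  # offline environment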

  4. Athena Framework
  • Enhanced version of Gaudi, originally developed by LHCb.
  • Application Manager: coordinates the activity of all components within the application.
  • Algorithms: each performs a well-defined operation on some input data, in many cases producing output data (e.g. a reconstruction Algorithm that processes input Calo Cells to produce output Calo Clusters).
  • Services: provide common functionality needed by the Algorithms (e.g. the message-reporting system).
  • Transient Data Store (TDS) → StoreGate: an Algorithm creates data objects (e.g. Calo Clusters) and posts them to the TDS so that other Algorithms (e.g. jet reconstruction Algorithms) can access them. The TDS manages the data objects accessed by Algorithms, organizing them in separate transient stores according to their lifetimes: event data has a shorter lifetime than detector data such as geometry, which is stable across many physics events.
  • Converters: responsible for converting data from one representation to another, for example transforming an object from its transient to its persistent form and vice versa.
  [Diagram: Gaudi/Athena architecture — Application Manager; Algorithms; Converters; Transient Event, Detector and Histogram Stores; Event Data, Detector Data, Histogram, Persistency, Message, JobOptions, Particle Properties and other Services; Data Files]
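
  A schematic sketch of the Algorithm / StoreGate interplay described above; real Athena Algorithms are C++, and all names below are illustrative, not the Athena API.

      # Toy transient data store: Algorithms record outputs, others retrieve them.
      class StoreGate:
          def __init__(self):
              self._store = {}
          def record(self, key, obj):           # an Algorithm posts its output
              self._store[key] = obj
          def retrieve(self, key):              # downstream Algorithms read it
              return self._store[key]

      class CaloClusterMaker:
          def execute(self, sg):
              cells = sg.retrieve("CaloCells")
              clusters = [c for c in cells if c["energy"] > 0.5]  # toy clustering
              sg.record("CaloClusters", clusters)

      class JetMaker:
          def execute(self, sg):
              clusters = sg.retrieve("CaloClusters")   # consumes upstream output
              sg.record("Jets", [{"constituents": clusters}])

      sg = StoreGate()
      sg.record("CaloCells", [{"energy": 1.2}, {"energy": 0.1}])
      for alg in (CaloClusterMaker(), JetMaker()):     # Application Manager role
          alg.execute(sg)
      print(sg.retrieve("Jets"))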

  5. CMS Framework
  • During the Data Challenge 2004 (DC04) several problems were discovered in the CMS software design:
    • the old structure did not allow analysis to be performed simply with ROOT, or with ROOT plus a few external libraries
    • interactive analysis was difficult
    • no predefined scheduling: only on-demand reconstruction
  • In December 2004 CMS decided to re-engineer its software. The main goal is to provide reconstruction software with high modularity and predefined scheduling, allowing the direct use of ROOT in the framework.
  • The new Framework structure is completely different: only the algorithmic code can be ported from the old Framework to the new one.
  • CMS now has two lines of software development:
    • Old: the old software is still used to produce results for the Physics TDR
    • New: development of the new software

  6. CMS Framework components
  [Diagram: CMS Framework architecture — Application Manager with ParameterSet; Event holding raw detector output, reco products, simul products, analysis objects and Provenance; EventSetup with Records; Producer and Analyzer Modules calling Algorithms]
  • Application Manager: manages and schedules the activity of all components within the application.
  • Event: the observed and inferred products of a single interaction in the CMS detector.
  • EventSetup: provides a uniform access model, valid for the full processing of the Event, to all services that deliver non-event data (geometry, etc.).
  • Record: a container of non-event data sharing the same Interval Of Validity (IOV).
  • Module: provides a unit of event-processing functionality; it is the only "worker" entity in the Framework and it must not interact with other Modules. Two types of Modules: Producer = reads and writes the Event; Analyzer = only reads the Event.
  • Algorithm: called by a single Module, it performs a well-defined operation on input data.
  • Provenance: each datum included in the Event has an associated Provenance recording how and by which Module it was produced.
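
  A toy sketch of the Producer / Analyzer split with provenance recording (illustrative Python, not the CMS API).

      # Producers write products into the Event and record their provenance;
      # Analyzers may only read.
      class Event:
          def __init__(self):
              self._products, self.provenance = {}, {}
          def put(self, label, product, module):
              self._products[label] = product
              self.provenance[label] = module    # who made this product, and how
          def get(self, label):
              return self._products[label]

      class TrackProducer:
          def produce(self, event):
              event.put("tracks", ["trk1", "trk2"], module=type(self).__name__)

      class TrackAnalyzer:
          def analyze(self, event):              # read-only access to the Event
              tracks = event.get("tracks")
              print(len(tracks), "tracks, made by", event.provenance["tracks"])

      evt = Event()
      TrackProducer().produce(evt)   # predefined schedule: producers first,
      TrackAnalyzer().analyze(evt)   # then analyzers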

  7. Typical Athena / CMS Framework job
  • Algorithms, tools, etc. are written in C++ and configured in Python (ATLAS) or text (CMS) files.
  • Any job is defined by a Python JobOptions or ParameterSet file which says:
    • load these libraries
    • read this data
    • run these algorithms in this order
    • configure each algorithm in this way
    • output this data
  Software organization (ATLAS and CMS):
  • A hierarchy of release builds is used to ensure rapid feedback on package integration problems and as a testbed for testing and validation purposes (nightly, developer, production, bug-fix releases).
  Code distribution (ATLAS and CMS):
  • For each production and bug-fix release a distribution kit is created that allows users to install the software locally at their own institutes.
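
  As an illustration, a minimal Athena-style jobOptions sketch covering the points above; it runs inside athena, not standalone, and the specific library, algorithm and property names are illustrative, not taken from a real release.

      # Sketch of a Python jobOptions file (names are illustrative).
      theApp.Dlls += ["CaloRec", "JetRec"]                 # load these libraries
      EventSelector = Service("EventSelector")
      EventSelector.InputCollections = ["aod.pool.root"]   # read this data
      theApp.TopAlg = ["CaloClusterMaker", "JetMaker"]     # run in this order
      JetMaker = Algorithm("JetMaker")
      JetMaker.ConeRadius = 0.7                            # configure the algorithm
      JetMaker.OutputKey = "MyJets"                        # output this data
      theApp.EvtMax = 1000                                 # process 1000 events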

  8. Event Data Model: different types of data corresponding to different stages of reconstruction
  • RAW — triggered events recorded by the DAQ. ATLAS: ~1.6 MB/event; CMS: ~1.5 MB/event.
  • ESD/RECO — reconstructed info: the detailed output of the detector reconstruction, including track candidates, hits and cells intended for calibration. POOL format (combines ROOT I/O with a MySQL relational DB). ATLAS: ~1.2 MB/event, target size ~500 kB/event; CMS: ~250 kB/event.
  • AOD — Analysis Object Data: a summary of the reconstructed event for common analyses: jets, (best) particle identification. POOL format. ATLAS: ~100 kB/event; CMS: ~50 kB/event.
  • TAG — fast selection info: the information relevant for fast event selection in AOD and/or ESD files. ATLAS: ~1-10 kB/event; CMS: ~1-10 kB/event.
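
  For scale, a back-of-envelope calculation of the yearly data volumes implied by these per-event sizes (ATLAS numbers, with the ESD target size); the 200 Hz trigger rate and 10^7 s of live time per year are assumptions made here for illustration, not numbers from this talk.

      # Assumed rate and live time; per-event sizes in kB from the slide above.
      rate_hz, seconds_per_year = 200, 1e7
      events = rate_hz * seconds_per_year                  # 2e9 events/year
      for tier, kb in [("RAW", 1600), ("ESD", 500), ("AOD", 100), ("TAG", 5)]:
          pb = events * kb / 1e12                          # kB -> PB
          print(f"{tier}: ~{pb:.2f} PB/year")              # e.g. RAW: ~3.20 PB/year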

  9. Size and content of ESD/RECO and AOD
  • Currently evolving, as knowledge of what is actually needed for analysis increases.
  • CMS example: RECO/AOD info for objects that are built from tracks. [Diagram: calibrated hits, tracks, vertices and physics objects in RECO vs. AOD; solid line = full copy, dashed line = copy of only the used info]
  • Back navigation AOD → ESD: the process that searches in the ESD or RAW data for objects that are not found in the AOD during analysis.
  • ATLAS example: from a TauJet object in the AOD it is possible to navigate back to its constituent clusters, cells and tracks in the corresponding ESD file. [Diagram: TauJet in AOD linked to tauObject, CaloClusters, CaloCells, TrackParticles and vertices in ESD; solid line = direct navigation, dashed line = duplication of objects]
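
  A toy sketch of the back-navigation mechanism, assuming a lazy link object stored in the AOD; all class and method names here are hypothetical, not the real ATLAS API.

      # Hypothetical illustration: an AOD summary object keeps a lazy link
      # into the ESD file; following the link triggers the ESD read.
      class ESDLink:
          def __init__(self, filename, key):
              self.filename, self.key = filename, key
          def follow(self):
              print(f"opening {self.filename} to read {self.key}")  # lazy ESD access
              return ["cell1", "cell2"]                             # stand-in data

      class TauJet:                          # AOD-side summary object
          def __init__(self, cluster_link):
              self._cluster_link = cluster_link
          def cluster_cells(self):           # navigating resolves the link
              return self._cluster_link.follow()

      tau = TauJet(ESDLink("esd.pool.root", "CaloCells"))
      print(tau.cluster_cells())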

  10. Analysis Model: not yet defined in either experiment
  ATLAS and CMS are both starting from concrete experiences in order to arrive at a usable and user-friendly Analysis Model.
  • ATLAS: starting from the "Rome Production" experience, to identify what is still missing to build an analysis model → all details during the discussion.
  • CMS: starting from the DC04 and Physics TDR experience, to understand the possible use cases and the main requirements for the Analysis Model.

  11. CMS basic types and EDM requirements
  • AOD: also contains the 4-vector information of the physics objects so that it can be used in the final analysis. The main idea is the capability to read the data directly using ROOT, with or without loading shared libraries.
  • Particle Candidates: they can in principle be built on top of either RECO or AOD. For most cases it is reasonable to use candidates built on top of AOD, but until the AOD structure is well defined it is reasonable to build the candidates on top of RECO. The Candidate collections need to be persistent, to avoid re-doing the combinatorial analysis, which is frequently CPU expensive. The structure of Candidates has yet to be finalized; the current components are the 4-momentum, the vertices, and the error matrices required to perform geometric-kinematic fits. A sketch of the idea follows below.
  • User Data: users should have the possibility to add their own data to the Event, either as persistent quantities that require large processing time or in ntuple-like formats for interactive access.
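
  A minimal sketch of the Candidate idea (illustrative classes with toy values, not the CMS ones): a 4-momentum plus links to daughters, combined combinatorially.

      import itertools, math

      class Candidate:
          def __init__(self, px, py, pz, e, daughters=()):
              self.px, self.py, self.pz, self.e = px, py, pz, e
              self.daughters = daughters          # links used by later fits
          def __add__(self, other):               # combine two candidates
              return Candidate(self.px + other.px, self.py + other.py,
                               self.pz + other.pz, self.e + other.e,
                               daughters=(self, other))
          def mass(self):
              m2 = self.e ** 2 - (self.px ** 2 + self.py ** 2 + self.pz ** 2)
              return math.sqrt(max(m2, 0.0))

      muons = [Candidate(45.0, 0.0, 0.0, 45.0), Candidate(-45.0, 0.0, 5.0, 45.3)]
      # the CPU-expensive combinatorial step whose output one wants persistent:
      z_candidates = [a + b for a, b in itertools.combinations(muons, 2)]
      print(z_candidates[0].mass())               # ~90 with these toy values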

  12. Prototype of the CMS Physics Analysis tool
  • Basic requirements:
    • reusability of specific software for other purposes
    • the learning curve for newcomers should be as short as possible
  • Analysis products as software deliverables. A deliverable should consist of two components:
    • Framework modules (Producers) to produce the specific products
    • Event Data Products
  • From the experience of past experiments this is a successful practice:
    • physics tools are part of the official CMS Framework release
    • it becomes more efficient to perform new analyses, since the basic "building blocks" are already available
  • To ensure all this, the Physics Software Tools must be well documented, with examples and references available and organized in web pages.
  [Code figure from the slide, not recovered: the module that provides the Z0 reconstruction and the Z0 object that will be written to the Event; a hedged sketch follows below.]
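
  A hedged sketch of what such a deliverable pair might look like (all names and the toy kinematics are illustrative; this is not the actual CMS prototype code).

      import math

      class Z0:                                     # the Event Data Product
          def __init__(self, mu1, mu2):
              e  = mu1["e"]  + mu2["e"]
              pz = mu1["pz"] + mu2["pz"]
              self.mass = math.sqrt(max(e * e - pz * pz, 0.0))  # toy: pT ignored

      class Z0Producer:                             # the Framework module
          def produce(self, event):
              mu = sorted(event["muons"], key=lambda m: m["e"], reverse=True)
              event["Z0"] = Z0(mu[0], mu[1])        # written into the Event

      evt = {"muons": [{"e": 45.0, "pz": 20.0}, {"e": 46.0, "pz": -18.0}]}
      Z0Producer().produce(evt)
      print(evt["Z0"].mass)                         # ~91 with these toy inputs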

  13. Interactive tool
  The IGUANACMS tool is very useful for providing an interactive way of doing analysis, inspection and debugging:
  • visualization for simulation, reconstruction and test beams (DAQ application)
  • visualization of reconstructed and simulated objects: tracks, hits, digits, vertices, etc.
  • event browser
  • a web client is under development: it will be possible to use IGUANACMS without installing it, simply with a web browser.

  14. Distributed analysis: CRAB (CMS Remote Analysis Builder)
  • Enables CMS users to easily run analysis jobs on any published dataset.
  • A tool for analysis job preparation, splitting, submission, monitoring and output retrieval (a sketch of the splitting step is given below).
  • Makes the use of the GRID more user-friendly.
  • The end user provides: datasets (runs, number of events, conditions, ...) plus a private executable.
  • This tool has already been widely used: CMS users have submitted ~160,000 jobs for the Physics TDR.
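
  As an illustration of the splitting step that such a tool automates, a toy sketch (all numbers invented) of cutting a requested event range into grid jobs.

      # Split a requested number of events into per-job chunks.
      def split_jobs(total_events, events_per_job):
          jobs, first = [], 0
          while first < total_events:
              n = min(events_per_job, total_events - first)
              jobs.append({"first_event": first, "max_events": n})
              first += n
          return jobs

      for job in split_jobs(100000, 25000):   # 100k events -> 4 grid jobs
          print(job)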

  15. ATLAS software validation ... and commissioning
  • The Data Challenge (DC3) that was scheduled for the end of 2005 has been replaced by the CSC (Computing System Commissioning): a series of activities designed to validate all aspects of the computing and software prior to ATLAS turn-on in 2007.
  • ATLAS validation in the near future, a three-step process:
    • Nightly validation by the RTT (Run Time Test): package-specific tests run automatically, with results accessible via a web page; currently 197 different tests on 18 different packages.
    • A 10^5-event sample will be run on the GRID for every major (usable) release (Oct 05): 100k events from 10-15 physics samples, for example: Min Bias, Z→ee, Z→μμ, Z→ττ, H→γγ (120 GeV), W→eν, b-tagging samples, top, QCD di-jet samples in different pT bins, single particles for calibration purposes.

  16. ATLAS software validation ... and commissioning (cont.)
  • A 10^6-event sample will be run on the GRID for all "production releases", to be completed in one week: 1M events, ~25 physics samples (nearly all the samples above, and more); validation of the full software chain from generation to reconstruction before moving to real production.
  • Real production:
    • 10^7 events (DC2 / Rome production scale), the typical scale of distributed production samples, to be completed in 6-8 weeks. For example, 10M events from physics groups: at least 100k per sample, 500k events for each sample used for validation, plus additional physics samples.
    • Full software chain (event generation, simulation, digitization, pile-up, reconstruction, tag/merging, analysis).
    • A detailed plan is still to be defined; it should start before Christmas.

  17. CMS software validation
  CMS is setting up a procedure for release validation, starting from the past experience of validation in the old framework:
  • for each release, ~10 samples of 10k events
  • the subsystem responsible for a given module has to provide its validation
  • A new tool has been developed for simulation validation (up to digitization); it can perform accurate tests for each sub-detector. These tests are based on the comparison of histograms in which quantities like the number of tracks, the number of particles, etc. are stored (see the sketch below).
  • Some verification from the physics point of view will also be needed.
  • It is also planned to use the Data Quality Monitoring tool for the physics validation.
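
  A minimal sketch of such a histogram comparison using PyROOT; TFile, TH1 and KolmogorovTest are standard ROOT calls, while the file and histogram names are invented for illustration.

      import ROOT

      ref = ROOT.TFile.Open("validation_ref.root")      # previous release
      new = ROOT.TFile.Open("validation_new.root")      # release under test
      for name in ["nTracks", "nParticles"]:            # quantities named above
          h_ref = ref.Get(name)
          h_new = new.Get(name)
          prob = h_ref.KolmogorovTest(h_new)            # compatibility probability
          status = "OK" if prob > 0.01 else "CHECK"     # threshold is a choice
          print(f"{name}: p = {prob:.3f} [{status}]")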

  18. CMS computing & software commissioning
  Magnet Test Cosmic Challenge (MTCC):
  • In April 2006 the structure assembled for the Magnet Test is expected to be ready for data taking. All the CMS sub-detectors except the pixel detector take part in this test.
  • First test of the integration between online and offline software.
  Computing, Software and Analysis Challenge (CSA 2006):
  • The "CSA 2006" data challenge should be considered a software & computing commissioning: a continuous operation rather than a stand-alone challenge.
  • Its main aim is to test the software and computing infrastructure in view of 2007, i.e.:
    • calibration and alignment procedures and the conditions DB
    • the full trigger chain
    • Tier-0 reconstruction and data distribution
    • distributed access to the data for analysis
  • At the end (autumn 2006) we will have a working, operational system, ready to take data with cosmic rays at increasing rates.

  19. ATLAS "Rome Production" experience / Thoughts for discussion

  20. ATLAS "Rome Production" experience
  • During the "Rome Production", aimed at the Rome Workshop (June '05), the ATLAS physicists who carried out the final analyses:
    • experienced completely new tools: distributed production (GRID), the Athena Framework, the Event Data Model (AOD/ESD), a new analysis style and new tools
    • interacted with software, Production System and GRID experts, learning a common language
    • gave constructive feedback about the tools and the analysis models, helping the software experts identify what is really needed
  • The Rome Workshop made it possible:
    • to do an overall check of the ATLAS computing deliverables: we know where we are, and we are convinced we are going in the right direction
    • to set a baseline for future activities

  21. ANALYSIS EXPERIENCE
  The "Rome Production" was the first testing ground for the Event Data Model and the first step towards an Analysis Model for ATLAS: 2 kinds of data format from reconstruction → 2 analysis scenarios.
  (1) NEW — analysis on AOD (POOL format):
  • copy the AODs of the interesting datasets
  • develop the analysis class (C++) in Athena (PhysicsAnalysis package)
  • compile and run the analysis code
  • produce ntuples and/or histograms as output
  • use ROOT to analyse the ntuples / plot the histograms
  (2) OLD — analysis on CBNT (Combined Ntuple, ROOT format):
  • copy the CBNTs of the interesting datasets
  • develop and execute a ROOT macro
  • produce ntuples and/or histograms as output
  • use ROOT to analyse the ntuples / plot the histograms
  • Crucial check of the AODs: they contain all the quantities needed for analysis, as in the old and well-known CBNTs!
  • No time to also test back navigation AOD → ESD: the ESDs were spread over many GRID sites and difficult to find/transfer.

  22. Rome Workshop feedback
  Main problems with analysis on AOD, and possible solutions:
  • much effort spent in building the C++ analysis class in Athena → Interactive Analysis (under development)
  • big overlaps among the containers of electrons, taus and jets; MyPreselected containers of electrons, taus and jets are available, but they are built with very loose criteria (not a real particle identification), and in any case different analyses will have different criteria → EventView (under development)
  • AOD processing is SLOWER than CBNT (Combined Ntuples): processing 35k events takes ~30 min from AOD vs. ~5 min from CBNT → Athena-Aware Ntuples (under development)
  • the transfer of AOD/CBNT was one of the major problems: problems with the GRID Data Management Tools and with some SEs at GRID sites → Distributed Analysis (under development)

  23. Interactive analysis in Athena
  A powerful tool to develop, debug and optimize analysis cuts and calibrations, running on a small data sample many times; also useful to study some problematic/interesting events in full detail.
  • Based on the Python scripting language.
  • Some tools are under development to support interactive analysis, e.g.:
    • Pylcgdict: enables access to Event Data Model objects.
    • PyROOT: provides all the ROOT functionality for analysis and visualization.
    • PyPoolBrowser: permits interactive examination of the AOD contents.
  Example session:

      > athena -i Interactive_topO.py
      athena> theApp.initialize()      # initialize the application
      athena> theApp.nextEvent()
      # retrieve the electrons from the AOD
      athena> econ = PyParticleTools.getElectrons("ElectronCollection")
      athena> e = econ[0]              # get the first electron
      athena> e.pt()                   # see the electron pT
      # see the pT distribution of the electrons
      athena> plot("ElectronContainer#ElectronCollection", "$x.pt()", nEvent=5)
      athena> include("PyAnalysisExamples/PyPoolBrowser.py")

  24. EventView builder algorithm
  Highly configurable by users:
  • extracts input objects from the overlapping AOD containers
  • applies the analysis preselection criteria, removing overlaps
  • can add variables specific to a particular "view" (UserData)
  • creates new final-state objects, storing the output in an ntuple
  EventView (EV): gives a "view" of the physics event as seen by a particular analysis → it is a collection of reconstructed objects that are coherent and mutually exclusive.
  [Diagram: the builder extracts electrons, taus and jets from the overlapping AOD containers in the Transient Event Store (StoreGate) and creates an EventView with the overlaps removed]
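
  A generic sketch of the overlap-removal step such a builder performs (illustrative logic in plain Python, not the EventView code).

      import math

      def delta_r(a, b):
          dphi = abs(a["phi"] - b["phi"])
          dphi = min(dphi, 2 * math.pi - dphi)      # wrap the azimuthal angle
          return math.hypot(a["eta"] - b["eta"], dphi)

      def remove_overlaps(primary, secondary, cone=0.4):
          """Keep secondary objects only if no primary object lies within cone."""
          return [s for s in secondary
                  if all(delta_r(s, p) > cone for p in primary)]

      electrons = [{"eta": 0.5, "phi": 1.0}]
      jets = [{"eta": 0.52, "phi": 1.02}, {"eta": -1.0, "phi": 2.0}]
      view_jets = remove_overlaps(electrons, jets)  # first "jet" is the electron
      print(len(view_jets))                         # -> 1: mutually exclusive view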

  25. Athena-Aware Ntuples (AANtuples)
  [Diagram: an AANtuple is a ROOT TTree whose branches hold both plain data and an Athena token pointing back to the POOL files (AOD, ESD)]
  • A ROOT file with a simple tree:
    • for fast analysis with ROOT in standalone mode
    • supports the laptop analysis scenario
  • Keeps the link with the Athena Framework:
    • for full Athena-based analyses
    • allows back navigation AANtuple → AOD → ESD
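
  A minimal sketch of the "fast analysis with ROOT in standalone mode" use case using PyROOT; the file, tree and branch names are invented for illustration.

      import ROOT

      f = ROOT.TFile.Open("aantuple.root")
      tree = f.Get("CollectionTree")                    # simple flat TTree
      h = ROOT.TH1F("el_pt", "electron pT;pT [GeV];events", 50, 0.0, 100.0)
      tree.Draw("ElectronPt >> el_pt")                  # fill from a branch
      print(h.GetEntries())                             # no Athena needed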

  26. Distributed analysis: one possible scenario
  [Diagram: User Interface → ATLAS Production System → Computing and Storage Elements at Sites X, Y and Z, following steps (1)-(3) below]
  A "data-driven" scenario: analysis jobs are sent to the sites where the data are stored. The ATLAS Production System, which was used to run the Rome production jobs on 3 GRIDs, can be one possible tool to perform analysis using GRID resources as well.
  (1) The analysis job defined by the user is split into "n" identical jobs and
  (2) sent to the "n" GRID sites where the input data are stored;
  (3) the output files are merged and the final output is sent to the user (see the sketch below).
  • Some functionality has recently been implemented in the Production System to support distributed analysis:
    • shipping of customized analysis algorithms (private code to be compiled remotely)
    • submission of jobs to the sites where the input data are already available
  • Some tests have already been performed on real analyses; much work is still to be done.
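
  A toy sketch of the data-driven split/merge bookkeeping in steps (1)-(3); all site, block and file names are invented.

      # (1) one job per data block, placed where that block is stored
      dataset_location = {"block1": "SiteX", "block2": "SiteY", "block3": "SiteZ"}

      def split_by_site(blocks):
          return [{"block": b, "site": s} for b, s in blocks.items()]

      def submit(job):                              # (2) runs where the data live
          return f"hist_{job['block']}.root"        # stand-in for a grid submission

      outputs = [submit(j) for j in split_by_site(dataset_location)]
      merged = "merged.root"                        # (3) merge, e.g. with ROOT's hadd
      print("merge", outputs, "->", merged)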

  27. Thoughts for discussion (1)
  ATLAS and CMS have a common plan for building the future Analysis Model: provide users with a set of tools that permit them to:
  • do fast analysis (as in the old ntuple way)
  • while keeping all the advantages provided by a Framework: use of general tools, the back-navigation facility, scripting, quality checks of the code
  • and accessing, if needed, distributed resources.
  We have the main ingredients (AOD/ESD + Framework + C++/Python + GRID) plus some tools under development, BUT:
  • at the moment the system is not user-friendly enough for physicists, who have to deal with many technical aspects; is this too heavy?
  • moreover, at which level should technical details be hidden from the physicists? They don't want to deal with a black box, but with a system in which all steps can easily be monitored: "if my analysis job crashes, I want to know quickly whether it is due to the ATLAS software, to the GRID, or to some missing input files, so that I can contact (if needed) the right expert and solve the problem as soon as possible."

  28. Thoughts for discussion (2)
  Before the ATLAS and CMS turn-on (2007), both experiments have planned serious, massive productions / Data Challenges to commission their software chains. One important lesson learned from past productions → software must be validated before and during large-scale production:
  • ATLAS has organized a "Physics Validation Working Group" in which software experts and physicists work together to validate the main software releases (from generation to reconstruction) on many physics samples. The 10^5-event production on the GRID has just started, to validate the next "stable" release in view of the 10^6-event sample production: 100k events from 10-15 samples, for example: Min Bias, Z→ee, Z→μμ, Z→ττ, H→γγ (120 GeV), W→eν, b-tagging samples, top, QCD di-jet samples in different pT bins, single particles for calibration purposes.
  • CMS has not yet finalized how this task should be done. In the past the software maintainers provided the validation, but software knowledge alone is often not enough to find a bug. This strategy needs to be improved, with a well-defined procedure and with more systematic and accurate tests. A "CMS Simulation Validation" group was organized a few months ago.
