
ATLAS Analysis Overview




  1. ATLAS Analysis Overview
  Eric Torrence, University of Oregon/CERN
  10 February 2010, ATLAS Offline Software Tutorial

  2. Outline
  • This talk is intended to give a broad outline of how analyses within ATLAS should be done for first data; technical details will be covered by other speakers
  • ATLAS data types and locations
  • Derived samples (dAOD/dESD)
  • Data sample selection
  • Luminosity

  3. Analysis Model for the First Year
  • AMFY Task Force: J. Boyd, A. Gibson, S. Hassani, R. Hawkings, B. Heinemann, S. Paganis, J. Stelzer, T. Wengler; final report October 2009
  • See the talk by Thorsten Wengler at the Barcelona ATLAS Week: http://indico.cern.ch/materialDisplay.py?contribId=73&sessionId=11&materialId=slides&confId=47256
  • Or the final AMFY report: http://cdsweb.cern.ch/record/1223952

  4. Analysis Data Formats
  (Diagram: RAW → ESD → AOD processing chain)
  • RAW - specialized detector studies (e.g. alignment)
  • ESD - detector performance studies
  • AOD - starting point for physics analyses (~1/10 the size of ESD)
  • All events are processed from RAW to AOD
  • Processing is separated by trigger-based data STREAM
  • Physics streams: Egamma, Jet/Tau/MET, Muon, Minbias, Bphysics
  • Streams are inclusive, e.g. all electron triggers are written to the Egamma stream

  5. Derived Samples
  (Diagram: RAW/ESD/AOD chain with dESD and dAOD branches; central production, group production, small group/user production; AODFix; specialised RAW samples)
  • Derived samples (dESD, dAOD) aim to reduce sample size further by removing events (skimming) and/or removing information (slimming/thinning)
  • Many versions/variants are possible for specific uses
  • Skimming is based on offline quantities and/or the trigger
  • dAOD (Physics DPD) - e.g. diMuon (2 muons, pT > 15 GeV)
  • dESD (Perf. DPD) - e.g. eGamma (CaloCells near e/gamma triggers)
  • Currently popular: dESD_COLLCAND

  6. User Data
  (Diagram: data formats feeding performance-analysis and physics-analysis ntuples)
  • ESD, AOD, dESD - stored on the Grid in ATLAS space; production is managed centrally
  • dAOD, ntuples - stored on group disk or in local user space; managed by groups/users
  • Each group has a grid space manager and a production manager - good to find out who this person is...

  7. Databases and Metadata
  • The analysis model assumes any Athena job (ESD/dESD, AOD, dAOD) needs access to conditions data (e.g. COOL) via an Oracle server at the nearest Tier-1, whether running at a Tier-2, a Tier-3, a desktop, or a laptop
  • Detector geometry, conditions, alignments
  • Data quality and trigger configuration metadata
  • InSituPerformance information
  • Dataset information (AMI)
  • TAG database (TAG)
  • Even a purely ntuple-based analysis may need some of this - a potential bottleneck and hassle for end users
  • Running on MC is typically easier than running on data
  • Be aware of external files referenced by the DB...

  8. Selecting Your Data
  All analyses must define their data sample - a key ingredient in defining the luminosity
  • Time range/machine conditions - e.g. collision energy of 7 TeV, or all 2010 data approved for Summer conferences
  • Detector/offline data quality (DQ flags) - e.g. data periods with good electrons in the barrel, or data with pixels turned on
  • Trigger configuration - e.g. EF_e20_loose active, or both e20 and mu10 active and unprescaled
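The criteria above can be sketched as a simple predicate over per-run metadata. This is illustrative Python only, not an ATLAS tool; the metadata field names and run numbers are hypothetical.

```python
def select_run(meta):
    """Example data-sample predicate mirroring the criteria on this slide."""
    return (
        meta["energy_tev"] == 7.0                      # collision energy: 7 TeV
        and meta["dq_barrel_electrons"] == "good"      # DQ: good barrel electrons
        and "EF_e20_loose" in meta["active_triggers"]  # required trigger active
    )

# Hypothetical per-run metadata for two runs
runs = {
    152166: {"energy_tev": 7.0, "dq_barrel_electrons": "good",
             "active_triggers": {"EF_e20_loose", "EF_mu10"}},
    152214: {"energy_tev": 7.0, "dq_barrel_electrons": "bad",
             "active_triggers": {"EF_e20_loose"}},
}
selected = [run for run, meta in runs.items() if select_run(meta)]  # [152166]
```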

  9. Luminosity Blocks
  • ATLAS runs are subdivided into luminosity blocks (~1 min each)
  • The LB is the atomic unit for selecting a data sample
  • Most conditions are mapped to specific luminosity blocks
  • Trigger prescales (can only change at an LB boundary)
  • Data quality flags (mapped to LBs offline)
  • Luminosity can only be determined for a specific set of runs/luminosity blocks (e.g. run 165789, lumi blocks 1-10)
  • Specifying your data sample functionally means specifying a list of runs/LBs to analyze, aka a GoodRunList
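Functionally, a GoodRunList is just a map from run number to the set of good luminosity blocks. A minimal sketch of membership testing at that granularity (plain Python, not the official GRL tools), using the run/LB example from this slide:

```python
# GRL as {run: set-of-good-LBs}; run 165789, LBs 1-10, as on this slide
grl = {165789: set(range(1, 11))}

def in_grl(run, lb):
    """True if an event's (run, LB) pair falls inside the GoodRunList."""
    return lb in grl.get(run, set())

in_grl(165789, 5)    # True: LB 5 is inside the good range 1-10
in_grl(165789, 12)   # False: LB outside the selected range
in_grl(152166, 5)    # False: run not in the GRL at all
```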

  10. Data Quality Flags
  • DQ flags are simple indicators of data quality, but come from many, many sources (I counted 102...)
  • Detectors (divided into barrel and 2 endcaps)
  • Trigger (by slice)
  • Offline combined performance (e/mu/tau/jet/MET/...)
  • Physics analyses should not arbitrarily choose DQ flags
  • Combined performance groups will define a recommended set of DQ flag criteria for physics objects (e.g. barrel electrons or forward jets) - soon with virtual flags
  • Users in their working group will decide which set of (virtual) DQ flags is needed for each analysis, which can then be used to generate a GoodRunList
  • Standard lists will be centrally produced by the DQ group
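The idea of a virtual flag can be sketched as an AND over its underlying source flags. The flag names below are purely hypothetical placeholders, not real ATLAS DQ flag names:

```python
def virtual_flag_good(flags, sources=("EM_BARREL", "TRACKING", "TRIG_EGAMMA")):
    """A virtual flag is good only if every underlying source flag is good."""
    return all(flags.get(name) == "green" for name in sources)

# Hypothetical flag states for one luminosity block
lb_flags = {"EM_BARREL": "green", "TRACKING": "green", "TRIG_EGAMMA": "green"}
good = virtual_flag_good(lb_flags)                        # True
bad = virtual_flag_good({**lb_flags, "TRACKING": "red"})  # False: one source bad
```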

  11. Reprocessing
  • Data processed at Tier 0 (right off the detector) have preliminary calibrations and first-pass DQ flags
  • Any physics result must start with reprocessed data: updated calibrations, consistent release/bug fixes
  • DQ flags are re-evaluated, tagged, and locked after each reprocessing - the flags are fixed, but you must use the correct tag!
  • Always access DQ information from the COOL database using the tag appropriate to your data processing version, or you won't get consistent results
  • Best to use officially generated (static) GoodRunLists appropriate for your reprocessed data

  12. Luminosity Calculation
  σ = (Nsel - Nbgd) / (εsel · Lumi)
  • In ATLAS, Lumi is calculated for a specific set of LBs and includes LB-dependent corrections for trigger prescales and L1 trigger deadtime
  • The user-derived εsel must contain all other event-dependent efficiencies, including:
  • Unprescaled trigger efficiency (vs. pT, for example)
  • Skimming efficiency (in dAOD production)
  • TAG selection efficiency (in TAG-based analyses)
  • Event selection cuts
  • The final luminosity can only be calculated after the full GRL and trigger are specified
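Plugging invented numbers into the formula above (an illustration only, not real ATLAS data):

```python
def cross_section(n_sel, n_bgd, eff_sel, lumi):
    """sigma = (N_sel - N_bgd) / (eps_sel * Lumi)."""
    return (n_sel - n_bgd) / (eff_sel * lumi)

# e.g. 1200 selected events, 200 estimated background, total selection
# efficiency 0.5, integrated luminosity 10 nb^-1  ->  sigma = 200 nb
sigma = cross_section(n_sel=1200, n_bgd=200, eff_sel=0.5, lumi=10.0)
```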

  13. Example Physics Analysis Models
  • Direct AOD/dAOD (this talk will concentrate on this case; the others are not so different)
  • User-submitted Grid job run over all AODs or over group-generated dAOD/dESD samples
  • Final user ntuples produced directly
  • TAG-based analysis
  • Start with a TAG-selected sample (see the TAG tutorial) - saves having to run over large datasets
  • Ntuple-based analysis
  • Start with a general-purpose group ntuple where not all data selection criteria have been applied
  • Probably the most complicated, but the tools support this also

  14. AOD/dAOD Analysis: Pre-Run Query
  (Diagram: data sample definition - time range, DQ query, trigger, ... - fed through RunQuery and COOL to produce a GRL XML file and, via LumiCalc, an expected luminosity)
  • A pre-run query based on the input parameters already defines the GRL (instantiated as an XML file)
  • atlas-runquery (or the TAG browser) is currently the best way to do this - eventually most users will start with pre-made lists
  • You can find the 'expected' luminosity without running a single job
  • Needs COOL access, but can be run just once; the XML GRL files can be transferred to your local area/laptop

  15. AOD/dAOD Analysis: Distributed Analysis
  (Diagram: input GRL XML file plus an AMI dataset of AOD files submitted via Ganga/pAthena to Grid jobs, whose outputs are merged into an ntuple and an output GRL XML file)
  • There is no automatic tool (yet) to convert a GRL to an AOD DataSet (although TAG does this for you)
  • Using the standard tools, you will get an output GRL - very useful to compare against the input for cross-checks
  • A copy of the GRL will also be saved to your output ntuple if it is made with the PAT Ntuple Dumper

  16. AOD/dAOD Analysis: Bookkeeping
  (Diagram: input and output GRL XML files compared; LumiCalc run with COOL access to obtain the observed luminosity)
  • Comparing the input and output XML files is very useful to check for job failures and dataset consistency
  • The output shows exactly which LBs were analyzed
  • Either run LumiCalc directly on the ntuple (with COOL access), or extract the XML file, copy it to CERN, and run LumiCalc there
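The input/output cross-check amounts to a set difference over (run, LB) pairs. A sketch on dict-style GRLs (plain Python, not the official bookkeeping tools):

```python
def grl_pairs(grl):
    """Flatten a {run: iterable-of-LBs} GRL into a set of (run, LB) pairs."""
    return {(run, lb) for run, lbs in grl.items() for lb in lbs}

def missing_lbs(input_grl, output_grl):
    """LBs that were requested but never analyzed (e.g. failed Grid jobs)."""
    return grl_pairs(input_grl) - grl_pairs(output_grl)

# Input asked for LBs 1-8 of one run; the output shows only 1-5 were analyzed
missing = missing_lbs({165789: range(1, 9)}, {165789: [1, 2, 3, 4, 5]})
# missing == {(165789, 6), (165789, 7), (165789, 8)} -> investigate those jobs
```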

  17. Sparse Data Problem
  • Even if an LB selects no events, the LB still contributes to the luminosity
  • The GRL selection tool outputs LB metadata (also stored in the ntuple) even when no events are selected
  • Derived (dAOD) samples must also obey this requirement
  • You must merge all job output to correctly include all LBs considered - work is ongoing to include this in the standard distributed analysis merger
  • Example: job 1 (LBs 1-3) selects 1 event, job 2 (LBs 4-5) selects 0 events, job 3 (LBs 6-8) crashed - total LBs analyzed: 1-5
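The merging rule on this slide - every successful job contributes its LB metadata even with zero selected events, while a crashed job contributes nothing - can be sketched as:

```python
# Per-job bookkeeping matching the slide's example
jobs = [
    {"lbs": {1, 2, 3}, "events": 1, "ok": True},   # job 1: 1 event
    {"lbs": {4, 5},    "events": 0, "ok": True},   # job 2: 0 events, still counts
    {"lbs": {6, 7, 8}, "events": 0, "ok": False},  # job 3: crashed, excluded
]

# Union of LB metadata from successful jobs only
analyzed_lbs = set().union(*(job["lbs"] for job in jobs if job["ok"]))
total_events = sum(job["events"] for job in jobs if job["ok"])
# analyzed_lbs == {1, 2, 3, 4, 5}; total_events == 1
```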

  18. Other Use Cases
  • It is easiest to make the entire selection (complete GRL specification) before launching the AOD/dAOD job
  • For many reasons, people may start with samples that have partial (or no) GRL specifications applied: skimmed group dAODs, group ntuples, TAG-selected data
  • The scheme still works, as long as the samples were produced with the standard tools - metadata about all LBs contributing to the sample is then still present and consistent
  • Users must still apply the final GRL specification (including trigger) to create a sample with a well-defined luminosity
  • The tools also work at the ntuple level, but you must save the LB number

  19. Conclusions
  • The analysis model for the first year has undergone considerable discussion and development recently - there may still be some rough edges
  • A general scheme for simple cases now exists
  • Work is ongoing to support more advanced use cases
  • Tools are available to make this easier for the end user - many more technical details will be shown today
  • There are many ways to screw this up - use central tools and central GRLs as much as possible
  • If the tools don't seem to do what you want, let a developer know!
