
ATLAS Data Preservation and Access



Presentation Transcript


  1. ATLAS Data Preservation and Access Roger Jones

  2. Data Preservation & Access • Opening data access • Preparatory discussions with “management”, CB chair, authorship and Pubcom chairs • Has clear implications for authorship/membership rules • Needs CB-level discussion • Past experience says these topics provoke long discussion in the CB! • Common principles proposed by LHC experiment Data Policy Harmonization Group straw man • This has been reviewed by the SIPB and taken to CERN Council to become a “policy suggestion” • A draft policy is with the management for discussion & has been seen by the ICB

  3. ATLAS DMP Organization • Data Preservation now included as part of the upgrade activity planning • May increase the funding options – some evidence already • Data Management Planning is now required by some funders for upgrade grants • Looking at the cost/benefit of various strategies • Resource tensioning with other upgrade activities

  4. Principles for preservation & access • General agreement that RAW data is preserved for the experiment and for the future – open access to it is not usually possible even for collaboration members (level 4 data) and is not proposed for general use • Full reconstruction outputs for analysis might be made available after an embargo period – length tbd, but clearly an embargo of several years. The resource implications of making this useful are high (level 3 data) • We support limited access to samples in simple formats for outreach and teaching (level 2 data) – but these are best integrated into our presenter tools • Techniques like Recast may make data (information) usefully available, although they do not meet all the open access criteria for levels 2 & 3 • We already make data from papers and supporting information available through HepData/INSPIRE (level 1 data)

  5. Data Preservation Policies • Data Preservation • There are DP policies implied in the Computing TDRs • Conserve all raw data during the lifetime of the experiment • All formats & code used for paper analyses to be archived • Tier 0/1s responsible for the physical preservation • Some tacit belief that older sets may be ‘retired’ • Retired data no longer to be on disk or under active analysis • This may need to be revised, e.g. if external access is then granted • Obvious resource implications • First priority is to preserve data for active use by the collaboration

  6. ATLAS DP Practical Steps • Making sure raw data can be reprocessed long-term (Level 4) • Identifying key datasets for ‘unique data’ preservation • Setting up regular reprocessing and validation • This has been underway as a test case for the 2009 data, but progress is slow • Forward/backward compatibility issues illustrated in John Chapman’s talk on simulation release plans, 14/3/13 • Ensure the capability to run old trigger selections offline • AODfixing will help (reprocessing at analysis format level) • This means level 4 operations can be applied to level 3 AOD format

  7. Digesting validation results • Must display the results of the validation in a comprehensible way: web based interface • The test must determine the nature of the results • Could be simple yes/no, plots, ROOT files, text-files with keywords or length, ... • Need for semi-automated, detailed physics validation • David South is on ATLAS and was central to the DESY SP and DPHEP activities • Identify the useful common components • Identify the ATLAS-specific elements • Set up CERN-based instance for ATLAS (and others?)
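
The idea that each test determines the nature of its result, while the display stays uniform, can be sketched as follows. This is a minimal illustration only: the class `ValidationResult`, the result kinds "bool" and "text", and the keyword/length criteria are hypothetical names chosen here, not part of any ATLAS or DESY validation framework.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """One validation test outcome (hypothetical structure for illustration)."""
    name: str
    kind: str        # "bool" for a yes/no test, "text" for a text-file test
    payload: object  # a bool for "bool", the file contents (str) for "text"


def passed(result, required_keywords=(), min_length=0):
    """Reduce a heterogeneous result to a single pass/fail decision."""
    if result.kind == "bool":
        return bool(result.payload)
    if result.kind == "text":
        text = str(result.payload)
        # A text result passes if it is long enough and contains every keyword.
        return len(text) >= min_length and all(k in text for k in required_keywords)
    raise ValueError(f"unknown result kind: {result.kind}")


def summarize(results, **criteria):
    """Map each test name to "PASS" or "FAIL", ready for a web table."""
    return {r.name: ("PASS" if passed(r, **criteria) else "FAIL") for r in results}


if __name__ == "__main__":
    results = [
        ValidationResult("reco_runs", "bool", True),
        ValidationResult("job_log", "text", "events processed: 100 OK"),
    ]
    print(summarize(results, required_keywords=("OK",), min_length=5))
```

Plots and ROOT files would need their own comparison logic (for example against reference histograms), which is where the semi-automated physics validation mentioned above comes in.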

  8. Existing open datasets • The CB has authorized various datasets in (level-2) outreach formats for open use in education/outreach • Event displays for interactive analysis (MINERVA/HYPATIA/LPPP/CAMELIA) • JIVE-XML, ROOT format data • Absolutely not intended for any serious analysis, but illustrative

  9. ATLAS Zpath M(eeμμ)=123 GeV M(γγ)=125 GeV • Master the invariant mass technique • to study and measure the (Z, J/ψ, Υ) decaying to l⁺l⁻ • to search for new physics (Z’) • And Higgs boson in γγ and l⁺l⁻l⁺l⁻ • HYPATIA using the ATLANTIS event display • Data from 2011 • 13000 events ~2.5 GB (password protected, 100 open) • 13 data groups/directories, 20 subgroups (A-T), and 50 events/mixed sample/2 students • 50% Z, 30% γγ, 10% (J/ψ, Υ), 5% Z', 5% l⁺l⁻l⁺l⁻ • Higgs candidate events: • 1 fb⁻¹ and cuts according to ATLAS publication • 125 GeV Higgs MC signals ready to upload • (1 fb⁻¹, 10 fb⁻¹, 25 fb⁻¹)
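
The invariant mass technique named on this slide can be illustrated with a short sketch. It assumes each lepton or photon is treated as massless and described by its transverse momentum pT (in GeV), pseudorapidity eta and azimuthal angle phi; the function name and the example kinematics are illustrative and are not taken from the Z-path data or the HYPATIA tool.

```python
import math


def invariant_mass(particles):
    """Invariant mass in GeV of massless particles given as (pt, eta, phi) tuples."""
    e = px = py = pz = 0.0
    for pt, eta, phi in particles:
        px += pt * math.cos(phi)
        py += pt * math.sin(phi)
        pz += pt * math.sinh(eta)
        e += pt * math.cosh(eta)  # E = |p| for a massless particle
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


if __name__ == "__main__":
    # Two back-to-back central leptons, each with pT = 45.6 GeV (made-up values).
    leptons = [(45.6, 0.0, 0.0), (45.6, 0.0, math.pi)]
    print(round(invariant_mass(leptons), 1))  # prints 91.2
```

The same function applies unchanged to a photon pair or to four leptons, since it simply sums the four-momenta of whatever particles are passed in.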

  10. ATLAS Zpath tests • OPloT: • Mll and/or Mγγ and/or Mllll to be discussed locally • Moderator: 1 slide with 3 invariant masses; invariant mass as a tool to identify particles, to discover new particles, and to search for exotic particles • Web pages updated and measurement ready • http://www.physicsmasterclasses.org/exercises/ATLAS-2013/en/zpath.htm • Introduced Higgs • Described new measurements • Prepared material for instructors, moderators, for discussions, …

  11. OPloT Tests 2013 • Higgs comments • 4l provided without requiring 2l from Z, with lower cut on other pair • γγ: provide MC with 125 GeV Higgs and background • Upload 125 GeV Higgs MC ((1) & 10 & 25 fb⁻¹)

  12. ATLAS W-path with real WW (+H) events • Measurements • W→lν • W+/W- ratio • Angular distribution between leptons in WW events • MINERVA program using the ATLANTIS event display • 2011 real data: 693 WW/Higgs candidates (from released 1 fb⁻¹) mixed with 5307 W and other background events • Histogram tool • Spreadsheet and histogram websites connected with database • New measurement tested

  13. ATLAS W-path • Data from 2011, 1.1 fb⁻¹ • 350 should be WW (w/o Higgs) • 160 should be ttbar or single top • 120 should be Z+Jets • 50 should be W+Jets • 15 should be from H→WW
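
As a quick arithmetic check on the expected composition listed on this slide, the sketch below sums the quoted counts and converts them to fractions. The counts are the rounded values from the slide; the dictionary labels are just convenient keys.

```python
# Expected composition of the W-path candidate sample (rounded values from the slide).
expected = {
    "WW (w/o Higgs)": 350,
    "ttbar or single top": 160,
    "Z+jets": 120,
    "W+jets": 50,
    "H->WW": 15,
}

total = sum(expected.values())
fractions = {process: count / total for process, count in expected.items()}

if __name__ == "__main__":
    print(f"total expected candidates: {total}")
    for process, frac in fractions.items():
        print(f"{process:>20s}: {100 * frac:5.1f}%")
```

The rounded expectations sum to 695, close to the 693 WW/Higgs candidates quoted on the previous slide, with WW making up about half of the sample and H→WW about 2%.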

  14. γγ or e⁺e⁻? • Left: pT > 1 GeV; right: pT > 5 GeV • 2 apparent tracks pointing to 2 calorimeter objects • Zoom reveals 2 e⁺e⁻ pairs • information • The conclusion is that the 2 calorimeter objects correspond to 2 photons, which have converted and led to 4 tracks; the tracks from one pair had fewer than 3 pixel hits • So, to be classified and entered as γγ

  15. Level-2 observations • The applications are all trying to illustrate the analyses and physics in the true context of a detector • They use ATLANTIS as a presenter in most cases, which defines the natural common format • Other formats would require an additional interface, to what benefit? • Use case and resource justification for a common format not clear

  16. ATLAS Analysis Roger Jones

  17. The Generic Analysis Flow

  18. Level 3 • ATLAS has no approved level-3 formats for external use, and any such release will require approval • We are concerned that anything released be useful and not consume large amounts of collaboration effort (both in production and in response) • As such, tools like Recast are more attractive • The information incorporates the efficiencies, acceptances and corrections – so is robust • It also helps meet the internal requirement of full documentation of analyses

  19. Analysis Practical steps - RECAST arXiv:1010.2506 • Framework developed to extend impact of existing analyses • Candidate for within-experiment and long-term analysis archival, encapsulating the full trigger & event selection, data, backgrounds, systematics • Allows an existing analysis to be reinterpreted under an alternate model hypothesis • Complete information from original analysis, including the tacit information, contained in the data • Not optimized for the new model, but more reliable than a naïve reanalysis? • Recast seen as a very promising solution for preserving analyses and for useful, cost-effective preservation of information – addresses levels ~1-~3
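
The reinterpretation idea can be sketched for the simplest case, a single-bin counting analysis. This is not the RECAST interface: it only assumes that the archived analysis supplies an upper limit on the number of signal events and the integrated luminosity, and that new-model events can be passed through the archived selection to obtain an efficiency (times acceptance). All function names and numbers below are hypothetical.

```python
def selection_efficiency(n_selected, n_generated):
    """Fraction of generated new-model events that pass the archived selection."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return n_selected / n_generated


def cross_section_limit_fb(n_signal_limit, efficiency, luminosity_fb):
    """Upper limit on the new model's cross section, in fb.

    n_signal_limit: upper limit on signal events from the original analysis
    efficiency:     efficiency times acceptance of the new model
    luminosity_fb:  integrated luminosity of the original analysis, in fb^-1
    """
    if efficiency <= 0 or luminosity_fb <= 0:
        raise ValueError("efficiency and luminosity must be positive")
    return n_signal_limit / (efficiency * luminosity_fb)


def is_excluded(model_cross_section_fb, limit_fb):
    """A model is excluded if it predicts a cross section above the limit."""
    return model_cross_section_fb > limit_fb


if __name__ == "__main__":
    # Illustrative numbers only: 1200 of 10000 generated events pass the selection,
    # and the original analysis allows at most 6 signal events in 1 fb^-1.
    eff = selection_efficiency(1200, 10000)
    limit = cross_section_limit_fb(6.0, eff, 1.0)
    print(f"efficiency = {eff:.2f}, cross-section limit = {limit:.1f} fb")
    print("model with 80 fb excluded:", is_excluded(80.0, limit))
```

Because the selection, backgrounds and systematics are reused as archived, the result is not optimized for the new model, which matches the trade-off noted on the slide.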
