
Data Challenge Needs

RWL Jones

ATLAS UK Physics Meeting

Data challenges

  • Goal

    • validate our computing model and our software

  • How?

    • Iterate on a set of DCs of increasing complexity (the chain is sketched below)

      • start with data which looks like real data

      • Run the filtering and reconstruction chain

      • Store the output data into our database

      • Run the analysis

      • Produce physics results

    • To understand our computing model

      • Performance, bottlenecks, etc.

    • To check and validate our software
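
A minimal sketch of that DC chain as a pipeline, with trivial stand-ins for the real ATLAS tools (generators, Geant3/4, Atrecon/Athena, the database); every name below is illustrative, not actual ATLAS code:

    # Illustrative Python pipeline for one Data Challenge iteration; each
    # stage is a stub standing in for the real software, not a real interface.
    def generate(n):
        return [{"id": i} for i in range(n)]   # physics event generation

    def simulate(events):
        return events                          # detector simulation + pile-up

    def reconstruct(events):
        return events                          # filtering + reconstruction

    database = []                              # stand-in for the object database

    def run_data_challenge(n_events):
        events = reconstruct(simulate(generate(n_events)))
        database.extend(events)                # store the output into the database
        return len(events)                     # analysis and physics results follow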



  • Today we don’t have ‘real data’

    • Need to produce ‘simulated data’ first, so:

      • Physics Event generation

      • Simulation

      • Pile-up

      • Detector response

      • Plus reconstruction and analysis

        will be part of the first Data Challenges


ATLAS kits

  • Each DC will have an associated kit

  • Current kit is 1.3.0, for tests; it will be replaced for DC0, with 2.0.3 in testing in October

  • Default kit excludes compilers, but an ‘all-in’ kit exists – more intrusive in the OS but usable by more OS versions

  • So far, no Grid/Globus tools included


ATLAS kit 1.3.0

  • Tar file with ATLAS software to be sent to remote GRID sites and used in DC0

  • Main requirements:

    • NO AFS

    • NO root privileges to install the software

    • Possibility to COMPILE the code

    • NOT too big a tar file

    • Should run on Linux platforms


First version of the ATLAS kit

It installs:

  • SRT (= Software Release Tools) version 0.3.2

  • a subset of ATLAS release 1.3.1:

    • main ATLAS application code + Makefiles: DiceMain, DicePytMain, AtreconMain

      (Dice = G3-based ATLAS simulation program)

      (Atrecon = ATLAS reconstruction program)

    • ATLAS packages and libraries needed for …

    • CLHEP version …


It requires:

  • Linux OS (at the moment tested on RedHat 6.1, 6.2, 7.1); Mandrake has problems

  • CERNLIB 2000 installed

    • If you need CERNLIB 2000, use kit1

    • If you are on RedHat 7.1, you need the compilers in kit2

It provides:

  • all instructions to install / compile / run in a README file

  • example jobs to run full simulation and reconstruction, plus example datacards (DICE, Atrecon)

  • some scripts to set environment variables, to compile and run


It can be downloaded:

ATLAS_kit.tar.gz (~ 90 MB)

then execute: gtar -xvzf ATLAS_kit.tar.gz

it will create a directory /ATLAS of ~ 500 MB
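
A minimal Python equivalent of this unpacking step, in line with the "no AFS, no root privileges" requirements; the download URL is not given here, so the archive is assumed to sit in the current directory:

    # Unpack the kit in place: no AFS, no root privileges needed.
    # Assumes ATLAS_kit.tar.gz (~90 MB) has already been downloaded here.
    import tarfile

    with tarfile.open("ATLAS_kit.tar.gz", "r:gz") as kit:
        kit.extractall(".")   # unpacks ~500 MB under ./ATLAS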

It has been installed and tested on GRID machines + non-ATLAS machines

Sites involved in first tests of the ATLAS kit: Milan, Rome, Glasgow, Lund (providing feedback…)

It lacks a verification kit and analysis tools



What about Globus?

  • Not needed in current kit (but needed if you want to be part of the DataGrid Test Bed)

  • Will be needed for DC1 (?!) – should be in the kit if so

  • If installing now, take the version from the GridPP website – the Hey CD-ROM is out of date (RPMs will be available)


DC0: start: 1 November 2001, end: 12 December 2001

  • 'continuity' test through the software chain

  • aim is primarily to check the state of readiness for Data Challenge 1

  • 100k Z+jet events, or similar – several times

  • check that the software works:

    • issues to be checked include

      • G3 simulation on PC farm

      • 'pile-up' handling

      • what trigger simulation is to be run (ATRIG?)

      • reconstruction running.

    • data must be written/read to/from the database


DC1: start: 1 February 2002, end: 30 July 2002

  • scope increases significantly beyond DC0

    • Several samples of up to 10⁷ events

    • Should involve CERN & outside-CERN sites

    • as a goal, be able to run:

      • O(1000) PCs

      • 10⁷ events

        • Simulation

        • Pile-up

        • Reconstruction

      • 10-20 days


Aims of DC1 (1)

  • Provide a sample of 10⁷ events for HLT studies

    • improve previous statistics by a factor of 10

    • Study performance of Athena and algorithms for use in HLT

  • HLT TDR due for the end of 2002


Aims of DC1 (2)

  • Try out running 'reconstruction' and 'analysis' on a large scale.

    • learn about our data model

    • I/O performance

    • Bottlenecks

  • Note

    • Simulation and pile-up will play an important role


Aims of DC1 (3)

  • Understand our ‘distributed’ computing model

    • GRID

      • Use of GRID tools

    • Data management

      • Database technologies

        • N events with different technologies

    • distributed analysis

      • Access to data


Aims of DC1 (4)

  • Provide samples of physics events to check and extend some of the Physics TDR studies

    • data generated will be mainly ‘standard model’

    • checking Geant3 versus Geant4

      • understand how to do the comparison

      • understand ‘same’ geometry


DC2: start: January 2003, end: September 2003

  • scope depends on the ‘success’ of DC0/1

  • goals

    • use of ‘Test-Bed’

      • 10⁸ events, complexity at ~50% of the 2006-07 system

    • Geant4 should play a major role

    • ‘hidden’ new physics

    • test of calibration procedures

    • extensive use of GRID middleware

    • Do we want to add part or all of:

      • DAQ

      • Lvl1, Lvl2, Event Filter


The ATLAS Data Challenges: Project Structure / Organisation

[Diagram: DC Overview Board, Work Plan Definition, ATLAS Data Challenges, Resource Matters]

Event generation

  • The type of events has to be defined

  • Several event generators will probably be used

    • For each of them we have to define the version

      • in particular Pythia

    • Robust?

  • Event type & event generators have to be defined by

    • HLT group (for HLT events)

    • Physics community

  • Depending on the output we can use the following frameworks:

    • …

      • for ZEBRA output format

    • Athena

      • for output in OO-db (HepMC)

    • A Zebra-to-HepMC convertor already exists.
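
A conceptual sketch of what such a convertor does: turning flat, Fortran-style GENZ/KINE records into a structured event with mother-daughter links. The record layout and the Particle class here are illustrative assumptions, not the real ATLAS convertor or the HepMC API:

    from dataclasses import dataclass, field

    @dataclass
    class Particle:
        pdg_id: int
        p4: tuple                                  # (px, py, pz, E)
        daughters: list = field(default_factory=list)

    def convert(kine_records):
        """kine_records: flat tuples (pdg_id, px, py, pz, E, mother_index),
        with mother_index = 0 for primaries (1-based, Fortran style)."""
        particles = [Particle(pid, (px, py, pz, e))
                     for pid, px, py, pz, e, _ in kine_records]
        primaries = []
        for p, rec in zip(particles, kine_records):
            mother = rec[-1]
            if mother == 0:
                primaries.append(p)                        # primary particle
            else:
                particles[mother - 1].daughters.append(p)  # attach to mother
        return primaries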



  • Geant3 or Geant4?

    • DC0 and DC1 will still rely on Geant3 – G4 version not ready

    • Urgently need Geant4 experience

      • Geometry has to be defined (same as G3 for validation)

      • Use standard events for validation

      • The ‘physics’ is improved

  • for Geant3 simulation, “Slug/Dice” or “Atlsim” framework

    • In both cases output will be Zebra

  • for Geant4 simulation, probably use the FADS/Goofy framework

    • output will be ‘Hits collections’ in OO-db


Pile-up

  • Add “N” ‘minimum bias’ events to the ‘physics event’

    • N depends on the luminosity

      • Suggested

        • 2-3 at L = 10³³

        • 6 at L = 2 × 10³³

        • 24 at L = 10³⁴

    • N depends on the detector

      • In the calorimeter N is ~ 10 times bigger than for other detectors

        • Matching events for different pile-up in different detectors is a real headache!

  • The ‘minimum bias’ events should be generated first; they will then be picked up randomly when the merging is done (see the sketch after this list)

    • This will be a high-I/O operation

    • Efficiency is technology-dependent (sequential or random-access files)
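
A minimal sketch of that merging step, assuming in-memory events represented as plain lists of hits; the names are illustrative, not the real framework code (which runs inside Slug/Dice or Atlsim):

    import random

    # Suggested number of minimum-bias events per luminosity (from the list
    # above); keys are luminosities in cm^-2 s^-1.
    N_MINBIAS = {1e33: 3, 2e33: 6, 1e34: 24}

    def pile_up(physics_event, minbias_pool, luminosity, rng=random.Random()):
        """Overlay N randomly chosen minimum-bias events on one physics event.
        The random picks from the pre-generated pool are the high-I/O step;
        their cost depends on sequential vs. random-access file technology."""
        n = N_MINBIAS[luminosity]
        overlay = [rng.choice(minbias_pool) for _ in range(n)]
        return list(physics_event) + [hit for ev in overlay for hit in ev]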



  • Reconstruction

    • Run in Athena framework

    • Input should be from OO-db

    • Output in OO-db (the three tiers are sketched after this list):

      • ESD

      • AOD

      • TAG

    • Atrecon could be a back-up possibility

      • To be decided
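
A sketch of those three output tiers as plain data types; the tier names come from the slide, while the fields are illustrative assumptions:

    from dataclasses import dataclass

    @dataclass
    class ESD:           # Event Summary Data: detailed reconstruction output
        tracks: list
        calo_clusters: list

    @dataclass
    class AOD:           # Analysis Object Data: physics-object summary
        electrons: list
        muons: list
        jets: list

    @dataclass
    class TAG:           # compact event-level attributes for fast selection
        run: int
        event: int
        n_jets: int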


Data management

  • Many ‘pieces’ of infrastructure still to be decided

    • Everything related to the OO-db (Objectivity and/or ORACLE)

      • Tools for creation, replication, distribution

    • What do we do with ROOT I/O?

      • Which fraction of the events will be done with ROOT I/O?

    • Thousands of files will be produced and need “bookkeeping” and a “catalog” (see the sketch after this list)

      • Where is the “HepMC” truth data?

      • Where is the corresponding “simulated” or AOD data?

      • Selection and filtering?

      • Correlation between different pieces of information?
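
A minimal sketch of such a catalog, assuming a simple relational schema; the columns and example values are assumptions, not the actual ATLAS bookkeeping design:

    import sqlite3

    # Toy file catalog: one row per file, so "where is the HepMC truth / the
    # corresponding AOD?" becomes a simple query. The schema is an assumption.
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE files (
        dataset TEXT,   -- e.g. 'Zjet_100k'
        stage   TEXT,   -- 'hepmc-truth', 'simulated', 'ESD', 'AOD', 'TAG'
        tech    TEXT,   -- 'zebra', 'objectivity', 'root'
        site    TEXT,   -- where the replica lives
        path    TEXT)""")

    def register(dataset, stage, tech, site, path):
        con.execute("INSERT INTO files VALUES (?,?,?,?,?)",
                    (dataset, stage, tech, site, path))

    def locate(dataset, stage):
        """Answer "where is the corresponding data?" for a dataset and stage."""
        return con.execute("SELECT site, path FROM files "
                           "WHERE dataset=? AND stage=?",
                           (dataset, stage)).fetchall()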


Data management

  • Several technologies will be evaluated, so we will have to duplicate some data

    • Same data in ZEBRA & OO-db

    • Same data in ZEBRA FZ and ZEBRA random-access (for pile-up)

    • We need to quantify this overhead

  • We also have to realize that performance will depend on the technology

    • Sequential versus random access files


DC0 planning

  • For DC0, probably at the September software week, decide on the strategy to be adopted:

    • Software to be used

      • Dice geometry

      • Reconstruction adapted to this geometry

      • Database

    • Infrastructure

      • Gilbert hopes (hmmm) that we will have in place ‘tools’ for:

        • Automatic job-submission

        • Data catalog and book keeping

        • allocation of “run numbers” and of “random numbers” (bookkeeping; see the sketch after this list)

    • The ‘validation’ of components must be done now
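
A minimal sketch of such run-number and random-seed allocation with bookkeeping; the JSON ledger and the seed scheme are illustrative assumptions:

    import json, os

    BOOK = "dc0_allocations.json"   # assumed bookkeeping ledger

    def allocate_runs(site, n_jobs):
        """Give each job a unique run number and a seed derived from it,
        recording the allocation so no two jobs ever share a seed."""
        book = json.load(open(BOOK)) if os.path.exists(BOOK) else []
        next_run = 1 + max((rec["run"] for rec in book), default=0)
        new = [{"site": site, "run": next_run + i, "seed": 1_000_000 + next_run + i}
               for i in range(n_jobs)]
        json.dump(book + new, open(BOOK, "w"), indent=1)
        return new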


Currently available software, June-Sept 2001

  • Particle-level simulation: Fortran, HP only; plus code dedicated to the Lujets -> GENZ bank

  • Fast detector simulation: reads GENZ, converts to HepMC

  • Detector simulation: Dice (Slug + Geant3), reads GENZ + kine, converts to HepMC

  • Reconstruction: Atrecon (Fortran, C++), reads GENZ + kine

Simulation software to be available, Nov-Dec 2001

  • Particle-level simulation: C++, Linux; plus dedicated code, with the BaBar EvtGen package later

  • Fast detector simulation: reads HepMC

  • Detector simulation: Dice (Slug + Geant3), reads GENZ + kine, converts to HepMC


  • Analysis tools evaluation should be part of the DC

  • Required for test of the Event Data Model

  • Essential for tests of Computing Models

  • Output for HLT studies will be only a few hundred events

  • ‘Physics events’ would be more appropriate for this study

  • ATLAS Kit must include analysis tools


Storage and CPU issues in DC1

  • Testing storage technology will inflate data volume/event (easiest to re-simulate)

  • Testing software chains will inflate CPU usage/event

  • The size of the events with pile-up depends on the luminosity

    • 4 MB per event @ L = 2 × 10³³

    • 15 MB per event @ L = 10³⁴

  • The time to do the pile-up also depends on the luminosity

    • 55 s (HP) per event @ L = 2 × 10³³

    • 200 s (HP) per event @ L = 10³⁴
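
Scaling these per-event figures to the DC1 goal (10⁷ events on O(1000) PCs in 10-20 days) gives the following back-of-envelope totals; perfect scaling over 1000 CPUs is a simplifying assumption:

    # Totals implied by the per-event figures above, for 10^7 events.
    N_EVENTS = 10**7

    for lumi, size_mb, cpu_s in [("2 x 10^33", 4, 55), ("10^34", 15, 200)]:
        volume_tb = N_EVENTS * size_mb / 1e6      # MB -> TB
        days = N_EVENTS * cpu_s / 1000 / 86400    # on 1000 CPUs
        print(f"L = {lumi}: {volume_tb:.0f} TB, {days:.1f} days on 1000 CPUs")

    # L = 2 x 10^33: 40 TB, 6.4 days on 1000 CPUs
    # L = 10^34: 150 TB, 23.1 days on 1000 CPUs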


Issues for DC1

  • Manpower is the most precious resource; coherent generation will be a significant job for each participating site

  • Do we have enough hardware resources in terms of CPU, disk space, tapes, data servers…? Looks OK

    • Entry requirement for generation: O(100) CPUs (NCB) – clouds

    • What will we do with the data generated during the DC?

      • Keep it on CASTOR? Tapes?

    • How will we exchange the data?

      • Do we want to have all the information at CERN? Everywhere?

    • What are the networking requirements?

ATLAS interim manpower request from GridPP

  • Requested another post for DC co-ordination and management tools, running DCs, and Grid integration and verification. Looking at declared manpower, this is insufficient in the pre-Grid era!

  • Further post for Replication, Catalogue and MSS integration for ATLAS

Interim ATLAS request

  • Grid-aware resource discovery and job submission for ATLAS; essential that all this be programmatic by DC2. Overlap with LHCb?

  • Should add to this post(s) for verification activities, which are a large part of the work

  • Should also ask for manpower for verification packages

Joint meeting with LHCb

  • Common project on Grid-based tools for experiment code installation

  • Event selection and data discovery tools (GANGA is an LHCb-proposed prototype layer between Gaudi/Athena and datastores and catalogues)