
Analysis framework in ALICE

The ALICE analysis framework lets the same user code run transparently on all resources (local, AliEn grid, CAF/PROOF) and on different inputs, and is designed to optimize the CPU/IO ratio over large datasets and sparse resources.


Presentation Transcript


  1. Analysis framework in ALICE, Andrei Gheata

  2. Analysis in ALICE • Three main analysis modes • Prompt data processing @CERN with PROOF • calibration, alignment, reconstruction, analysis • Analysis with local PROOF clusters • Batch Analysis on the GRID infrastructure • Scheduled analysis (see talk of Andreas) • At least at the beginning • Needed for optimizing CPU/IO over large datasets and very sparse resources

  3. Main goals for an analysis framework in ALICE • Provide transparent access to all resources with the same code • Usage: Local, AliEn grid, CAF/PROOF • Provide transparent access to different inputs • ESD, AOD, Kinematics tree (MC truth) • Allow sharing resources by multiple analysis modules • Framework must accommodate different analyses in the same session

  4. TSelector – an event loop handler for trees • [Diagram: TTreePlayer, Process("MySelector") looping over the event tree; each event holds tracks, vertices and V0's] • UserSelector methods: Begin(), SlaveBegin(), Notify(), Process() -> user algorithm, SlaveTerminate(), Terminate()
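
The call sequence listed on this slide can be illustrated with a small standalone sketch. It does not use ROOT: `Event`, `SelectorSketch`, `TrackCounter` and `RunLocally` are hypothetical names invented here, and only the hook names are taken from the slide. The driver assumes a single local process, so the client-side and worker-side hooks all run in the same place.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the event tree (not a ROOT class).
struct Event {
    int nTracks;
};

// Hypothetical base class that only mirrors the hook names from the slide.
class SelectorSketch {
public:
    virtual ~SelectorSketch() {}
    virtual void Begin() {}                      // once, before the loop (client side)
    virtual void SlaveBegin() {}                 // once per worker, before the loop
    virtual void Notify() {}                     // each time a new input file starts
    virtual void Process(const Event& ev) = 0;   // once per event: the user algorithm
    virtual void SlaveTerminate() {}             // once per worker, after the loop
    virtual void Terminate() {}                  // once, after the loop (client side)
};

// Example user selector: sums track counts and logs the order of the calls.
class TrackCounter : public SelectorSketch {
public:
    long totalTracks;
    std::string callLog;
    TrackCounter() : totalTracks(0) {}
    void Begin() override { callLog += 'B'; }
    void SlaveBegin() override { callLog += 'S'; }
    void Notify() override { callLog += 'N'; }
    void Process(const Event& ev) override {
        totalTracks += ev.nTracks;
        callLog += 'P';
    }
    void SlaveTerminate() override { callLog += 's'; }
    void Terminate() override { callLog += 'T'; }
};

// Toy driver playing the role of the tree player in a single local process.
// Each inner vector stands for the events of one input file.
inline void RunLocally(SelectorSketch& sel,
                       const std::vector<std::vector<Event> >& files) {
    sel.Begin();
    sel.SlaveBegin();
    for (const std::vector<Event>& file : files) {
        sel.Notify();
        for (const Event& ev : file) sel.Process(ev);
    }
    sel.SlaveTerminate();
    sel.Terminate();
}
```

Running `RunLocally` on a `TrackCounter` with two files leaves the accumulated track count in `totalTracks` and the order of the hook calls in `callLog`, which makes the lifecycle easy to inspect.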

  5. TSelector – an event loop handler for trees • [Diagram: the same TSelector used on a single client, on a PROOF cluster and on the GRID]

  6. FRAMEWORK

  7. Analysis as a task • Selector model: a simple user analysis looping over input events and producing some histograms • Need to federate several different analysis algorithms • Sharing the same input data • Using data produced by a different module • Data flow within analysis chains • Give a maximum of workload to the CPU while data is in memory

  8. From selectors to tasks • Base class providing user methods called by the framework at well-defined stages (very close to the TSelector ones) • Analysis modules may connect to one another in a workflow • E.g. ESD -> filtering -> AOD -> PWG -> histograms • ROOT TTask is well suited • A schema is needed to make it work -> Analysis framework

  9. UserTask : public AliAnalysisTask • ConnectInputData() • Define which data is connected to which slot • CreateOutputObjects() • Create Histograms • Init(),LocalInit() • Optional, e.g. read parameters • Exec() • The event loop • Terminate() • Called at the end, can draw e.g. a histogram
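
As a rough illustration of how these hooks divide the work, here is a standalone sketch that does not depend on AliRoot. `ToyEvent`, `ToyHist`, `TaskSketch`, `PtTask` and `RunTask` are hypothetical names; only the method names come from the slide, their signatures are simplified, and the order of the setup calls in the toy driver is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical event: transverse momenta of its tracks (GeV/c).
struct ToyEvent {
    std::vector<double> trackPt;
};

// Hypothetical fixed-bin histogram standing in for a real histogram class.
struct ToyHist {
    double lo, hi;
    std::vector<long> bins;
    ToyHist(std::size_t n, double lo_, double hi_) : lo(lo_), hi(hi_), bins(n, 0) {}
    void Fill(double x) {
        if (bins.empty() || x < lo || x >= hi) return;   // ignore out-of-range values
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;        // guard against rounding
        ++bins[i];
    }
    long Entries() const {
        long sum = 0;
        for (long b : bins) sum += b;
        return sum;
    }
};

// Hypothetical base class mirroring the method names on the slide.
class TaskSketch {
public:
    virtual ~TaskSketch() {}
    virtual void ConnectInputData(const ToyEvent* slot0) = 0;  // bind input slot 0
    virtual void CreateOutputObjects() = 0;                    // book histograms
    virtual void LocalInit() {}                                // optional setup
    virtual void Exec() = 0;                                   // called once per event
    virtual void Terminate() {}                                // called at the end
};

class PtTask : public TaskSketch {
public:
    PtTask() : fEvent(nullptr), fEntriesAtEnd(0) {}
    void ConnectInputData(const ToyEvent* slot0) override { fEvent = slot0; }
    void CreateOutputObjects() override { fHist.reset(new ToyHist(10, 0.0, 10.0)); }
    void Exec() override {
        for (double pt : fEvent->trackPt) fHist->Fill(pt);
    }
    void Terminate() override { fEntriesAtEnd = fHist->Entries(); }
    const ToyHist& Hist() const { return *fHist; }
    long EntriesAtEnd() const { return fEntriesAtEnd; }
private:
    const ToyEvent* fEvent;          // not owned: points at the framework's slot
    std::unique_ptr<ToyHist> fHist;  // output object owned by the task
    long fEntriesAtEnd;
};

// Toy driver: refills one "input slot" per event and calls the hooks.
inline void RunTask(TaskSketch& task, const std::vector<ToyEvent>& events) {
    ToyEvent current;
    task.LocalInit();
    task.ConnectInputData(&current);
    task.CreateOutputObjects();
    for (const ToyEvent& ev : events) {
        current = ev;
        task.Exec();
    }
    task.Terminate();
}
```

The point of the split is that the task never opens files or loops over events itself: it only reads whatever is currently in its input slot, which is what lets the surrounding framework decide where and how the loop runs.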

  10. AliAnalysis… Framework • [Diagram: an AliAnalysisTask with input slots INPUT 0 and INPUT 1 and output slot OUTPUT 0, connected to data containers CONT 0, CONT 1 and CONT 2] • Data-oriented model composed of independent tasks • Task execution triggered by data readiness • Tasks are owned and managed by AliAnalysisManager • Parallel execution and event loop done via TSelector functionality • Mandatory for usage with PROOF • N.B.: The analysis framework itself has a very general design, not bound to ALICE software
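
The idea of "task execution triggered by data readiness" can be sketched without any ALICE code. In the toy below, `Container`, `Node`, `Algo` and `RunWhenReady` are hypothetical names, and each container holds a single number instead of real event data. A task runs as soon as all of its input containers are flagged ready, whatever order the tasks were registered in.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical data container: a value plus a readiness flag.
struct Container {
    std::string name;
    double value;
    bool ready;
    Container(const std::string& n, double v, bool r) : name(n), value(v), ready(r) {}
};

typedef std::function<double(const std::vector<double>&)> Algo;

// Hypothetical task node: input container indices, one output index, an algorithm.
struct Node {
    std::string name;
    std::vector<std::size_t> inputs;
    std::size_t output;
    Algo compute;
    bool done;
    Node(const std::string& n, const std::vector<std::size_t>& in, std::size_t out, Algo f)
        : name(n), inputs(in), output(out), compute(f), done(false) {}
};

// Runs every task whose inputs are ready and repeats until nothing new can run.
// Returns the task names in the order in which they were executed.
inline std::vector<std::string> RunWhenReady(std::vector<Container>& conts,
                                             std::vector<Node>& tasks) {
    std::vector<std::string> order;
    bool progress = true;
    while (progress) {
        progress = false;
        for (Node& t : tasks) {
            if (t.done) continue;
            bool allReady = true;
            std::vector<double> in;
            for (std::size_t i : t.inputs) {
                if (!conts[i].ready) { allReady = false; break; }
                in.push_back(conts[i].value);
            }
            if (!allReady) continue;
            conts[t.output].value = t.compute(in);
            conts[t.output].ready = true;   // publishing the output may unblock other tasks
            t.done = true;
            order.push_back(t.name);
            progress = true;
        }
    }
    return order;
}
```

If a task that consumes another task's output is registered first, it is simply skipped until its input container has been published, so registration order does not have to match the data flow.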

  11. Why a common framework • To make data processing more efficient • In particular if resources are sparse • Optimise the CPU/IO ratio • Make it possible to run user analysis in PROOF or on the GRID while hiding these technologies as much as possible (same user code) • But also • Develops a common knowledge base and terminology • Helps document the analysis procedure and makes results reproducible

  12. Tasks and event loop • [Diagram: AliAnalysisManager holds TObjArray *fContainers and TObjArray *fTasks; AliAnalysisSelector drives Chain->Process() over the ESD chain, which fills the top container. EVENT LOOP: top level tasks and containers (the "train"), task1 and task2 producing output1 and output2. POST EVENT LOOP: a fit task and task4, each producing a result]

  13. AliAnalysisManager • Holds lists of user-defined tasks and their data containers • Provides simple user interface to connect tasks and containers in a working chain • Spawns a custom TSelector to initiate the event loop over the input chain • Task event processing delegated by AliAnalysisSelector to the manager
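
A minimal standalone sketch of the manager's role follows, assuming a much simpler interface than the real class: `Ev`, `TrainTask`, `TrackSum`, `V0Sum` and `ManagerSketch` are hypothetical names, and the "chain" is just a vector held in memory. It shows the manager owning a list of tasks and handing each event, read once, to all of them.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical event content.
struct Ev {
    int nTracks;
    int nV0s;
};

// Hypothetical task interface: one call per event.
class TrainTask {
public:
    virtual ~TrainTask() {}
    virtual void Exec(const Ev& ev) = 0;
};

class TrackSum : public TrainTask {
public:
    long sum;
    TrackSum() : sum(0) {}
    void Exec(const Ev& ev) override { sum += ev.nTracks; }
};

class V0Sum : public TrainTask {
public:
    long sum;
    V0Sum() : sum(0) {}
    void Exec(const Ev& ev) override { sum += ev.nV0s; }
};

// Hypothetical manager: owns the tasks and drives the event loop.
class ManagerSketch {
public:
    // Takes ownership of the task; returns a non-owning pointer for reading results.
    template <class T>
    T* AddTask(std::unique_ptr<T> task) {
        T* raw = task.get();
        fTasks.push_back(std::move(task));
        return raw;
    }
    // Loops once over the input; returns the number of events read.
    long StartAnalysis(const std::vector<Ev>& chain) {
        long nRead = 0;
        for (const Ev& ev : chain) {
            ++nRead;                                                // each event is read once...
            for (std::unique_ptr<TrainTask>& t : fTasks) t->Exec(ev);  // ...and seen by every task
        }
        return nRead;
    }
private:
    std::vector<std::unique_ptr<TrainTask> > fTasks;
};
```

With two tasks attached, the input is still traversed only once, which is the motivation for running analyses together in one session when I/O is the scarce resource.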

  14. AliAnalysisManager – PROOF mode • [Diagram: on the CLIENT, MyAnalysis.C calls AM->StartAnalysis("proof"); the analysis manager (AM) with its tasks task1, task2, task3, ..., taskN travels in the input list through the PROOF master to the workers; on each worker AliAnalysisSelector runs SlaveBegin(), Process() and SlaveTerminate() on part of the input chain; the worker outputs O1, O2, ..., On are collected in the output list, and Terminate() follows]
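
The split-process-merge pattern behind PROOF mode can be sketched in a few lines of standalone code. This is a sequential toy, not PROOF: `Hist`, `ProcessSlice`, `Merge` and `RunProofLike` are hypothetical names, the "workers" run one after another in the same process, and the input is a plain vector of per-event multiplicities.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<long> Hist;  // bin contents; bin i counts events with multiplicity i

// What one worker does: book its own output, then process its share of the events.
inline Hist ProcessSlice(const std::vector<int>& mult, std::size_t begin,
                         std::size_t end, std::size_t nBins) {
    Hist h(nBins, 0);
    for (std::size_t i = begin; i < end; ++i) {
        if (mult[i] < 0) continue;                        // skip invalid entries
        std::size_t bin = static_cast<std::size_t>(mult[i]);
        if (bin < nBins) ++h[bin];                        // overflow values are dropped
    }
    return h;
}

// What happens to the per-worker outputs: they are added bin by bin.
inline Hist Merge(const std::vector<Hist>& parts, std::size_t nBins) {
    Hist total(nBins, 0);
    for (const Hist& h : parts)
        for (std::size_t b = 0; b < nBins && b < h.size(); ++b) total[b] += h[b];
    return total;
}

// Splits the input into contiguous slices, one per worker, and merges the results.
inline Hist RunProofLike(const std::vector<int>& mult, std::size_t nWorkers,
                         std::size_t nBins) {
    assert(nWorkers > 0);
    std::vector<Hist> outputs;
    const std::size_t n = mult.size();
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t begin = n * w / nWorkers;
        const std::size_t end = n * (w + 1) / nWorkers;
        outputs.push_back(ProcessSlice(mult, begin, end, nBins));
    }
    return Merge(outputs, nBins);
}
```

Because the outputs are additive, the merged histogram is the same as the one obtained by processing all events in a single slice. That property is what a task's output objects need in order to be merged safely.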

  15. Memory checking • AliAnalysisManager::SetNSysInfo(Long64_t nevents) • Monitors memory and timing every N events • Uses AliSysInfo functionality to spot leaking tasks • Only in local mode • 2 output files: syswatch.log and syswatch.root • Currently investigating how to improve monitoring per task
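
The "sample every N events" idea can be sketched independently of AliSysInfo. In the toy below, `Sample` and `SysWatchSketch` are hypothetical names, and the memory reading comes from a caller-supplied probe function, so no platform-specific call is assumed. The real tool writes log and ROOT files, which this sketch does not attempt.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// One monitoring record: event number and memory reading in kB.
struct Sample {
    long event;
    long memKB;
};

// Hypothetical watcher that samples a memory probe every N events.
class SysWatchSketch {
public:
    SysWatchSketch(long every, std::function<long()> probe)
        : fEvery(every), fProbe(probe) {}

    // To be called once per event from the event loop.
    void OnEvent(long iEvent) {
        if (fEvery > 0 && iEvent % fEvery == 0) {
            fSamples.push_back(Sample{iEvent, fProbe()});
        }
    }

    const std::vector<Sample>& Samples() const { return fSamples; }

    // Average growth in kB per event between the first and last sample.
    // Returns 0 when fewer than two distinct sampling points are available.
    double GrowthPerEvent() const {
        if (fSamples.size() < 2) return 0.0;
        const Sample& a = fSamples.front();
        const Sample& b = fSamples.back();
        if (b.event == a.event) return 0.0;
        return static_cast<double>(b.memKB - a.memKB) /
               static_cast<double>(b.event - a.event);
    }

private:
    long fEvery;                   // sampling period in events (<= 0 disables sampling)
    std::function<long()> fProbe;  // returns the current memory usage in kB
    std::vector<Sample> fSamples;
};
```

A steadily positive growth per event over a long run is the usual signature of a leaking task, while a flat profile after the first few samples suggests that allocations happen only during setup.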

  16. Memory usage profile vs time

  17. PROOF-related add-ons • Processing datasets instead of a chain: StartAnalysis(const char *type, const char *dataset,…) • Saving files on PROOF workers: AliAnalysisDataContainer::SetSpecialOutput(), AliAnalysisManager::SetSpecialOutputLocation(const char* dir) • Specified output files will be written to the special output location using TFile::Cp() • If the location is not specified, output files are merged by PROOF and registered or sent back to the client

  18. The overall picture • [Diagram: AliAnalysisManager with its AliAnalysisTask objects; the user analysis task (UserANALYSISTask) is shown next to AliAnalysisTaskSE. Event handlers (AliVEventHandler): AliESDInputHandler (AliAODInputHandler), AliMCEventHandler, AliAODHandler (Output). Data (AliVEvent): AliESDEvent (AliAODEvent), AliMCEvent, AliAODEvent. Tracks (AliVParticle): AliESDtrack, AliMCParticle, AliAODtrack]
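
The role of the common virtual event interface in this picture can be illustrated with a standalone sketch. `VEventSketch`, `EsdLikeEvent`, `AodLikeEvent` and `SumPt` are hypothetical names, and the two storage layouts are invented for the example. The point is only that analysis code written against the abstract interface runs unchanged on either input type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical abstract event interface shared by all input types.
class VEventSketch {
public:
    virtual ~VEventSketch() {}
    virtual std::size_t GetNumberOfTracks() const = 0;
    virtual double TrackPt(std::size_t i) const = 0;
};

// "ESD-like" input: stores momentum components and computes pt on demand.
class EsdLikeEvent : public VEventSketch {
public:
    struct Track { double px, py; };
    std::vector<Track> tracks;
    std::size_t GetNumberOfTracks() const override { return tracks.size(); }
    double TrackPt(std::size_t i) const override {
        return std::sqrt(tracks[i].px * tracks[i].px + tracks[i].py * tracks[i].py);
    }
};

// "AOD-like" input: stores the already reduced quantity directly.
class AodLikeEvent : public VEventSketch {
public:
    std::vector<double> pt;
    std::size_t GetNumberOfTracks() const override { return pt.size(); }
    double TrackPt(std::size_t i) const override { return pt[i]; }
};

// Analysis code that only knows the abstract interface.
inline double SumPt(const VEventSketch& ev) {
    double sum = 0.0;
    for (std::size_t i = 0; i < ev.GetNumberOfTracks(); ++i) sum += ev.TrackPt(i);
    return sum;
}
```

`SumPt` gives the same answer for an ESD-like and an AOD-like event describing the same tracks, without any branch on the input type inside the analysis code.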

  19. Integration of User Tasks • Relatively smooth so far • Needs user support to scrutinize (in particular for CAF/PROOF): • Memory requirements (leaks) • Correct data member initialization • On client and workers

  20. Plans for scheduled analysis: analysis train producing AODs • [Diagram: ESD/AOD input and Monte Carlo truth feed a train of tasks TASK 1, TASK 2, TASK…, TASK N, which produces an AOD; acceptance and efficiency correction services are shown alongside the train] • See the talk of Andreas Morsch for detailed info

  21. Documentation and current status of analysis modules • Maintained and updated on the analysis web page • Semantics of AliAnalysisTask methods explained with examples • Analysis train put together and tested centrally on a regular basis (M. Gheata) • As the different PWGs integrate their analysis code with the framework

  22. Documentation and current status of analysis modules

  23. CODE STRUCTURE • Base libraries: STEERbase, ESD, AOD • libANALYSIS: AliAnalysisSelector, AliAnalysisManager, AliAnalysisTask, AliAnalysisDataContainer, AliAnalysisDataSlot, AliAnalysisFilter, AliAnalysisCuts • libANALYSISalice: AliAnalysisTaskSE, AliAnalysisTaskKineFilter, AliAnalysisTaskESDFilter, AliKineTrackCuts

  24. Some future challenges • Support for more exotic use cases • Mixing different inputs • Support for more than one pass over the input dataset • Needs support from existing infrastructures (PROOF, AliEn) • Better support for error handling • Spotting where the problem is in such combined systems is not trivial • Improving system info inspection per task

  25. Summary • "Common practice" introduced by a new analysis framework developed by ALICE offline • Runs the same user code transparently locally, in PROOF or on the GRID • Framework manages a list of independent tasks: • Feeding from the same input data or chained one to another • Execution triggered by data readiness • Sequential execution of the top level tasks (train) driven by the input chain (TSelector mechanism) • Common I/O is managed by event handlers • ALICE-specific implementations for ESD, AOD and MC data • Framework adopted by all PWGs and well hammered on CAF and the AliEn grid

  26. Thanks to all the people who contributed in one way or another to developing, improving and testing the framework: A. Morsch, M. Gheata, J.-F. Grosse-Oetringhaus, Ch. Klein-Boesing, M. Oldenburg, F. Carminati, Y. Schutz and many others
