Towards a framework for organized analysis focusing on AOD production from ESD/AOD data with flexible task structures and transparent data handling, enhancing data and task organization for optimized analysis processes. Implementation includes task organization, data staging, and robust book-keeping. Potential integration of user code. Proposal for improved data and task organization, with detailed steps for task handling and common output object management.
Andreas Morsch Weekly Offline Meeting 31/5/2007 Towards a Framework for Organized Analysis
Why Organized Analysis? • Most efficient way for many users (analysis tasks) to read and process the full data set. • In particular if resources are scarce. • Optimise CPU/IO ratio • But also • Helps to develop a common, well-tested framework for analysis. • Develops a common knowledge base and terminology. • Helps to document the analysis procedure and makes results reproducible.
Scope • Focus on production of AODs from ESD/AOD
Design Goals • Flexible task and data container structure • User code independent of computing schema (interactive: local/proof or batch: grid) • Input data: ESD, AOD • Same design (done) • Common base class ? • Output data: • AOD + user histograms • Transparent handling of memory resident and file resident data
Implementation • Analysis train/taxi similar to PHENIX • Based on the existing AliAnalysisManager/Task framework (A. and M. Gheata)
Organization of Data and Tasks • Input data staging ? • Several trainlets on sub-data sets staged prior to train departure. • Better: One analysis “train” on the complete data set. • Limits the complexity of the production. • Should be designed to give the optimum under all conditions.
Organization of Tasks • Proposal • On top level • Tasks reading ESDs/AOD and producing AODs. • Organized by analysis manager • Below top level • Sub tasks producing intermediate transient data • Organized by users (PWGs)
Organization of Data and Tasks • Organisation of analysis tasks • One sub-job per task • Better: One job executing all tasks. • Protection against sub-task crashes • “Isolate” tasks using C++ try-throw-catch mechanism • Check memory / task • Check output data size / task • Protection against data corruption • Access rights per task • Dynamic cancelling of tasks • Input data quality checks • could be the first task in the row • Robust book-keeping
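The "one job executing all tasks" option with try-throw-catch isolation and dynamic cancelling could look roughly like the sketch below. It is a minimal illustration in plain C++, not the AliAnalysisManager code: `Task`, `runTrain` and `checkOutputSize` are hypothetical names. Note that catching exceptions only protects against failures reported as exceptions; a hard crash such as a segmentation fault is not caught this way.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an analysis task; not the AliAnalysisTask API.
struct Task {
    Task(std::string n, std::function<void(int)> f)
        : name(std::move(n)), exec(std::move(f)), active(true) {}
    std::string name;
    std::function<void(int)> exec;  // processes one event, may throw
    bool active;                    // false once the task has been cancelled
    std::string lastError;          // book-keeping: why it was cancelled
};

// Hypothetical per-task guard: turns an oversized output into an exception,
// which runTrain then handles like any other task failure.
inline void checkOutputSize(std::size_t bytes, std::size_t limit) {
    if (bytes > limit) throw std::runtime_error("output size limit exceeded");
}

// Run every active task on every event. A task that throws is cancelled
// for the rest of the train; the other tasks keep running.
// Returns the number of tasks cancelled during this run.
inline std::size_t runTrain(std::vector<Task>& tasks, int nEvents) {
    assert(nEvents >= 0);
    std::size_t cancelled = 0;
    for (int ev = 0; ev < nEvents; ++ev) {
        for (Task& t : tasks) {
            if (!t.active) continue;
            try {
                t.exec(ev);
            } catch (const std::exception& e) {
                t.active = false;
                t.lastError = e.what();
                ++cancelled;
            } catch (...) {
                t.active = false;
                t.lastError = "unknown exception";
                ++cancelled;
            }
        }
    }
    return cancelled;
}
```

After the run, the `active` flag and `lastError` of each task give a simple record of which tasks were cancelled and why, which is one possible input to the book-keeping.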
GRID/PROOF • Transparency of computing schema • Some improvements in AliAnalysisManager/AliAnalysisTask • Possibility to notify tasks when file is changed in chain (done) • More robust output data streaming (done) • Possibility to flag tasks as “post event loop”-tasks (done) • Handling of file resident data • PROOF uses object streaming • What is a streamable object/task ? • Needs exact definition. • Attention • Normally persistent objects are streamed • Here: transient objects are streamed ! • Needs user support and documentation
Possible Integration of User Code • Diagram labels: AliAnalysisTask (steers, delegates); AliAnalysisUserTask; user analysis code (implements interface, deals with AliAODEvent, documents selection and analysis parameters); factory; configuration macro • Working prototype for AliAnalysisTaskJets exists
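One way to read the steer/delegate idea is sketched below in plain C++: a framework-side task owns the event loop hooks and forwards each event to user code that implements a small interface, while a factory turns a configuration into a user object. All names (`Event`, `UserAnalysis`, `PtCut`, `SteeringTask`, `MakeUserAnalysis`) are hypothetical and are not taken from AliRoot or from the AliAnalysisTaskJets prototype.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal event; stands in for the AOD event.
struct Event {
    Event() : pt(0.0) {}
    double pt;
};

// Interface that user analysis code implements.
class UserAnalysis {
public:
    virtual ~UserAnalysis() = default;
    virtual void Process(const Event& ev) = 0;
    virtual std::string Describe() const = 0;  // documents cuts and parameters
};

// Example user code: counts events above a pt threshold.
class PtCut : public UserAnalysis {
public:
    explicit PtCut(double minPt) : fMinPt(minPt), fAccepted(0) {}
    void Process(const Event& ev) override {
        if (ev.pt > fMinPt) ++fAccepted;
    }
    std::string Describe() const override {
        return "PtCut minPt=" + std::to_string(fMinPt);
    }
    int Accepted() const { return fAccepted; }
private:
    double fMinPt;
    int fAccepted;
};

// Framework-side task: steers the event loop and delegates to user code.
class SteeringTask {
public:
    explicit SteeringTask(std::unique_ptr<UserAnalysis> user)
        : fUser(std::move(user)) {
        assert(fUser != nullptr);
    }
    void Exec(const Event& ev) { fUser->Process(ev); }
    const UserAnalysis& User() const { return *fUser; }
private:
    std::unique_ptr<UserAnalysis> fUser;
};

// Factory: a configuration (here a name and one parameter) selects user code.
// Returns nullptr for an unknown name.
inline std::unique_ptr<UserAnalysis> MakeUserAnalysis(const std::string& name,
                                                      double param) {
    if (name == "PtCut") return std::unique_ptr<UserAnalysis>(new PtCut(param));
    return nullptr;
}
```

With this split the user class never sees the event loop or the computing schema; it only receives events and can report its own selection parameters through `Describe()`.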
Who manages the common output objects AliAODEvent and AOD Tree ? • What has to be called when • SlaveBegin: AliAODEvent constructor, open file, AOD tree constructor • ExecuteAnalysis: AOD tree fill, AliAODEvent Clear • Terminate: WriteTree, close file
Possible Solution • Header and trailer analysis tasks handling the AliAODEvent I/O • Top Task with user tasks as daughters • AliAnalysisManagerAOD deriving from AliAnalysisManager and re-implementing SlaveBegin, ExecuteAnalysis and Terminate • Virtualize AliAODEvent (AliVEvent) and add calls to an object of type AliVEvent in AliAnalysisManager. Move as much code as possible into AliAODEvent (CreateTree, FillTree, Clear, …)
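The "virtualized event" option can be sketched as follows, assuming a manager that only knows an abstract event interface and calls it at fixed points of the job. This is a plain C++ illustration with hypothetical classes (`VEvent`, `LoggingEvent`, `Manager`); it records the call order instead of doing real tree I/O, and file opening and closing are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical abstract output event; the names only mirror the roles
// discussed on the slide.
class VEvent {
public:
    virtual ~VEvent() = default;
    virtual void CreateTree() = 0;  // once, before the event loop
    virtual void FillTree() = 0;    // once per event, after all tasks ran
    virtual void Clear() = 0;       // once per event, after the fill
    virtual void WriteTree() = 0;   // once, after the event loop
};

// Stand-in output event that only logs the calls it receives.
class LoggingEvent : public VEvent {
public:
    void CreateTree() override { fLog.push_back("CreateTree"); }
    void FillTree() override { fLog.push_back("FillTree"); }
    void Clear() override { fLog.push_back("Clear"); }
    void WriteTree() override { fLog.push_back("WriteTree"); }
    const std::vector<std::string>& Log() const { return fLog; }
private:
    std::vector<std::string> fLog;
};

// Manager that owns the call sequence but not the event implementation.
class Manager {
public:
    explicit Manager(VEvent* out) : fOut(out) {}
    void SlaveBegin() {
        if (fOut) fOut->CreateTree();
    }
    void ExecuteAnalysis() {
        // User tasks would run here and fill the output event.
        if (fOut) {
            fOut->FillTree();
            fOut->Clear();
        }
    }
    void Terminate() {
        if (fOut) fOut->WriteTree();
    }
private:
    VEvent* fOut;  // not owned; may be null if no common output is produced
};
```

Because the manager depends only on the abstract interface, the tree handling stays inside the concrete event class and user tasks never call fill or write themselves.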
[Class diagrams of the options. Classes shown: AliAnalysisManager, AliAODEvent, AliAnalysisManagerAOD; AliVEvent, AliAnalysisManager, AliAODEvent; AliVOutputEventHandler, AliAnalysisManager, AliAODEventHandler, AliAODEvent]
More AOD I/O Management Tasks • Granting access rights per branch • Consistency checks • Chopping output files • …