60 likes | 204 Views
The weekly meeting on December 2, 2013, addressed critical issues in analysis job efficiency and storage for the ALICE collaboration. Key findings indicate that large file sizes hinder performance, and current formats depend heavily on AliRoot types. Investigations suggest that flattening the AOD format can lead to significant I/O performance improvements, with potential gains in speed and size reduction exceeding 50%. The transition to a flattened structure simplifies user analysis, enhances data accessibility, and is crucial for data preservation and parallelism in upcoming Run2 and Run3.
E N D
Flat AOD Gheata ALICE weekly meeting 2 Dec 2013
Motivation • Important I/O for analysis jobs, decreasing job efficiency • Big file size -> storage problems • Format highly dependent on AliRoot types • Other experiments are using ntuple-like data in end-user analysis • Investigations and tests at small scale already hint for improvements • Disabling non-needed branches -> x5 speed • Tests on analysis specific data skimming -> huge factors to gain • Discussed flattening/reducing AOD format • Useful for improving I/O performance • Useful for Run2/Run3 • Useful for data preservation (smaller, simpler, easier to share) • Useful for different levels of parallelism in analysis
Exercise • Use AODtree->MakeClass() to generate a skeleton, then rework • Keep all AOD info, but restructure the format Int_tfTracks.fDetPid.fTRDncls -> Int_t *fTracks_fDetPid_fTRDncls; //[ntracks_] • More complex cases to support I/O: typedefstruct { Double32_t x[10]; } vec10dbl32; Double32_tfTracks.fPID[10] -> vec10dbl32 *fTracks_fPID; //[ntracks_] Double32_t *fV0s.fPx //[fNprongs] -> TArrayF *fV0s.fPx; //[nv0s_] • Convert AliAODEvent-> FlatEvent • Try to keep the full content AND size • Write FlatEvent on file • Compare file size and read speed
Results • Tested on AOD PbPb: AliAOD.root ****************************************************************************** *Tree :aodTree : AliAOD tree * *Entries : 2327 : Total = 2761263710 bytes File Size = 660491257 * * : : Tree compression factor = 4.18 * ****************************************************************************** ****************************************************************************** *Tree :AliAODFlat: Flattened AliAODEvent * *Entries : 2327 : Total = 2248164303 bytes File Size = 385263726 * * : : Tree compression factor = 5.84 * ****************************************************************************** • Data smaller (no TObject overhead, TRef->int) • 30% better compression • Reading speed • Old: CPU time= 103s , Real time=120s • New: CPU time= 54s , Real time= 64s
Implications • User analysis more simple, working mostly with basic types (besides the event) • Simplified access to data, highly reducing number of (virtual) calls • ROOT-only analysis • No problem with schema evolution, supported as before • Filtering hierarchical to flat AOD’s straightforward • Much better vectorizable track loops • Deltas would have to be also flattened
Conclusions • Flattening the event structure has benefits • Sizeable gains in size and speed (>50%) • Many benefits for user code performance • Conversion straightforward • Does not rule out further skimming of data content per analysis type • An approach we will have to consider seriously for Run2 and specially Run3