Optimization and usage of D3PD
Ilija Vukotic
CAF – PAF, 19 April 2011, Lyon
Overview • Formats • Optimizations • Local tests • Large scale tests
Formats
• [Figure: bubble chart of event sizes for data and MC AODs, ESDs, and D3PDs (JETMET, egamma); ball surface ~ event size]
• Sizes are indicative only; in reality they depend on:
  • Pile-up
  • Stream
  • Repro tag
• D3PDs are not flat trees any more; with the additional trees, some D3PDs have ~10k branches
ROOT file organization
• [Figure: baskets of std/double/float branches laid out along the file, versus events]
• We chose to fully split (better compression factor)
• Baskets are written to the file as soon as they fill
• This leaves parts of the same event scattered over the file
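A toy simulation (hypothetical branch sizes, not ROOT code) of the layout above: with full splitting and small fill-on-full baskets, each branch flushes on its own schedule, so the baskets holding one event end up interleaved with baskets of unrelated events.

```python
# Toy model of a fully split tree with small fixed-size baskets.
# Each branch buffers its own entries and flushes a basket to the file the
# moment the next entry would overflow it, so baskets of different branches
# interleave and one event's data is scattered across the file.

BASKET_BYTES = 2048                     # 2 kB baskets, as in the 2010 settings
branch_sizes = [8, 24, 100, 400] * 25   # assumed per-event bytes, 100 branches
N_EVENTS = 1000

file_layout = []                        # (branch, first_event, last_event) in disk order
fill = [0] * len(branch_sizes)
start = [0] * len(branch_sizes)
for event in range(N_EVENTS):
    for b, size in enumerate(branch_sizes):
        fill[b] += size
        if fill[b] + size > BASKET_BYTES:       # next entry would overflow: flush
            file_layout.append((b, start[b], event))
            fill[b] = 0
            start[b] = event + 1

# On-disk positions of the baskets needed to read *every* branch of event 500:
pos = [i for i, (b, lo, hi) in enumerate(file_layout) if lo <= 500 <= hi]
print(len(pos), max(pos) - min(pos) + 1)  # 100 baskets, spread over a wider span
```

The second printed number exceeds the first: baskets belonging to other events sit between the ones we need, which is exactly why a single-event read causes scattered disk accesses.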
Optimizations
• Constraints: memory, disk size, read/write time
• Options:
  • Split level: 99 (full)
  • Zip level: 6
  • Basket size: 2 kB
  • Member-wise streaming
  • Basket reordering: by event or by branch
  • TTreeCache
  • "New ROOT": AutoFlush matching the TTreeCache size
• Tradeoffs depend on the read scenario:
  • Full sequential read
  • Some events
  • Parts of events
  • PROOF
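As a toy illustration of the TTreeCache option (all numbers are hypothetical, and this is not ROOT code): without a cache, every basket of every branch being read costs its own disk request; a TTreeCache-style prefetch learns which branches the analysis touches and fetches all their baskets for the next entry range in one sorted bulk read.

```python
# Toy disk-request count with and without a TTreeCache-like prefetch.
# All numbers below are assumptions for illustration.

N_BRANCHES = 100        # hypothetical D3PD tree
BASKETS_PER_BRANCH = 20
USED_BRANCHES = 10      # the analysis touches only 10 of the 100 branches

# No cache: one request per basket of every branch that is read.
requests_no_cache = USED_BRANCHES * BASKETS_PER_BRANCH

# Cache: one bulk request per entry range, however many branches it spans.
requests_with_cache = BASKETS_PER_BRANCH

print(requests_no_cache, requests_with_cache)  # 200 20
```

The request count drops by the number of read branches, which matters most on high-latency storage (WAN, xrootd pools).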
Current settings
• AODs and ESDs
  • 2010: fixed-size baskets (2 kB); files re-ordered, but basket sizes not optimized at the end of production jobs
  • 2011 until now: all 9 trees given the default 30 MB of memory; basket sizes "optimized"; a tree is auto-flushed once its unzipped size exceeds 30 MB
  • 17.X.Y:
    • The largest tree, "CollectionTree", optimized: split level 0 and member-wise streaming
    • ESD/RDO auto-flush every 5 events, AOD every 10 events
    • Other trees revert to the 2010 model
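The 17.X.Y flush intervals are consistent with a fixed byte budget divided by the event size; a back-of-the-envelope sketch, using assumed (illustrative, not measured) uncompressed event sizes:

```python
# Sketch of how a 30 MB AutoFlush budget maps to a flush interval in events.
# The per-event sizes below are illustrative assumptions only.

BUFFER = 30_000_000                      # 30 MB buffer per tree (2011 default)
event_size = {"ESD": 6_000_000,          # assumed uncompressed bytes per event
              "AOD": 3_000_000}

for fmt, size in event_size.items():
    print(fmt, "flush every", BUFFER // size, "events")
```

With these assumed sizes the budget yields flushes every 5 events for ESD and every 10 for AOD, matching the quoted settings; the real intervals were presumably tuned from measured event sizes.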
Current settings
• D3PDs
  • 2010:
    • Fixed-size baskets (2 kB)
    • Reordered by event
    • Basket sizes optimized properly
    • Zip level changed to 6
    • Done in the merge step
  • 2011 until now:
    • ROOT basket size optimization
    • Auto-flush at 30 MB
    • No information whether re-optimization is done or not (need to check!)
  • 17.X.Y: not clear yet
Local disk performance: D3PD
• When reading all events, real time is dominated by CPU time
• Not so for sparse reading
• ROOT-optimized file (rewritten with hadd -f6) improves CPU time but not HDD time (!)
• [Plot: 2010 results; "We are here now" marks the current configuration]
Large scale tests
• D3PD reading
• Egamma dataset: 11 files, 90 GB
• Test configurations:
  • 100% of events
  • 1% of events
  • TTreeCache ON
  • ROOT-optimized file
EOS – xroot disk pool
• Experimental¹ setup for a large-scale analysis farm
• Xroot server with 24 nodes, each with 20 × 2 TB RAID-0 filesystems (only 10 nodes were used for this test, with a maximum theoretical throughput of 1 GB/s)
• To stress it we used 23 × 8 cores with ROOT 5.26.0b (SLC4, gcc 3.4)
• Only PROOF reading of D3PDs was tested
• ¹Caveat: real-life performance will be significantly worse.
EOS – xroot disk pool (cont.)
• Shown are only maximal sustained event rates (real use-case averages will be significantly smaller)
• Original file: it would be faster to read all the events even if we needed only 1% of them
• Reading fully optimized data gave a sustained read speed of 550 MB/s
• Note: log scale!
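A toy cost model (assumed numbers) of why the unoptimized file is read faster in full than at 1%: each per-branch basket spans many events, so even a 1% random sample touches a large fraction of all baskets, and every touched basket still costs a seek plus a full-basket read and decompression.

```python
# Fraction of baskets touched when sampling events at random (toy model).
# events_per_basket is an assumption for a small fixed-size basket.

p = 0.01                  # select 1% of events
events_per_basket = 50    # assumed events per basket
frac_touched = 1 - (1 - p) ** events_per_basket
print(round(frac_touched, 2))   # a 1% read still touches ~40% of baskets
```

Add the seek overhead of scattered accesses on top of that ~40%, and a plain sequential read of everything can win.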
dCache vs. Lustre
• Tested in Zeuthen and Hamburg
• Minimum-bias D3PD data
• Single unoptimized file (ROOT 5.22; 1k branches of 2 kB; CF = 1)
• Single optimized file (ROOT 5.26, hadd -f2)
• [Plot: HDD read requests]
Conclusions
• Many possible parameters and strategies for optimizing data for faster input
• Different formats and use cases, with sometimes conflicting requirements, make optimization more difficult
• In 2010 we used file reordering, which significantly decreased job duration and the stress on the disk systems
• Currently taken data is optimized by ROOT, but that may be suboptimal for some D3PDs
• We need new performance measurements and a search for optimal settings
  • DPM, Lustre, dCache
• Careful job-specific tuning is needed to reach optimal performance