Optimization and usage of D3PD

Optimization and usage of D3PD Ilija Vukotic CAF - PAF 19 April 2011 Lyon

Overview • Formats • Optimizations • Local tests • Large scale tests

Formats • [Figure: balls for AODs, ESDs and D3PDs, labelled data, MC, JETMET and egamma; ball surface ~ event size] • Sizes are only indicative; in reality they depend on: • Pile-up • Stream • Repro tag • D3PDs are no longer flat trees. With additional trees, some D3PDs have ~10k branches

example

ROOT file organization • [Figure: events split into baskets (std, doubles, floats) that are written to the file] • We choose to fully split (better compression factor) • Baskets are written to the file as soon as they are full • As a result, parts of the same event are scattered over the file
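The scattering described on this slide can be illustrated with a toy model. This is only a sketch and not ROOT code: the function names, the three branch names and the bytes-per-event values are assumptions made for illustration, and 2 kB is taken as 2048 bytes. Each branch fills its own buffer and appends a basket to the file whenever the buffer is full.

```python
def write_baskets(n_events, branch_bytes_per_event, basket_size):
    """Toy model of fully split writing: one buffer per branch, flushed when full.

    Returns the file layout as a list of (branch, first_event, last_event),
    in the order the baskets would be appended to the file.
    """
    layout = []
    fill = {branch: 0 for branch in branch_bytes_per_event}
    first = {branch: 0 for branch in branch_bytes_per_event}
    for event in range(n_events):
        for branch, nbytes in branch_bytes_per_event.items():
            fill[branch] += nbytes
            if fill[branch] >= basket_size:
                # The basket is full: it goes to the file right away.
                layout.append((branch, first[branch], event))
                fill[branch] = 0
                first[branch] = event + 1
    # Flush partially filled buffers at the end of the job.
    for branch in branch_bytes_per_event:
        if fill[branch] > 0:
            layout.append((branch, first[branch], n_events - 1))
    return layout


def positions_of_event(layout, event):
    """Indices (in file order) of the baskets that hold data of one event."""
    return [i for i, (_, lo, hi) in enumerate(layout) if lo <= event <= hi]


# Hypothetical branches: bytes stored per event (assumed values).
branches = {"std": 64, "doubles": 8, "floats": 4}
layout = write_baskets(1024, branches, basket_size=2048)
print(len(layout), "baskets; event 0 lives in baskets", positions_of_event(layout, 0))
```

In this model the three pieces of the first event end up far apart in the file, because branches with small per-event payloads flush their baskets much later than branches with large ones.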

Optimizations (tradeoffs) • Constraints: • Memory • Disk size • Read/write time • Options: • Split level: 99 (full) • Zip level: 6 • Basket size: 2 kB • Memberwise streaming • Basket reordering • by event • by branch • TTreeCache • “New ROOT”: AutoFlush matching TTreeCache (TTC) size • Read scenarios: • Full sequential • Some events • Parts of events • PROOF
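The tradeoff between the two basket reordering options and the read scenarios can be sketched with a toy model. This is an illustration under stated assumptions, not ROOT behaviour: the function names are hypothetical, every branch is assumed to hold the same number of events per basket, and a "seek" is simply counted as one contiguous run of wanted baskets in file order.

```python
def make_baskets(n_events, events_per_basket):
    """List (branch, first_event, last_event) for every basket of every branch."""
    baskets = []
    for branch, step in events_per_basket.items():
        for first in range(0, n_events, step):
            baskets.append((branch, first, min(first + step, n_events) - 1))
    return baskets


def reorder_by_event(baskets):
    """File order: baskets sorted by their first event, then by branch."""
    return sorted(baskets, key=lambda b: (b[1], b[0]))


def reorder_by_branch(baskets):
    """File order: all baskets of one branch stored together."""
    return sorted(baskets, key=lambda b: (b[0], b[1]))


def count_seeks(layout, wanted):
    """Number of contiguous runs of wanted baskets when scanning the file."""
    seeks = 0
    previous_wanted = False
    for basket in layout:
        is_wanted = wanted(basket)
        if is_wanted and not previous_wanted:
            seeks += 1
        previous_wanted = is_wanted
    return seeks


baskets = make_baskets(40, {"a": 10, "b": 10, "c": 10})
one_branch = lambda b: b[0] == "b"                   # parts of events
some_events = lambda b: b[1] <= 19 and b[2] >= 10    # events 10..19, all branches

for name, layout in [("by event", reorder_by_event(baskets)),
                     ("by branch", reorder_by_branch(baskets))]:
    print(name, count_seeks(layout, one_branch), count_seeks(layout, some_events))
```

In this model, ordering by event favours reading whole events from a range, while ordering by branch favours reading a few branches for all events, which is why the best choice depends on the read scenario.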

Current settings • AODs and ESDs • 2010: fixed-size baskets (2 kB); files re-ordered but basket sizes not optimized at the end of production jobs • 2011 until now: all the trees (9 of them) given the default 30 MB of memory, basket sizes “optimized”, a tree autoflushed if its unzipped size was larger than 30 MB • 17.X.Y: • The largest tree, “Collection Tree”, optimized • split level 0 and memberwise streaming • ESD/RDO autoflush every 5 events, AOD every 10 events • other trees back to the 2010 model.

Current settings • D3PDs • 2010 • fixed-size baskets (2 kB) • reordered by event • basket size optimized properly • zip level changed to 6 • done in the merge step • 2011 until now • ROOT basket size optimization • autoflush at 30 MB • No information on whether re-optimization is done or not (need to check!) • 17.X.Y • not clear yet

Local disk performance D3PD • When reading all events, real time is dominated by CPU time • Not so for sparse reading • A ROOT-optimized file (rewritten using hadd -f6) improves CPU time but not HDD time (!) • [Figure annotations: “2010”, “We are here now”]

Large scale tests • D3PD reading • Egamma dataset: 11 files, 90 GB • Tests: • 100% of events • 1% of events • TTreeCache ON • ROOT optimized

EOS – xroot disk pool • Experimental setup for a large scale analysis farm (caveat: real-life performance will be significantly worse) • Xroot server with 24 nodes, each with 20 x 2 TB RAID0 FS (for this test only 10 nodes were used, with a maximum theoretical throughput of 1 GB/s) • To stress it, 23 x 8 cores were used with ROOT 5.26.0b (slc4, gcc 3.4) • Only PROOF reading of D3PDs was tested
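As a rough back-of-the-envelope check of this setup, the theoretical throughput can be divided among the reading cores. The helper below is a hypothetical sketch; it assumes 1 GB/s = 1000 MB/s and that every core reads at the same rate.

```python
def per_core_bandwidth_mb_s(total_mb_s, machines, cores_per_machine):
    """Even share of the pool throughput available to each reading core."""
    return total_mb_s / (machines * cores_per_machine)


cores = 23 * 8                                     # 184 cores stressing the pool
share = per_core_bandwidth_mb_s(1000.0, 23, 8)     # assumes 1 GB/s = 1000 MB/s
print(f"{cores} cores, about {share:.1f} MB/s per core at the theoretical maximum")
```

Under these assumptions each core can be fed at most a few MB/s, so a job that consumes data faster than that would be limited by the disk pool rather than by CPU.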

EOS – xroot disk pool cont. • Only maximal sustained event rates are shown (real use case averages will be significantly smaller) • Original: it would be faster to read all the events even if only 1% were needed • Reading the full optimized data gave a sustained read speed of 550 MB/s • [Plot uses a log scale]
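One possible way to see why reading 1% of the events can cost almost as much as reading all of them is to estimate how many baskets a sparse selection touches. This is a sketch, not a measurement: it assumes events are selected independently at random, that a basket must be read whole, and that a 2 kB basket holds 2048 bytes of fixed-size values; the function names are hypothetical.

```python
def events_per_basket(basket_bytes, bytes_per_event):
    """Number of events whose values fit into one basket of a branch."""
    return basket_bytes // bytes_per_event


def fraction_of_baskets_touched(selected_fraction, n_events_in_basket):
    """Probability that a basket contains at least one selected event."""
    return 1.0 - (1.0 - selected_fraction) ** n_events_in_basket


for label, size in [("float", 4), ("double", 8), ("64-byte payload", 64)]:
    n = events_per_basket(2048, size)
    touched = fraction_of_baskets_touched(0.01, n)
    print(f"{label}: {n} events per basket, {touched:.1%} of baskets touched")
```

Under these assumptions a branch storing one float per event has nearly all of its baskets touched by a 1% selection, so almost the whole branch is read anyway, now with many more separate requests.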

dCache vs. Lustre • Tested in Zeuthen and Hamburg • Minimum bias D3PD data • Single unoptimized file (ROOT 5.22, 1k branches of 2 kB, CF=1) • Single optimized file (ROOT 5.26, hadd -f2) • [Figure: HDD read requests]

Conclusions • There are many possible ways and parameters to optimize data for faster input • Different formats and use cases, with sometimes conflicting requirements, make optimization more difficult • In 2010 we used file reordering, which significantly decreased job duration and stress on the disk systems • Data currently being taken are optimized by ROOT, but that may be suboptimal for some D3PDs • New performance measurements and a search for optimal settings are needed • DPM, Lustre, dCache • Careful job-specific tuning is needed to reach optimal performance
