
PD2P, Caching etc.


Presentation Transcript


  1. PD2P, Caching etc.
     Kaushik De, Univ. of Texas at Arlington
     ADC Retreat, Naples, Feb 4, 2011

  2. Introduction
     • Caching at T2 using PD2P and Victor works well
       • Have 6 months of experience (>3 months with all clouds)
       • Almost zero complaints from users
     • Few operational headaches
       • Some cases of full disks, disappearing datasets…
       • Most issues addressed with incremental improvements such as space checking, rebrokering, storage cleanup and consolidation
       • What I propose today should solve the remaining issues
     • Many positives
       • No exponential growth in storage use
       • Better use of Tier 2 sites for analysis
     • Next step – PD2P for Tier 1
       • This is not a choice but a necessity (see Kors' slides)
       • We should treat part of Tier 1 storage as a dynamic cache

  3. Life Without ESD
     • New plan – see the document and Ueda's slides
       • Reduction in the storage requirement from 27 PB to ~10 PB for 2011 data @ 400 Hz (but could be as much as 13 PB)
       • Reduction of 2010 data from 13 PB to ~6 PB
     • But we should go further
       • We are still planning to fill almost all T1 disks with pre-placed data
       • 2010 + 2011 + MC = 6 + 10 + 8 = 24 PB = available space
       • Based on past experience, reality will be tougher and disk crises will hit us sooner – we should do things differently this time
     • We must trust the caching model

  4. What can we do?
     • Make some room for dynamic caches
       • For the discussion below, do not count the T0 copy
     • Use DQ2 tags – custodial/primary/secondary – rigorously (sketched as a lookup table below)
       • Custodial = LHC data = tape only (1 copy)
       • Primary = minimal, on disk at T1, so we have room for PD2P caching
         • LHC data primary == RAW (1 copy), AOD, DESD, NTUP (2 copies)
         • MC primary == Evgen, AOD, NTUP (2 copies only)
       • Secondary = copies made by ProdSys (ESD, HITS, RDO), PD2P (all types except RAW, RDO, HITS) and DaTri only
     • Lifetimes – required strictly for all secondary copies (i.e. consider secondary == cached == temporary)
     • Locations – custodial ≠ primary; primary ≠ secondary
     • Deletions – any secondary copy can be deleted by Victor
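A minimal sketch of this tagging policy as a lookup table, assuming a Python-style configuration; the replica counts follow the bullets above, but every name here (PRIMARY_POLICY, primary_copies, the "EVGEN" key, and so on) is illustrative, not the actual DQ2/PanDA code.

```python
# Disk replica counts at Tier 1s for "primary" data (the T0 copy is not counted).
PRIMARY_POLICY = {
    "data": {"RAW": 1, "AOD": 2, "DESD": 2, "NTUP": 2},   # LHC data
    "mc":   {"EVGEN": 2, "AOD": 2, "NTUP": 2},            # Monte Carlo
}

# Custodial copies are tape-only, one copy.
CUSTODIAL_POLICY = {"data": {"RAW": 1}}

# Everything else (ESD/HITS/RDO from ProdSys, PD2P and DaTri copies) is
# "secondary": it must carry a lifetime and may be deleted by Victor.

def primary_copies(project: str, datatype: str) -> int:
    """Number of primary disk copies required at Tier 1s (0 = secondary only)."""
    return PRIMARY_POLICY.get(project, {}).get(datatype, 0)

print(primary_copies("data", "AOD"))   # -> 2
print(primary_copies("mc", "ESD"))     # -> 0, ESD is cached (secondary) only
```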

  5. Reality Check
     • Primary copies (according to slide 4)
       • 2010 data ~ 4 PB
       • 2011 data ~ 4.5 PB
       • MC ~ 5 PB
       • Total primary = 14 PB
     • Available space for secondaries > ~10 PB at Tier 1s (a back-of-the-envelope check follows below)
       • Can accommodate additional copies, but only if 'hot'
       • Can accommodate some ESDs (expired gracefully after n months)
       • Can accommodate large buffers during reprocessing (new release)
       • Can accommodate better-than-expected LHC running
       • Can accommodate new physics-driven requests
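The headroom figure can be cross-checked from slides 3 to 5; a small sketch with the petabyte values taken straight from the slides:

```python
# Back-of-the-envelope Tier 1 disk budget (values in PB, from slides 3-5;
# the rounding to ~14 PB follows slide 5).
primary = {"data2010": 4.0, "data2011": 4.5, "mc": 5.0}
total_primary = sum(primary.values())        # 13.5 PB, quoted as ~14 PB
available_t1_disk = 6 + 10 + 8               # 24 PB of available space (slide 3)
headroom = available_t1_disk - total_primary
print(f"primary ~{total_primary} PB, cache headroom ~{headroom} PB")
# -> roughly 10 PB at Tier 1s left for dynamic (secondary) caches
```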

  6. Who Makes Replicas?
     • RAW – managed by Santa Claus (no change)
       • 1 copy to tape (custodial), 1 copy to disk (primary) at a different T1
     • First-pass processed data – by Santa Claus (no change)
       • Tagged primary/secondary according to slide 4
       • Secondary will have a lifetime (n months)
     • Reprocessed data – by PanDA
       • Tagged primary/secondary according to slide 4, with lifetime set
       • Additional copies made to a different T1 disk, according to MoU share, automatically based on slide 4 (not by AKTR anymore)
     • Additional copies at Tier 1s – only by PD2P and DaTri
       • Must always set a lifetime
     • Note – only PD2P makes copies to Tier 2s
     (these rules are sketched as a small function below)
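An illustrative sketch of these rules, assuming a single tagging helper; the function name, the creator strings and the 90-day stand-in for "n months" are all assumptions, not the real system.

```python
from datetime import datetime, timedelta, timezone

def tag_new_replica(creator: str, medium: str, datatype: str) -> dict:
    if datatype == "RAW":
        # Santa Claus: one custodial copy on tape, one primary disk copy at another T1.
        tag = "custodial" if medium == "tape" else "primary"
    elif creator in ("santa-claus", "panda") and datatype in ("AOD", "DESD", "NTUP"):
        tag = "primary"            # first-pass / reprocessed output, per slide 4
    else:
        tag = "secondary"          # extra ProdSys outputs, PD2P and DaTri copies
    replica = {"tag": tag}
    if tag == "secondary":
        # every secondary copy must carry a lifetime
        replica["expires"] = datetime.now(timezone.utc) + timedelta(days=90)
    return replica

print(tag_new_replica("pd2p", "disk", "AOD"))   # -> secondary, with an expiry date
```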

  7. Additional Copies by PD2P
     • Additional copies at Tier 1s – always tagged secondary
       • If a dataset is 'hot' (defined on the next slide)
       • Use MoU share to decide which Tier 1 gets the extra copy
     • Copies at Tier 2s – always tagged secondary
       • No changes for the first copy – keep the current algorithm (brokerage); use the age requirement if we run into a space shortage (see Graeme's talk)
       • If a dataset is 'hot' (see next slide), make an extra copy
     • Reminder – additional replicas are secondary = temporary by definition, and may/will be removed by Victor

  8. What is 'Hot'?
     • 'Hot' decides when to make a secondary replica
     • The algorithm is based on additive weights (sketched below)
       • w1 + w2 + w3 + wN… > N (tunable threshold) – make an extra copy
     • w1 – based on the number of waiting jobs
       • nwait / (2 × nrunning), averaged over all sites
       • Currently disabled due to DB issues – needs to be re-enabled
       • Don't base it on the number of reuses – that did not work well
     • w2 – inversely based on age
       • Either Graeme's table, or continuous, normalized to 1 (newest data)
     • w3 – inversely based on the number of copies
     • wN – other factors based on experience
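A minimal sketch of the additive score; only its general shape comes from the slide (w1 from waiting jobs via nwait/(2·nrunning), w2 falling with age, w3 falling with the number of copies, the sum compared against a tunable threshold), while the exact w2/w3 formulas and the threshold value are assumptions.

```python
from dataclasses import dataclass

@dataclass
class DatasetState:
    nwait: int        # waiting jobs, averaged over all sites
    nrunning: int     # running jobs
    age_days: float   # dataset age
    ncopies: int      # existing disk replicas

def hot_score(ds: DatasetState, max_age_days: float = 90.0) -> float:
    w1 = ds.nwait / (2.0 * max(ds.nrunning, 1))        # waiting-job pressure
    w2 = max(0.0, 1.0 - ds.age_days / max_age_days)    # newest data -> 1
    w3 = 1.0 / max(ds.ncopies, 1)                      # fewer copies -> higher
    return w1 + w2 + w3

def is_hot(ds: DatasetState, threshold: float = 2.0) -> bool:
    """Make an extra secondary replica when the summed weights pass the threshold."""
    return hot_score(ds) > threshold

ds = DatasetState(nwait=40, nrunning=10, age_days=5, ncopies=1)
print(round(hot_score(ds), 2), is_hot(ds))   # -> 3.94 True
```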

  9. Where to Send 'Hot' Data?
     • Tier 1 site selection (sketched below)
       • Based on MoU share
       • Exclude a site if the dataset size is > 5% (as proposed by Graeme)
       • Exclude a site if it has too many active subscriptions
       • Other tuning based on experience
     • Tier 2 site selection
       • Based on brokerage, as currently done
       • Negative weight – based on the number of active subscriptions
       • Other tuning based on experience
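A hypothetical sketch of the Tier 1 selection: the MoU-share weighting and both exclusion cuts come from the slide, while the interpretation of the 5% cut as relative to a site's free disk, the subscription cap, and all names and numbers are assumptions for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Tier1Site:
    name: str
    mou_share: float            # fractional MoU share
    free_disk_tb: float
    active_subscriptions: int

def choose_tier1(sites, dataset_size_tb, max_subscriptions=50):
    candidates = [
        s for s in sites
        if dataset_size_tb <= 0.05 * s.free_disk_tb       # exclude if dataset > 5%
        and s.active_subscriptions < max_subscriptions     # exclude if too busy
    ]
    if not candidates:
        return None
    # Pick with probability proportional to MoU share.
    return random.choices(candidates, weights=[s.mou_share for s in candidates], k=1)[0]

sites = [
    Tier1Site("BNL", 0.23, 900.0, 12),
    Tier1Site("CCIN2P3", 0.13, 400.0, 60),   # excluded: too many subscriptions
    Tier1Site("RAL", 0.10, 50.0, 5),         # excluded: 10 TB is >5% of free disk
]
print(choose_tier1(sites, dataset_size_tb=10.0).name)   # -> BNL
```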

  10. What About Broken Subscriptions?
      • Becoming an issue (see Graeme's talk)
        • PD2P already sends datasets within a container to different sites to reduce wait time for users
        • But what about datasets which take more than a few hours?
      • Simplest solution
        • ProdSys imposes a maximum limit on dataset size
      • Possible alternative
        • A cron/PanDA task breaks up datasets and rebuilds the container (sketched below)
      • Difficult but also possible solution
        • Use _dis datasets in PD2P
        • Search DQ2 for _dis datasets in brokerage (there will be a performance penalty if we use this route)
        • But this is perhaps the most robust solution?
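A rough sketch of the "break up datasets and rebuild the container" alternative; the greedy packing below is generic, and a real implementation would follow it with DQ2 calls to register the sub-datasets and the container (not shown here).

```python
def split_files(files, max_size_bytes):
    """Greedily pack (name, size) pairs into chunks below max_size_bytes."""
    chunks, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > max_size_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append((name, size))
        current_size += size
    if current:
        chunks.append(current)
    return chunks

files = [(f"file{i}.root", 5 * 10**9) for i in range(10)]   # 10 files of 5 GB
chunks = split_files(files, max_size_bytes=20 * 10**9)      # ~20 GB sub-datasets
# Each chunk becomes a sub-dataset; the container is then rebuilt from the
# sub-datasets so users still see a single container name.
print(len(chunks), "sub-datasets")                          # -> 3 sub-datasets
```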

  11. Data Deletions will be Very Important
      • Since we are caching everywhere (T1 + T2), Victor plays an equally important role as PD2P
      • Asynchronously clean up all caches
        • Triggered by a disk-fullness threshold
        • Algorithm based on (age + popularity) & secondary
      • Also automatic deletion of n-2 – by AKTR/Victor
      (a sketch of this cleanup selection follows below)
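A hedged sketch of the cleanup selection: cleanup is triggered when a disk passes a fullness threshold, and only secondary (cached) replicas are candidates, ranked by popularity and age. The ranking key, both thresholds and all field names are illustrative assumptions, not Victor's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    dataset: str
    is_secondary: bool
    age_days: float
    accesses_last_30d: int
    size_tb: float

def select_for_deletion(replicas, used_tb, total_tb,
                        trigger_fraction=0.90, target_fraction=0.80):
    if used_tb / total_tb < trigger_fraction:
        return []                                  # disk not full enough yet
    # Only secondary copies may be deleted; least popular and oldest first.
    candidates = sorted(
        (r for r in replicas if r.is_secondary),
        key=lambda r: (r.accesses_last_30d, -r.age_days),
    )
    to_free = used_tb - target_fraction * total_tb
    chosen, freed = [], 0.0
    for r in candidates:
        if freed >= to_free:
            break
        chosen.append(r)
        freed += r.size_tb
    return chosen

pool = [Replica("data10_7TeV.A", True, 200, 0, 30.0),
        Replica("data10_7TeV.B", False, 300, 1, 40.0)]   # primary: never deleted
print([r.dataset for r in select_for_deletion(pool, used_tb=950, total_tb=1000)])
# -> ['data10_7TeV.A']
```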

  12. How Soon Can we Implement?
      • Before LHC startup!
        • Big initial load on ADC operations to clean up 2010 data and to migrate tokens
        • Need some testing/tuning of PD2P before the LHC starts
      • So we need a decision on this proposal quickly
