
ADC Weekly Meeting, May 8 2012 – Annecy 2012 Technical Interchange Meeting Highlights



  1. ADC Weekly Meeting, May 8 2012
  Annecy 2012 Technical Interchange Meeting Highlights
  Simone Campana – CERN IT/ES

  2. Introduction
  • 2-day meeting (Wed PM to Fri AM)
  • 4 sessions:
    • Data Management
    • Production System
    • Analysis
    • Networking
  • Plus one session with invited speakers: Intel, EPFL (Miguel Branco)
  • Many thanks to the session conveners for the material
  ADC Weekly, 8/5/2012

  3. TIM April 2012 – Data Management Session
  Highlights and Action Items for ADC Weekly, CERN, 8 May 2012
  S. Campana, V. Garonne, I. Ueda

  4. Data Management
  • Storage Federations
    • xrootd is the only realistic solution for the medium term
    • The use case focuses on failover for data access
    • More advanced use cases can be explored in the future ("repairing" data, file-level caching)
  • CMS experience in pre-production (failed-access recovery)
    • CMS has spent a lot of time (and will spend more) on CMSSW I/O tuning (reducing the number of reads and increasing read-ahead hits) – key for success in WAN access
  • ATLAS experience in USATLAS R&D
    • Automated tools for WAN tests on top of HC
    • Integration of the xrootd federation with PanDA is in progress
  • Many open questions
    • Security, monitoring, content publication
    • MB recommended creating topical working groups
  • ATLAS will try to expand the experience with xrootd federations outside the US
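The failover use case mentioned above can be illustrated with a minimal sketch: try the local storage replica first and, only on failure, read the file through the federation redirector. All hostnames, paths and the opener callback are illustrative, not real ATLAS endpoints or tools.

```python
# Minimal sketch of "failover for data access": attempt the local replica,
# then fall back to the federation redirector. Hosts/paths are made up.

LOCAL_PREFIX = "root://localse.example.org//atlas/"
FEDERATION_PREFIX = "root://federation-redirector.example.org//atlas/"

def open_with_failover(lfn, opener):
    """Try the local replica, then the federation; 'opener' does the real I/O."""
    for prefix in (LOCAL_PREFIX, FEDERATION_PREFIX):
        try:
            return opener(prefix + lfn)
        except IOError:
            continue  # local replica missing/unreadable: fall back to federation
    raise IOError("no replica of %s reachable" % lfn)
```

The point of the pattern is that the application never fails outright on a missing local replica; the more advanced use cases (repair, caching) would hook into the same fallback step.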

  5. Data Management
  • Transfer Services
    • FTS will remain the baseline transfer service
    • FTS3 will cure known architectural issues (channel concept, plugin support for protocols)
    • FTS3 prototype in June, multi-VO testing
  • Point-to-point protocols
    • gridFTP as baseline; the new version and session reuse will help reduce overheads
    • Xrootd is an alternative; needs to be supported on all systems (see also the discussion on federations)
    • HTTP is a serious option; needs more integration and testing
  • SRM
    • Functionalities will be slowly replaced
    • A core set of functionalities will remain (access to MSS)
    • Positive experience with BeStMan + gridFTP + Lustre at OU SWT2
  • Interesting analysis from DDM Tracer data; further studies suggested

  6. Data Management
  • Rucio
    • Architecture and prototype API now available
    • Rucio demo in June, prototype in October
  • Case sensitivity
    • Would like to move to case-sensitive dataset and file names in DDM (UNIX-like)
    • No strong online or offline objections; will try to agree at the June SW week
  • Rucio scope
    • Proposal presented, but possible issues with the usage of "campaigns"
    • Being re-thought; the DDM team will present a new proposal soon
  • Naming convention for files at sites in Rucio
    • Controversial discussion (less intuitive organization of files at sites for local access)
    • Being re-iterated within ADC and with Data Prep and PhysCoord (ICB?)

  7. TIM April 2012 – ProdSys Session
  Highlights and Action Items for ADC Weekly, CERN, 8 May 2012
  K. De, A. Filipcic, A. Klimentov, R. Walker and A. Vaniachine

  8. Production System and Grid Data Processing
  • Progress since the TIM in Dubna
    • APF status: PY factory to be replaced, still manual config files, pending integration with AGIS, fair-share policy implementation
    • HLT task requests: real-time definition of tasks and jobs
    • Multi-cloud production widely used; Tier-2 usage
  • Short-term plans
    • Job submission vs. resource heterogeneity
    • AKTR et al. overload: processing 10k+ task requests with 90k+ output datasets
    • The previous overload happened about a year ago, at the time of the TIM in Dubna; not clear why these rare events (overloads and TIMs) are correlated in time
    • Monitoring and better integration with the SSB
  Alexei Klimentov – TIM Highlights
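One way to picture the fair-share policy mentioned for the factory is: given target shares per activity and recent usage, submit the next pilot for the activity that lags its target the most. This is a generic sketch, not the APF implementation; activity names and counters are illustrative.

```python
# Generic fair-share sketch (not the actual APF code): pick the activity whose
# fraction of recent usage falls furthest below its target share.

def pick_next(shares, usage):
    """shares: {activity: target fraction}; usage: {activity: recent pilot count}."""
    total = sum(usage.values()) or 1  # avoid division by zero with no history
    def deficit(activity):
        # positive deficit = activity is under-served relative to its share
        return shares[activity] - usage.get(activity, 0) / total
    return max(shares, key=deficit)
```

Repeatedly applying this rule drives the observed usage fractions toward the configured shares.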

  9. Dynamic Job Definition (JEDI)
  • JEDI core foundations
    • No predefined (and pre-assigned) jobs
    • Task request: database templates
    • "Late" dataset registration
  • Reassessment of PandaDB and ProdDB
    • Understand the benefits of redundancy
    • Separation of concerns
    • Task post-processing
  • If you do not like the name "JEDI", the alternative is "PDJD" … Panda Dynamic Job Definition

  10. Dynamic Evolution for Tasks (DEfT)
  • The rate of task requests grows exponentially
  • Linear growth in users and support requests
  • Growing list of requirements and use cases
    • New use cases: HLT, FTK, user analysis tasks
  • First ideas about the new architecture and how JEDI and DEfT will be developed
  • ProdSys technical meeting in Ljubljana (June 2012) to discuss JEDI and DEfT development

  11. ProdSys session II
  • Rucio/DDM and ProdSys/PanDA overlaps
    • What we want to keep and what we want to drop
  • Multi-core jobs
    • Ready for full grid production in a simple scenario
  • glideinWMS studies
    • Work in progress to find the limits of the various components

  12. TIM April 2012 – Distributed Analysis Session
  Highlights and Action Items for ADC Weekly, CERN, 8 May 2012
  F. Barreiro, D. Benjamin, D. Van Der Ster

  13. ATLAS & CMS Common Analysis Framework
  • Initiative from CERN IT-ES, ATLAS and CMS
  • Assess the potential of a common analysis solution based on PanDA and glideinWMS
  • Currently at the end of the Feasibility Study: http://cern.ch/go/9mNC
    • Compare and analyze the experiments' workflows and architectures
    • Identify dependencies, what can be reused, and potential show-stoppers
    • Study and compare sub-components: server sides, PanDA pilot and pilot factories, glideinWMS
    • Evaluate integration scenarios for PanDA and glideinWMS, ensuring no loss of functionality
  • Prepare a final document with conclusions and a proposal for a Proof of Concept
    • To be validated by the experiments
    • In case of a green light, used as input for the coming Functionality and Operations Studies

  14. Improving Job Efficiency
  • Server-side retries
    • Only 20% of failures are "retriable"
    • Normally OK at the 2nd attempt; a 3rd attempt is useless
    • Non-retriable failures are mostly "athena" (well… something else, but masked by athena); work will be done to account for those properly
  • proot
    • Main goal is to catch failures and categorize them properly (besides correctly setting the ROOT environment)
    • This is difficult if you do not "own" the event loop
    • So an EventLoop package and its grid driver have been developed
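The retry policy above boils down to two rules: only a small class of errors is worth retrying at all, and a second failure ends the retries (the slide notes a 3rd attempt is useless). A minimal sketch, with made-up error codes rather than real PanDA error categories:

```python
# Sketch of the server-side retry rule: retry only "retriable" errors,
# and never go beyond a 2nd attempt. Error codes are illustrative.

RETRIABLE = {"TEMP_STAGEIN_FAILURE", "WORKER_NODE_LOST", "TRANSIENT_SE_ERROR"}
MAX_ATTEMPTS = 2  # attempt 1 = original run, attempt 2 = the single retry

def should_retry(error_code, attempt_number):
    """Decide server-side whether a failed (sub)job should be resubmitted."""
    if error_code not in RETRIABLE:
        return False  # the ~80% of failures that are not worth resubmitting
    return attempt_number < MAX_ATTEMPTS  # retry once; no 3rd attempt
```

Making this decision on the server (rather than in client tools) is what allows the accounting of non-retriable "athena" failures to be fixed centrally.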

  15. Server-Side Tasks
  • Current issues
    • Many actions today happen client-side => slowness (data discovery, job splitting, dataset registration, retry)
    • No task concept in PanDA => complicated bookkeeping; user interest is in tasks rather than subjobs
  • Start moving client functionalities to the server side
    • Simplify client tools, centralize functionalities, improve bookkeeping
    • Introduce a task concept in PanDA (task/jobset table)
    • Modify clients to submit tasks/jobsets (instead of subjobs)
    • Implement subjob definition server-side
    • Evolve the PanDA server to handle subjobs and task/jobdef synchronization in the DB
  • Change bookkeeping tools
    • Interact with the task/jobdef table directly
    • Send retry commands to be executed by the server
  • Move toward server-side task management
    • Straightforward once job submission is moved server-side
    • The missing piece is task chaining
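The shift described above can be sketched in a few lines: the client submits one task, and the server defines the subjobs, here by simply chunking the input file list. The field names, the table layout and the chunk size are illustrative, not the actual PanDA task/jobset schema.

```python
# Sketch of server-side job splitting under a task concept: one submitted
# task is expanded into subjob rows. Schema and chunking are illustrative.

def split_task(task_id, input_files, files_per_subjob=5):
    """Define subjobs for a task server-side; returns rows for a task/jobset table."""
    subjobs = []
    for i in range(0, len(input_files), files_per_subjob):
        subjobs.append({
            "task_id": task_id,            # bookkeeping happens per task...
            "subjob_id": len(subjobs),     # ...while execution happens per subjob
            "inputs": input_files[i:i + files_per_subjob],
        })
    return subjobs
```

Because the split happens on the server, a retry command only needs to reference the task: the server already knows which subjobs exist and which inputs each one carried.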

  16. Pilot Plans/Ideas
  • Moving to "experiment" plugins
    • Refactor/clean the pilot code; provides a better platform for many contributors
    • Job recovery simplified
    • Could be used outside the US (UK interest)
    • Could be used for analysis (to be evaluated)
  • Stage-in/out
    • Stage-out retry to the T1 (instead of locally): under development
    • Stage-in retry from another source: leverage the xrootd federation
  • ErrorDiagnostic class in development and a DEBUG mode for pilots
    • Avoid "grepping" logfiles, modularize, etc.
    • Peeking capability
  • Many others … help needed
    • The common-solution initiative should bring in more contributors
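The stage-out retry idea above is easy to sketch: if the copy to the local storage element fails, the pilot retries against the cloud's Tier-1 rather than retrying the same local endpoint. Endpoint names and the copy callback are illustrative, not the real pilot mover API.

```python
# Sketch of "stage-out retry to the T1 instead of local": one local attempt,
# then one attempt against the Tier-1. Endpoints and callback are made up.

def stage_out(local_file, local_se, tier1_se, copy):
    """Copy a job output; on local-SE failure, fall back to the cloud's T1."""
    try:
        return copy(local_file, local_se)
    except IOError:
        # local SE unavailable: retrying locally would likely fail again,
        # so retry against the Tier-1 instead
        return copy(local_file, tier1_se)
```

The stage-in counterpart is symmetric, with the xrootd federation playing the role of the alternative source.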

  17. Conclusions
  • A very productive workshop
    • Some subjects probably deserved a bit more time for discussion
  • ADC software is by no means "frozen"
    • It needs to keep up with demand
  • Strong focus on commonalities for long-term sustainability
  • Several ideas/plans will be followed up in the coming months in ADCDev and ADCOps
    • Plus dedicated workshops (e.g. ProdSys in Ljubljana)
