
CD Overview of Resources, Manpower, Planning and Strategy Vicky White Fermilab March 30, 2005








  1. CD Overview of Resources, Manpower, Planning and Strategy Vicky White Fermilab March 30, 2005

  2. Resources - Planning for the Future • Computers, disks, robots, tape drives, tapes, servers, etc. required to support the program • A workforce with the necessary skills to design, develop and run the systems and services needed • Computer facilities – space, power, cooling • Networking infrastructure • Cyber Security environment for Open Science Although this review focuses on Run II, MINOS and MiniBooNE, our planning of course encompasses the total program – with Run II a large, but not the sole driving, force in all of the above. DOE Review of Tevatron Operations at FNAL

  3. Resource Planning Strategies • Common approaches and services wherever possible • Enstore Storage System for all – with an SRM grid interface • dCache, SAM, Oracle databases and ROOT for most • Grid Computing is the strategic direction • Run II and LHC must be aligned and use worldwide computing • Resources must be shared internally and externally • Physics is not the only science • Networks and their use are key • CMS can learn from Run II (move staff) • Soft funding should be judiciously pursued and can benefit various lab programs • We need to invest (even) more in information systems and automated workflow so we can • Improve efficiency and automation • Understand and set metrics on how we are doing • We must become (even) more efficient and clever so we can invest in ILC and future programs

  4. How do we plan and get the right balance? • Balance M&S and SWF • A slightly different balance than for detectors/accelerator • Balance current program operations against future programs and initiatives • Provide (just) sufficient resources at Fermilab (people and hardware) to support the physics and provide the right “core strength” to leverage outside resources

  5. Key Elements of Planning • Understand what we are doing, why, and where all the effort and money goes • Detailed bottom-up budget process with manpower included for each WBS • Detailed tracking of expenses vs. budget • Monthly effort reporting for everyone (the system supports complex cross-experiment/division efforts contributing to common projects) • Understand what the experiments need to succeed (and why) • Yearly review of Run II – detailed planning • MOUs (ongoing) • International Finance Committees for CDF & D0 • Metrics on usage of resources • LHC Physics Center Steering meetings

  6. Key Elements of Planning (2) • Understand the staff we need to carry out our work and constantly evolve our organization to meet the needs • Hire for new skills (e.g. Grid, Cyber Security, Network Research, Simulation, Storage) • 32 new hires in the past calendar year (4 recruited internally); also 19 losses • Provide opportunities for growth, leadership and the spread of know-how internally • ~12 people have changed assignment in the past calendar year

  7. Budget Process

  8. Data -> information -> decisions • All financial reports online at http://cd-entreport.fnal.gov/cdfinancial/default.csp • Effort reporting: http://wwwserver2.fnal.gov/cfdocs/effort/Main/Login.cfm • Some metrics: http://computing.fnal.gov/cd/metrics/

  9. Stakeholder communications • Bi-weekly meeting slot with CDF and D0 spokespersons plus computing leaders • As-needed meetings with other experiment spokespersons and/or computing leaders • CD Head is a member of the CMS SW&C PMG and the CMS Advisory Software & Computing Board • CD Head or Deputy attends various lab PMGs • CD representative on CDF and D0 PMGs • Stakeholders participate in briefings and status meetings – present needs, requests, etc. • Lab Scheduling, LAM and All-Experimenters meetings • Windows Policy committee • Computer Security Working Group

  10. Meetings and Communications inside CD • Division Operations meeting weekly • Facility planning meetings weekly • Department Heads meetings weekly • Kitchen Cabinet meeting of Division Head, Deputy and Associate Heads weekly • Budget meeting ~monthly, 2 budget retreats per year • Briefings on issues/project proposals as needed • Activity status reports • Activity coordination meetings • Grid and Data Management (2 per month nominal) • Accelerator Division Support Projects (monthly nominal) • CMS Projects (monthly nominal) • FermiGrid meetings • Division All-hands meetings (~2 per year) • Network and Cyber Security retreats

  11. CD/Lab Program alignment • CD head meets weekly with Associate Director for Research • Occasional meetings with Director • Division-heads meetings • SAG meetings • Attend PAC, HEPAP

  12. How do we set Priorities and Balance? • Listen to all the input • Look at all the data • Talk with department heads and the directorate • Present and defend our budget • Decide!

  13. Computing Division ES&H Program • Our ES&H program and record is stellar, and we have broken our own and other labs’ records for working without a lost-work-time injury http://www-esh.fnal.gov/pls/default/lsc.html • We received an award for the best sustained record

  14. Risks • Analysis of last year’s risks (in Spares) • Major risks were discussed in the plenary presentation (details in Spares) • Grid • Facility infrastructure • Tapes all in one building • Government rules • Weakened collaborations

  15. Some other potential risks • Can’t get or keep people with the skills we need • I hope not. We have challenging computing problems and an exciting physics program • Can’t gain much in further efficiencies – just more work to do with fewer people • We certainly are doing more with the same number of people and have automated many things • But it is getting harder • Experiment or TeV planning is wrong, and computing needs are much greater than planned and can’t be met with offsite resources • ?

  16. Job Categories 2002–present • Programs added: CMS (+12), Accelerator Support (~12), Network Research (2) • Programs rolled off?

  17. Computing M&S + SWF • Stripes show the CD contribution to the Research Program total

  18. Conclusions • Managing our resources in a declining budget scenario is not an easy task, and the plan may need adjustments • Depends on which risks turn out to be real • However, we believe we have a plan that does the best we can to ensure appropriate support for “Tevatron Operations at FNAL” while • Reducing staff • Putting additional effort into CMS, Lattice QCD and future programs • Dealing with increased regulatory pressures on IT and Computer Security • Dealing with workforce evolution and morale • I believe we will succeed – but we will have to work very hard at it

  19. Spare Slides

  20. Systems Capacity

  21. Data Written per Day • More than 2.2 petabytes of data in the robots

  22. Storage Systems and Tape Drives

  23. Videoconferencing

  24. Network Switches and Routers

  25. Network – Static IP addresses

  26. Wireless Access Points on Network

  27. Some Common Services

  28. Risks from 2004 review

  29. Risks from 2004 review

  30. Risks from 2004 review (continued)

  31. Risks, Challenges, Mitigations (1) • Reliance on offsite resources/Grid Computing • Going well – but the overall lack of funding from DOE and NSF for Grid infrastructure and network infrastructure is a big risk • Need support for middleware (Globus, Condor, VDT, iGOC, SRM), Grid project continuations (PPDG, GriPhyN, iVDGL) and Open Science Grid development & operations • Must get ESnet upgrades (and LHCnet)!! • Mitigation: funding agencies, please work together to support building “cyber infrastructure” for global science – and build on the huge successes in Grid Computing so far

  32. Risks, Challenges, Mitigations (2) • Computing facility infrastructure • Planned, but may be a bit late • Distribution of data across two buildings to mitigate the risk of catastrophe and also provide more reliable access to data • Ongoing – to be achieved partially in FY06 • Government (President/OMB/DOE) directives on managing Information Technology, Personal Identification, Foreign Visitors, Cyber Security, Asset Management, and more… • The only mitigations are to work via the SLCCC, Lab Directors, Ray Orbach, and Congress and to learn how to “get to green” in a way that does not compromise the science

  33. Risks, Challenges, Mitigations (3) • Collaborations possibly becoming too weak to run the detectors and the experiment parts of computing • Need the MOUs • Need to build a solid operational model that will work with fewer people • The LHC Physics Center at Fermilab should help to encourage continued participation in Run II experiments during the transition to the LHC era
