  1. FIO Services and Projects • Post-C5 • February 22nd 2002 • Tony.Cass@CERN.ch

  2. Headline Services • Physics Services • P=4FTE; M=800K; I=1,300K • Computing Hardware Supply • P=6.35FTE; M=15K; I=200K (funded by sales uplift) • Computer Centre Operations • P=3.7FTE; M=100K; I=635K • Printing • P=3.5FTE; M=20K; I=175K • Remedy Support • P=1.65FTE; M=215K; I=0 • Mac Support • P=1.25FTE; M=5K; I=50K

  3. Projects • WP4 • Contribution to WP4 of EU DataGrid • P=0.5FTE • LCG – Implementation of WP4 tools • Active progress towards day-to-day management of large farms. • P=LCG allocation (2FTE?) • Computer Centre Supervision (PVSS) • Test PVSS for CC monitoring. Prototypes in Q1, Q2 and Q4. • P=2.7FTE, M=25K • B513 Refurbishment • Adapt B513 for 2006 needs. Remodel vault in 2002. • P=0.3FTE, M=1,700K

  4. Macintosh Support • Support for MacOS and applications, plus backup services. • We only support MacOS 9, but this is out of date. No formal MacOS X support, but software is downloaded centrally for general efficiency. • Staffing level for Macintosh support is declining; now at 1.25FTE plus 50% service contract. • Plus 0.25FTE for CARA—used by some PC people, not just Mac users. • Key work area in 2002 is removal of AppleTalk access to printers. • Migrate users to lpr client (already used by Frank and Ludwig). • Streamlines general printing service—and is another move towards an IP only network. (LocalTalk removed last year.)
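
The practical effect of removing AppleTalk access is that everyone prints over IP with the standard lpr client. Purely as an illustration of what that looks like from a user script, here is a minimal Python wrapper around lpr; the queue name "example-queue" is hypothetical, not a real CERN printer.

```python
# Minimal sketch: submit a print job over IP via the standard lpr client
# rather than AppleTalk. The queue name "example-queue" is hypothetical.
import subprocess

def print_file(path, queue="example-queue"):
    """Send a file to an lpr/lpd print queue on the named printer."""
    subprocess.run(["lpr", "-P", queue, path], check=True)

if __name__ == "__main__":
    print_file("report.ps")
```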

  5. Printing Support • Overall service responsibility is with FIO, but clearly much valuable assistance from • PS for OS and Software support for central servers • IS for Print Wizard • General aim is to have happy and appreciative users. • Install printers, maintain them, replace toner as necessary, … • Seems to be working: spontaneous outburst of public appreciation during January’s Desktop Forum. • Promote and support projector installation in order to reduce (expensive) colour printing. • Working (if slowly) to improve remote monitoring of printers—enable pre-emptive action (sketched below). • Or (say it softly) a “Managed Print Service”
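
To make "pre-emptive action" concrete, here is a sketch (not the current implementation) of how remotely collected printer status could be turned into a daily work list. The status fields, printer names and the 15% toner threshold are assumptions for illustration only.

```python
# Illustrative only: turn remotely collected printer status reports into a
# list of printers needing pre-emptive attention, before users complain.
TONER_WARN_PERCENT = 15  # assumed warning threshold

def printers_needing_attention(status_reports):
    """Return (printer, reason) pairs for printers that need a visit."""
    flagged = []
    for name, report in status_reports.items():
        if report.get("toner_percent", 100) < TONER_WARN_PERCENT:
            flagged.append((name, "toner low"))
        if report.get("paper_jam", False):
            flagged.append((name, "paper jam"))
    return flagged

# Hypothetical example data:
example = {
    "bldg513-corridor": {"toner_percent": 8, "paper_jam": False},
    "bldg31-2-005": {"toner_percent": 60, "paper_jam": True},
}
print(printers_needing_attention(example))
```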

  6. Computing Hardware Supply • Aim to supply standardised hardware (desktop PCs, portables, printers, Macs) to users rapidly and efficiently. • Migration to use of CERN’s BAAN package is almost complete. Increases efficiency through • use of a standard stock management application, • end-user purchases are by Material Request not TID, • streamlined ordering procedures. • Could we ever move to a “Managed Desktop” service rather than shifting boxes? • Idea is appreciated outside IT but needs capital. • Service relies on the Desktop Support Contract… • Service also handles CPU and Disk Servers.

  7. Remedy Support • “Remedy” was introduced to meet the needs of the Desktop Support and Serco contracts for workflow management—problem and activity tracking. • FIO supports two Remedy Applications • PRMS for general problem tracking (the “3 level” model for support) • Used for Desktop Contract (including the helpdesk) and within IT • ITCM tracks direct CERN-Contractor activity requests for Serco and Network Management Contracts. • Do we need two different applications? Yes and No. • Two distinct needs, but could be merged. • However, this isn’t a priority and effort is scarce. • And don’t even ask about consolidated Remedy support across CERN!

  8. PRMS and ITCM Developments • PRMS • Continuing focus over the past couple of years has been to consolidate the basic service—integrate the many little changes that have been made to meet ad hoc needs. • Outstanding requests for additional functionality include • An improved “SLA Trigger mechanism”—defining how and when (and to whom) to raise alarms if tickets are left untreated too long (sketched below). • A “service logbook” to track interventions on a single system • Various small items including a Palm interface • ITCM • No firm developments planned, but many suggestions are floating around. • Overall, we need to migrate to Remedy 5… • … and available effort is limited.
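
As a sketch of the "SLA Trigger mechanism" idea only, and not of the PRMS implementation, the logic amounts to comparing a ticket's age against a set of escalation thresholds and alerting the corresponding people; the thresholds and recipients below are invented for illustration.

```python
# Sketch of an SLA trigger: decide when, and to whom, to raise an alarm for
# a ticket left untreated too long. Thresholds and escalation targets are
# assumptions, loosely following the "3 level" support model.
from datetime import datetime, timedelta

ESCALATION = [                                  # (age limit, who is alerted)
    (timedelta(hours=4), "1st level (helpdesk)"),
    (timedelta(days=1),  "2nd level (service responsible)"),
    (timedelta(days=3),  "3rd level (group leader)"),
]

def sla_alarms(ticket_opened, now=None):
    """Return the escalation targets to alert for an untreated ticket."""
    age = (now or datetime.now()) - ticket_opened
    return [target for limit, target in ESCALATION if age > limit]

# A ticket opened two days ago triggers the first two alarms:
print(sla_alarms(datetime.now() - timedelta(days=2)))
```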

  9. Computing Hardware Supply • Aim to supply standardised hardware (desktop PCs, portables, printers, Macs) to users rapidly and efficiently. • Migration to use of CERN’s BAAN package is almost complete. Increases efficiency through • use of a standard stock management application, • end-user purchases are by Material Request not TID, • streamlined ordering procedures. • Could we ever move to a “Managed Desktop” service rather than moving boxes? • Idea is appreciated outside IT but needs capital. • Service relies on the Desktop Support Contract. • Service also handles CPU and Disk Servers…

  10. Physics Services • Last year’s reorganisation split “PDP” Services across • ADC: services to “push the envelope” • PS: Solaris and engineering services • FIO: Everything else.

  11. Physics Services • Last year’s reorganisation split “PDP” Services across • ADC: services to “push the envelope” • PS: Solaris and engineering services • FIO: Everything else. • So, what is “Everything else”? • lxplus: main interactive service • lxbatch: ditto for batch • lxshare: time shared lxbatch extension • RISC remnants—mainly for LEP experiments • Much general support infrastructure • First line interface for physics support

  12. Physics Service Concerns • RISC Reduction

  13. Physics Service Concerns • RISC Reduction • RISC Reduction

  14. Physics Service Concerns • RISC Reduction • RISC Reduction • RISC Reduction

  15. Physics Service Concerns • RISC Reduction • RISC Reduction • RISC Reduction • Managing Large Linux Clusters • Fabric Management

  16. Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • Configuration Management • Logistics • State Management

  17. Fabric Management Concerns • Software Installation — OS and Applications • We need rapid and rock-solid system and application installation tools. • Development discussions are part of EDG/WP4 to which we contribute. • Full scale testing and deployment as part of LCG project. • (Performance and Exception…) Monitoring • Configuration Management • Logistics • State Management

  18. Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • The Division is now committed to testing PVSS as a monitoring and control framework for the computer centre. • Overall architecture remains as decided within PEM and WP4 • New “Computer Centre Supervision” Project has 3 key milestones for 2002 • “Brainless” rework of PEM monitoring with PVSS • 900 systems now being monitored. Post-C5 presentation in March/April. • Intelligent rework for Q2 then wider system for Q4 • Configuration Management • Logistics • State Management
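
The "brainless" rework keeps the architecture agreed in PEM and WP4: agents take measurements on each node and a central layer compares them with configured limits and raises alarms. The sketch below illustrates that exception-monitoring step generically; it is not PVSS code, and the metric names, thresholds and host name are invented.

```python
# Generic illustration of exception monitoring (not PVSS): compare each
# measurement against a configured threshold and emit alarm records for
# the supervision layer. Metric names and limits are assumptions.
THRESHOLDS = {
    "load_1min": 10.0,          # alarm if 1-minute load average exceeds this
    "tmp_used_percent": 90.0,
}

def check_node(hostname, measurements):
    """Return (host, metric, value, limit) for every exceeded threshold."""
    return [(hostname, metric, value, THRESHOLDS[metric])
            for metric, value in measurements.items()
            if metric in THRESHOLDS and value > THRESHOLDS[metric]]

print(check_node("lxbatch042", {"load_1min": 23.4, "tmp_used_percent": 41.0}))
```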

  19. Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • Configuration Management • How do systems know what they should install? • How does the monitoring system know what a system should be running? • An overall configuration database is required. • Logistics • State Management
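
A configuration database in this sense is simply the single authoritative answer to "what should this node be running?", consulted by both the installation tools and the monitoring system. A toy sketch with an assumed, not real, schema; the host names merely follow the lxplus/lxbatch naming mentioned earlier.

```python
# Toy configuration database: one record per node describing the desired
# state, used to drive installation and to tell monitoring what to expect.
# The schema and entries are illustrative assumptions.
CONFIG_DB = {
    "lxplus001": {"os": "RedHat 7.2", "cluster": "lxplus",
                  "software": ["lsf-4.2", "afs-client"]},
    "lxbatch042": {"os": "RedHat 7.2", "cluster": "lxbatch",
                   "software": ["lsf-4.2"]},
}

def desired_state(hostname):
    """What should this node be running? One answer for all tools."""
    return CONFIG_DB[hostname]

print(desired_state("lxplus001")["os"])   # -> RedHat 7.2
```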

  20. Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • Configuration Management • Logistics • How do we keep track of 20,000+ objects? • We can’t manage 5,000 objects today. • Where are they all? (Feb 9th: Some systems couldn’t be found) • Which are in production? New? Obsolete? • And which are temporarily out of service? • How do physical and logical arrangements relate? • Where is this service located? • What happens if this normabarre/PDU fails? • State Management
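
The logistics questions become answerable once every box carries both a physical record (location, normabarre/PDU) and a logical one (service, lifecycle state). A toy inventory sketch with invented entries, answering the "what happens if this normabarre fails?" question:

```python
# Toy inventory linking physical and logical views of each box; all entries
# and identifiers are invented for illustration.
INVENTORY = [
    {"id": "pc-00123", "location": "vault-row-3", "normabarre": "NB-12",
     "service": "lxbatch", "state": "production"},
    {"id": "pc-00124", "location": "vault-row-3", "normabarre": "NB-12",
     "service": "lxshare", "state": "out-of-service"},
    {"id": "pc-00200", "location": "reception", "normabarre": None,
     "service": None, "state": "new"},
]

def affected_by(normabarre):
    """Production boxes (and their services) fed by a given normabarre/PDU."""
    return [(box["id"], box["service"]) for box in INVENTORY
            if box["normabarre"] == normabarre and box["state"] == "production"]

print(affected_by("NB-12"))   # -> [('pc-00123', 'lxbatch')]
```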

  21. Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • Configuration Management • Logistics • State Management • What needs to be done to move this box • from reception, to a final location, to be part of a given service? • What procedures should be followed if a box fails (after automatic recovery actions, naturally!) • This is workflow management • that should integrate with overall workflow management.
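
State management is naturally expressed as a small workflow or state machine: a box moves between defined states only through defined actions, and each action can carry its own procedure. The states and transitions below are an illustrative assumption, not an agreed workflow.

```python
# Minimal state machine sketch for the life of a box; states, actions and
# transitions are assumptions for illustration.
TRANSITIONS = {
    "reception":    {"install": "in rack"},
    "in rack":      {"configure": "in service"},
    "in service":   {"hardware fault": "under repair", "retire": "obsolete"},
    "under repair": {"repaired": "in rack"},
}

def advance(state, action):
    """Apply an action, refusing anything the workflow does not allow."""
    try:
        return TRANSITIONS[state][action]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed from state '{state}'")

state = "reception"
for action in ("install", "configure"):
    state = advance(state, action)
print(state)   # -> in service
```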

  22. Fabric Management Concerns • Software Installation — OS and Applications • (Performance and Exception…) Monitoring • Configuration Management • Logistics • State Management • Work on these items is the FIO contribution to the Fabric Management part of the LHC Computing Grid Project. • Detailed activities and priorities will be set by LCG • They are providing the additional manpower! • Planning document being prepared now based on input from FIO and ADC.

  23. … And where do the clusters go? • Estimated Space and Power Requirements for LHC Computing • 2,500m2 — increase of ~1,000m2 • 2MW — nominal increase of 800kW (1.2MW above current load) • Conversion of Tape Vault to Machine Room area agreed at post-C5 in June 2001. • Best option for space provision • Initial cost estimate of 1,300-1,400KCHF

  24. Vault Conversion • We are converting the tape vault to a Machine Room area of ~1,200m2 with • False floor, finished height of 70cm • 6 × “In room” air conditioning units. • Total cooling capacity: 500kW • 5 × 130kW electrical cabinets • Double power input • 5 or 6 × 20kW normabarres/PDUs • 3-4 racks of 44 PCs per normabarre • 2 × 130kW cabinets supplying “critical equipment area” • Critical equipment can be connected to each PDU • Two zones, one for network equipment, one for other critical services. • Smoke detection, but no fire extinction
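
A rough consistency check using only the figures quoted above, and assuming the 5-6 normabarres hang off a single 130kW cabinet; the per-PC power that falls out is a derived estimate, not a design number.

```python
# Back-of-the-envelope check of the vault power figures quoted on the slide.
cabinet_kw = 130        # one electrical cabinet
normabarre_kw = 20      # one normabarre/PDU
pcs_per_rack = 44

# If the 5 or 6 normabarres hang off one 130kW cabinet (an assumption), they fit:
for n in (5, 6):
    print(f"{n} normabarres -> {n * normabarre_kw} kW (cabinet limit {cabinet_kw} kW)")

# 3-4 racks of 44 PCs sharing one 20kW normabarre gives the per-PC budget:
for racks in (3, 4):
    pcs = racks * pcs_per_rack
    print(f"{racks} racks -> {pcs} PCs -> ~{normabarre_kw * 1000 / pcs:.0f} W per PC")
```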

  25. The Vault Not So Long Ago

  26. An Almost Empty Vault

  27. One air conditioning room…

  28. …is gone.

  29. The next…

  30. …is on the way out

  31. The Next Steps • Create a new Substation for B513 • To power 2MW of computing equipment plus air-conditioning and ancillary loads. • Included in the site-wide 18kV loop—more redundancy. • Favoured location: • Underground, but 5 transformers on top. • Refurbish the main Computer Room once enough equipment has moved to the vault.

  32. View from B31 Today

  33. View from B31 with Substation

  34. Looking Towards B31

  35. Summary • Six Services • Physics Services • Computing Hardware Supply • Computer Centre Operations • Printing • Remedy Support • Macintosh Support • Service Developments to • Follow natural developments • Remedy 5, LSF 4.2, RedHat 7.2 • Streamline provision of existing services • To reduce “P”—c.f. BAAN for Hardware Supply • To manage more—c.f. developments for Physics Services • Four Projects • Computer Centre Supervision • B513 Refurbishment • Fabric Management Development (EDG) • Fabric Management Implementation (LCG)
