Tier1A Status




  1. Tier1A Status. Martin Bly, 28 April 2003

  2. CPU Farm
     • Older hardware:
       • 108 dual-processor machines (450MHz, 600MHz and 1GHz)
       • 156 dual-processor 1400MHz PIII machines
     • Recent delivery:
       • 80 dual 2.66GHz P4 Xeon machines
       • 533MHz FSB, 2GB memory
     • Next delivery expected in the summer

  3. Operating Systems
     • Red Hat 6.2 service will close in May.
     • Red Hat 7.2 service has been in production for BaBar for 6 months.
     • New Red Hat 7.3 service is now available for LHC and other experiments.
     • Increasing demand for security updates is becoming problematic.

  4. Disk Farm (last year)
     • Last year: 26 servers, each with 2 external RAID arrays, 1.7TB disk per server:
       • Excellent performance, well-balanced system.
       • Problems with a bad batch of Maxtor drives (many failures and a high error rate); all 620 drives have now been replaced by Maxtor.
       • Still outstanding: the Accusys controller fails to eject bad drives from the RAID set.

  5. Disk Farm (this year)
     • Recent upgrade to the disk farm:
       • 11 dual P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays
       • 12 Maxtor 200GB DiamondMax Plus 9 drives per array
     • Not yet in production, and a few snags:
       • The originally tendered Maxtor MaXLine Plus II drive was found not to exist.
       • The Infortrend array has a 2TB limit per RAID set, so some (10%) of the space is wasted.
     • Contact Nick White (N.G.H.White@rl.ac.uk) for more info.
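The roughly 10% figure on this slide can be sanity-checked with a little arithmetic. The sketch below assumes each 12-drive array is built as a single RAID 5 set and that capacities are in decimal units (1TB = 1000GB). Neither assumption is stated on the slide, so treat the result as indicative only.

```python
# Rough check of the space lost to the 2TB per-RAID-set limit.
# Assumptions (not stated on the slide): one RAID 5 set per array,
# decimal units (1TB = 1000GB).

DRIVES_PER_ARRAY = 12
DRIVE_GB = 200
RAID_SET_LIMIT_GB = 2000  # the 2TB limit per RAID set


def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 5 set: one drive's worth holds parity."""
    return (drives - 1) * drive_gb


def wasted_fraction(usable_gb: int, limit_gb: int) -> float:
    """Fraction of usable capacity that lies beyond the addressable limit."""
    return max(usable_gb - limit_gb, 0) / usable_gb


usable = raid5_usable_gb(DRIVES_PER_ARRAY, DRIVE_GB)
print(usable)  # 2200
print(round(100 * wasted_fraction(usable, RAID_SET_LIMIT_GB), 1))  # 9.1
```

Under these assumptions about 9% of each array is unreachable, which is in line with the "some (10%)" quoted on the slide.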

  6. New Projects
     • Basic fabric performance monitoring (Ganglia)
     • Resource CPU accounting (based on PBS accounts/MySQL)
     • New CA in production
     • New batch scheduler (Maui)
     • Deploy new helpdesk (May)

  7. Ganglia Monitoring
     • Urgently needed: live performance and utilisation monitoring.
     • RAL Ganglia Monitoring (live)
     • RAL Ganglia Monitoring (static)
     • Scalable solution based on multicast.
     • Very rapidly deployable, with reasonable support on all Tier1A hardware.
     • See: http://ganglia.sourceforge.net/
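As an illustration of the kind of utilisation figure Ganglia makes available, here is a minimal sketch that parses a cluster state dump and reports load per CPU for each host. The element and attribute names (`CLUSTER`, `HOST`, `METRIC`, `NAME`, `VAL`) and the `load_one` and `cpu_num` metrics follow the gmond XML format as generally documented. The cluster name, host names and values are invented for the example, and a real deployment would read the XML from a gmond daemon over the network rather than from a string.

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of a gmond XML dump (not real RAL data).
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.3" SOURCE="gmond">
 <CLUSTER NAME="example-farm" LOCALTIME="1051531200">
  <HOST NAME="node001" IP="10.0.0.1" REPORTED="1051531195">
   <METRIC NAME="load_one" VAL="1.95" TYPE="float" UNITS=""/>
   <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
  </HOST>
  <HOST NAME="node002" IP="10.0.0.2" REPORTED="1051531196">
   <METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=""/>
   <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""


def load_per_cpu(xml_text: str) -> dict:
    """Map each host name to its one-minute load divided by its CPU count."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        result[host.get("NAME")] = (
            float(metrics["load_one"]) / int(metrics["cpu_num"])
        )
    return result


for name, load in sorted(load_per_cpu(SAMPLE_XML).items()):
    print(f"{name}: {load:.3f}")
```

A value near 1.0 suggests both CPUs of a dual-processor node are busy; a value near 0 suggests the node is idle.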

  8. PBS Accounting Software
     • Need to keep track of system CPU and disk usage.
     • Home-grown PBS accounting package (Derek Ross):
       • Upload PBS and disk stats into MySQL
       • Process with a Perl DBI script
       • Serve via Apache
     • http://www.gridpp.rl.ac.uk/stats
     • Contact Derek (D.Ross@rl.ac.uk) for more info.
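The upload-and-process pipeline described on this slide can be sketched roughly as follows. This is not the RAL package: it is written in Python rather than Perl, and uses an in-memory SQLite database as a stand-in for MySQL so that the example is self-contained. The sample records follow the usual semicolon-separated PBS accounting log layout (timestamp, record type, job id, key=value list); the job ids, users and groups are invented, and the table and function names are hypothetical.

```python
import sqlite3

# Invented records in the style of a PBS accounting log.
# "E" marks a job-end record; other record types are ignored here.
SAMPLE_LOG = """\
04/27/2003 10:00:00;S;101.batch.example;user=alice group=babar queue=prod
04/27/2003 12:00:00;E;101.batch.example;user=alice group=babar queue=prod resources_used.cput=01:30:00 resources_used.walltime=02:00:00
04/27/2003 13:00:00;E;102.batch.example;user=bob group=lhcb queue=prod resources_used.cput=00:45:00 resources_used.walltime=01:00:00
04/27/2003 15:30:00;E;103.batch.example;user=alice group=babar queue=prod resources_used.cput=00:30:00 resources_used.walltime=00:40:00
"""


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS duration into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_records(text: str):
    """Yield (jobid, user, group, cpu_seconds, wall_seconds) per job-end record."""
    for line in text.splitlines():
        parts = line.split(";", 3)
        if len(parts) != 4 or parts[1] != "E":
            continue
        fields = dict(
            item.split("=", 1) for item in parts[3].split() if "=" in item
        )
        yield (
            parts[2],
            fields["user"],
            fields["group"],
            hms_to_seconds(fields["resources_used.cput"]),
            hms_to_seconds(fields["resources_used.walltime"]),
        )


def cpu_seconds_by_group(text: str) -> dict:
    """Load job-end records into a database and total CPU seconds per group."""
    db = sqlite3.connect(":memory:")  # stand-in for the MySQL server
    db.execute(
        "CREATE TABLE jobs (jobid TEXT PRIMARY KEY, username TEXT, "
        "grp TEXT, cput INTEGER, walltime INTEGER)"
    )
    db.executemany(
        "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)", parse_end_records(text)
    )
    rows = db.execute(
        "SELECT grp, SUM(cput) FROM jobs GROUP BY grp ORDER BY grp"
    ).fetchall()
    db.close()
    return dict(rows)


print(cpu_seconds_by_group(SAMPLE_LOG))  # {'babar': 7200, 'lhcb': 2700}
```

Keeping the raw per-job rows in the database, rather than only the totals, means later reports (per user, per queue, per month) are just further queries.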

  9. Maui/PBS
     • Maui scheduler has been in production for the last 3 months.
     • Allows extremely flexible scheduling with many features. But...
     • Not all of it works; we have done much work with the developers on fixes.
     • Major problem: Maui schedules on wall clock time, not CPU time. Had to bodge it!
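The slide does not say what the workaround was. Purely to illustrate why the two measures differ, the sketch below converts a CPU-time requirement into a wall clock request using an assumed CPU efficiency (CPU time divided by wall clock time). The function name and the efficiency figure are hypothetical and are not taken from the RAL configuration.

```python
import math


def walltime_request(cpu_seconds: int, expected_efficiency: float) -> int:
    """Wall clock seconds needed for a job to accumulate `cpu_seconds`.

    `expected_efficiency` is the assumed ratio of CPU time to wall clock
    time (1.0 means the job never waits on I/O or other processes).
    """
    if not 0 < expected_efficiency <= 1:
        raise ValueError("expected_efficiency must be in (0, 1]")
    return math.ceil(cpu_seconds / expected_efficiency)


# A job needing one CPU hour, assumed to be 75% CPU-efficient,
# needs 80 minutes of wall clock time.
print(walltime_request(3600, 0.75))  # 4800
```

A scheduler that only sees the wall clock figure cannot tell an I/O-bound job from a CPU-bound one, which is one reason the two limits are not interchangeable.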

  10. New Helpdesk Software
      • Old helpdesk is mail-based and unfriendly.
      • With additional staff, we urgently need to deploy a new solution.
      • Expect the new system to be based on free software, probably Request Tracker.
      • Hope that the deployed system will also meet the needs of the Testbed and may also satisfy Tier 2 sites.
      • Expect deployment by end of May.
      • http://requestracker.gridpp.rl.ac.uk/ (static)

  11. Outstanding Issues/Worries
      • We have to run many distinct services, for example Fermi Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG...
      • Farm management is getting very complex. We need better tools and automation.
      • Security is becoming a big concern again.
