
Southgrid Status





Presentation Transcript


  1. Southgrid Status. Pete Gronbech, 21st March 2007, GridPP 18, Glasgow

  2. RAL PPD
• Existing 30 Xeon CPUs and 6.5TB of storage were supplemented by an upgrade in mid 2006.
• RAL PPD installed a large upgrade: 200 (Opteron 270) CPU cores, equivalent to an extra 260 kSI2K, plus 86TB of storage.
• 50TB of this was loaned to the RAL Tier 1 and is now being returned.
• 10Gb/s connection to the RAL backbone.
• RAL is currently connected at 1Gb/s to TVM.
• Will be connected at 10Gb/s to the SJ5 backbone by 01/04/2007.

  3. RAL PPD (2)
• Supports 22 different VOs, of which 18 have run jobs in the last year.
• 1,000,000 kSI2K hours delivered in the last 12 months.
• 2007 upgrade of disk and CPU ordered:
• 13 x 6TB SATA disk servers (3Ware RAID controllers, 14 x 500GB WD disks each).
• 32 x dual Intel 5150 dual-core CPU nodes with 8GB RAM.
• Orders placed; delivery expected in the next 7 days.
• Will be installed in the Atlas Centre, due to power/cooling issues in R1.

  4. Status at Cambridge
• Currently gLite 3 on SL3.
• CPUs: 32 x 2.8GHz Xeon.
• 3TB storage.
• DPM enabled Oct 05.
• Upgrade arrived Christmas 2006: 32 Intel 'Woodcrest' based servers, giving 128 CPU cores, equivalent to approx. 358 kSI2K.
• Local computer room upgraded.
• Storage upgrade to 40-60TB expected this summer.
• Condor version 6.8.4 is being used, but the latest LCG updates have a dependency on condor-6.7.10-1. That development release should not be used in a production environment, and LCG/gLite should not be requiring it (see the sketch after this slide).
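The clash can be illustrated with a small sketch (my own illustration, not the actual RPM dependency resolution used by LCG/gLite): an exact pin on condor-6.7.10-1 is not satisfied by the newer 6.8.4 series already installed, whereas a minimum-version requirement would be.

```python
# Minimal sketch (not the actual LCG/gLite packaging logic) of why an
# exact-version pin such as "condor = 6.7.10-1" conflicts with a site that
# already runs a newer, production-quality Condor release.

def parse(version: str):
    """Split a dotted/dashed version string into a comparable integer tuple."""
    return tuple(int(p) for p in version.replace("-", ".").split("."))

installed = "6.8.4"        # stable series deployed at Cambridge
pinned    = "6.7.10-1"     # development release required by the LCG update

# An exact pin is only satisfied by the identical version...
exact_ok = parse(installed) == parse(pinned)
# ...whereas a ">=" style requirement would accept the newer stable release.
minimum_ok = parse(installed) >= parse(pinned)

print(f"exact pin satisfied:     {exact_ok}")    # False -> dependency conflict
print(f"'>=' would be satisfied: {minimum_ok}")  # True
```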

  5. Cambridge (2)
• CAMONT VO supported at Cambridge, Oxford and Birmingham. Job submission by Karl Harrison and David Sinclair.
• LHCb on Windows project (Ying Ying Li):
• Code ported to Windows.
• HEP 4-node cluster.
• MS Research Lab 4-node cluster (Windows Compute Cluster).
• Code running on a server at Oxford; possible expansion onto the OeRC Windows cluster.
• Possible Bristol nodes soon.

  6. Status at Bristol
• Status: all nodes running SL3.0.5 with gLite 3; DPM enabled Oct 05, LFC installed Jan 06.
• Existing resources: GridPP nodes plus local cluster nodes were used to bring the site on line; the local cluster is being integrated.
• New resources: 10TB of storage coming on line soon. Bristol expects to have a share of the new campus cluster from early 2007, including CPU, high-quality disk and scratch disk resources.
• Jon Wakelin is working 50% on GPFS and StoRM.
• IBM loan kit.

  7. Status at Birmingham
• Currently SL3 with gLite 3.
• CPUs: 28 x 2.0GHz Xeon (+ 98 x 800MHz).
• 1.9TB DPM, being replaced by a new 10TB array.
• BaBar farm starting to become unreliable due to many disk and PSU failures.
• Runs the Pre-Production Service, which is used for testing new versions of the middleware.
• Birmingham will have a share of the new campus cluster due May/June 2007. First phase: 256 nodes, each with two dual-core Opteron CPUs.

  8. Status at Oxford
• Currently gLite 3 on SL3.0.5.
• CPUs: 80 x 2.8GHz.
• Compute Element: 37 worker nodes, 74 job slots, 67 kSI2K; 37 x dual 2.8GHz P4 Xeon with 2GB RAM.
• DPM SRM Storage Element: 2 disk servers, 3.2TB disk space; a 1.6TB DPM server plus a second 1.6TB DPM disk pool node. A bug in DPM stopped load balancing across the pools; it will be fixed with the latest gLite update.
• Logical File Catalogue.
• MON and UI nodes.
• GridMon network monitor.
• 1Gb/s connectivity to the Oxford backbone; Oxford is currently connected at 1Gb/s to TVM.
• Submission from the Oxford CampusGrid via the NGS VO is possible.

  9. Usage
• Oxford supports 20 VOs, 17 of which have run jobs in the last year.
• The most active VOs are LHCb (38.5%), ATLAS (21.3%) and Biomed (21%).
• 300,000 kSI2K hours delivered in the last 12 months.

  10. New Computer Room
• The new computer room being built at Begbroke Science Park, jointly for the Oxford Supercomputer and the Physics Department, will provide space for 55 (11kW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre.
• Disk and CPU (planned purchase for Summer 07):
• 32 x dual Intel 5150 dual-core CPU nodes with 8GB RAM, giving 353 kSI2K.
• 10 x 12TB SATA disk servers, giving 105TB usable (after RAID 6); a rough capacity check follows this slide.
• Quad-core CPUs will be benchmarked, both for SPEC rates and power consumption.
• Newer 1TB disks will be more commonplace by the summer.
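One way to reproduce the quoted 105TB figure is sketched below. The disk layout (16 x 750GB disks per server in a single RAID 6 array) is my assumption; the slide only gives the per-server and total capacities.

```python
# Rough capacity check (assumed layout: 16 x 750 GB disks per server in one
# RAID 6 array; the slide does not state the actual disk configuration).

servers          = 10
disks_per_server = 16        # assumed
disk_tb          = 0.75      # assumed 750 GB disks
raid6_parity     = 2         # RAID 6 sacrifices two disks' worth of space per array

raw_tb    = servers * disks_per_server * disk_tb                    # 120 TB raw
usable_tb = servers * (disks_per_server - raid6_parity) * disk_tb   # 105 TB usable

print(f"raw: {raw_tb:.0f} TB, usable after RAID 6: {usable_tb:.0f} TB")
```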

  11. Oxford DWB Computer Room
• A local Physics Department infrastructure computer room (100kW) has been agreed and should be ready in May/June 07.
• This will relieve the local computer rooms and could house T2 equipment until the Begbroke room is ready.
• Racks that are currently in unsuitable locations can be re-housed.

  12. Other SouthGrid Sites
• Other groups within the SouthGrid EGEE area are: EFDA-JET, with 40 CPUs up and running.
• The Advanced Computing and Emerging Technologies (ACET) Centre, School of Systems Engineering, University of Reading, which started setting up their cluster in Dec 06.

  13. SouthGrid CPU Delivered
• SouthGrid provided 1.4M kSI2K hours in the year March 06 to March 07.
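As a rough sanity check (my own back-of-the-envelope conversion, not from the slides), dividing the integrated kSI2K hours by the number of hours in a year gives the average capacity that was kept busy over the period:

```python
# Back-of-the-envelope conversion (my own, not from the slides): integrated
# kSI2K-hours divided by the hours in a year gives the average capacity that
# was continuously in use over the period.

delivered_ksi2k_hours = 1.4e6        # 1.4M kSI2K hours, March 06 to March 07
hours_in_year         = 365 * 24     # 8760

avg_busy_ksi2k = delivered_ksi2k_hours / hours_in_year
print(f"average busy capacity: {avg_busy_ksi2k:.0f} kSI2K")   # ~160 kSI2K
```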

  14. SouthGrid VO/Site Shares

  15. Steve Lloyd Tests, 21.3.07

  16. Site Monitoring
• Grid-wide provided monitoring: GSTAT, SAM, GOC Accounting, Steve Lloyd's ATLAS test page.
• Local site monitoring: Ganglia, Pakiti, torque/maui monitoring CLIs; investigating MonAMI.
• Developing: Nagios. RAL PPD have developed many plugins; other SouthGrid sites are just setting up (a minimal plugin sketch follows this slide).
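For illustration, a minimal Nagios-style plugin sketch is shown below. It is not one of the actual RAL PPD plugins; it only relies on the standard Nagios plugin convention of printing one status line and exiting 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The pool path and thresholds are hypothetical.

```python
#!/usr/bin/env python3
# Minimal Nagios-style plugin sketch (hypothetical check, not an RAL PPD plugin).
# Prints one status line and exits with the standard Nagios codes 0/1/2/3.

import shutil
import sys

POOL_PATH = "/storage/dpm_pool"   # hypothetical DPM pool mount point
WARN_PCT, CRIT_PCT = 80, 90       # hypothetical usage thresholds (percent)

def main() -> int:
    try:
        usage = shutil.disk_usage(POOL_PATH)
    except OSError as err:
        print(f"DPM POOL UNKNOWN - cannot stat {POOL_PATH}: {err}")
        return 3

    used_pct = 100.0 * usage.used / usage.total
    message = f"{used_pct:.1f}% of {usage.total // 2**30} GiB used on {POOL_PATH}"

    if used_pct >= CRIT_PCT:
        print(f"DPM POOL CRITICAL - {message}")
        return 2
    if used_pct >= WARN_PCT:
        print(f"DPM POOL WARNING - {message}")
        return 1
    print(f"DPM POOL OK - {message}")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```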

  17. Summary
• SouthGrid continues to run well, and its resources are set to expand throughout this year.
• Birmingham's new university cluster will be ready in the summer.
• Bristol's small cluster is stable; the new university cluster is starting to come on line.
• The Cambridge cluster was upgraded as part of the CamGrid SRIF3 bid.
• Oxford will be able to expand its resources this summer when the new computer room is built.
• RAL PPD expanded last year and this year, well beyond what was originally promised in the MoU.
