The Southgrid Status Report for September 2004 reviews the progress and operational status of the member institutions: Oxford, RAL PPD, Cambridge, Birmingham, Bristol, and Warwick. The Tier 2 Management Board meets every three months, addressing concerns over security policy and MOU authorisation. Each institution reports its computational resources, current software status, and ongoing installation and resource-sharing issues. Future challenges are also discussed, especially around scaling up the available computing power.
Southgrid Status Report • Rhys Newman, September 2004 • GridPP 11 - Liverpool
Southgrid Member Institutions • Oxford • RAL PPD • Cambridge • Birmingham • Bristol • Warwick
Tier 2 Management Board • The Tier 2 Board meets every 3 months. • Where possible this is a face-to-face meeting, although a couple of people “phone in”. • The MOU is still in progress. • Southgrid is the most difficult Tier 2 to organise as it has the most institutes. • There are many concerns about imposing security policy onto the institutes. • There is confusion over who at each site is authorised to sign the MOU. • Final signatures are being collected as I speak.
Status at Warwick • A recent addition to Southgrid. • A third-line institute: no resources as yet, but it remains interested in being involved in the future. • Will not receive GridPP resources and so does not need to sign the MOU yet.
Operational Status • RAL PPD • Cambridge • Bristol • Birmingham • Oxford
Status at RAL PPD • Always on the leading edge of software deployment (co-located with the RAL Tier 1). • Currently (10 Sept) up to LCG 2.2. • CPUs: 24 × 2.4 GHz, 18 × 2.8 GHz • 100% dedicated to LCG • 0.5 TB storage • 100% dedicated to LCG
Status at Cambridge • Consistently the first institute to keep up with LCG releases. • Currently LCG 2.1.1 (since its date of release); will upgrade by October. • CPUs: 32 × 2.8 GHz, to increase to 40 soon • 100% dedicated to LCG • 3 TB storage • 100% dedicated to LCG
Status at Bristol • Limited involvement for the last 6 months due to a manpower shortage. • Current plans are to switch the BaBar farm to LCG by October. • A 1.25 FTE computing support post is to be filled soon, which should improve the situation (a Bristol initiative, not GridPP). • CPUs: 80 × 866 MHz PIII (planned BaBar farm) • Shared with LHC under an LCG install. • 2 TB storage (planned) • Shared with LHC under an LCG install. • A possible new computing centre (>500 CPUs) is still under discussion. • A possible new post is also still under discussion.
Status at Birmingham • A second-line institute, reliably up to date with software within about 6 weeks of release. • Currently LCG 2.2 (since mid August). • Southgrid’s “Hardware Support Post” is to be allocated here to assist. • CPUs: 22 × 2.0 GHz Xeon (+48 soon) • 100% LCG • 2 TB storage awaiting “Front End Machines” • 100% LCG
Status at Oxford • A second-line institute which has only recently come online; until May it had limited resources. • Currently LCG 2.1.1 (since early August). • Hosted the LCG2 Administrator’s Course, which affected the installation timeline. • CPUs: 80 × 2.8 GHz • 100% LCG • 1.5 TB storage, with an upgrade to 3 TB planned • 100% LCG
Resource Summary • CPU (3 GHz equivalent): 155.2 total • Storage: 7 TB total
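As a back-of-envelope check (my assumption, not stated on the slide): the 155.2 figure matches the four dedicated LCG farms above, with Bristol's planned BaBar farm and Warwick excluded; likewise 0.5 + 3 + 2 + 1.5 = 7 TB of storage. A minimal Python sketch of the CPU arithmetic:

```python
# Back-of-envelope check of the "3 GHz equivalent" CPU total.
# Assumes only the dedicated LCG farms listed above are counted
# (Bristol's planned BaBar farm and Warwick excluded).
farms = {
    "RAL PPD":    [(24, 2.4), (18, 2.8)],
    "Cambridge":  [(32, 2.8)],
    "Birmingham": [(22, 2.0)],
    "Oxford":     [(80, 2.8)],
}

total = 0.0
for site, cpus in farms.items():
    equiv = sum(count * ghz for count, ghz in cpus) / 3.0
    total += equiv
    print(f"{site:<11} {equiv:6.1f}")

print(f"{'Total':<11} {total:6.1f}")  # ~155.2 x 3 GHz equivalent
```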
LCG2 Administrator’s Course • The main activity at Oxford in the weeks leading up to July. • Received very well – a lack of machines was identified as a problem, even though we used 20 servers! • A good measure of the complexity: • An expert could do an LCG install in 1 day. • A novice could do it with expert help in 3 days. • A novice alone could take weeks! • A lot of interest in a repeat, especially when the 8.5 “Hardware Support” posts are filled (suggestions welcome).
Ongoing Issues • Complexity of the installation: it cannot compare with “Google Compute” – is winning a PR exercise useful? • Difficulty sharing resources – almost all of the resources listed are 100% LCG because sharing is so difficult. • How will we manage clusters without LCFGng? Quattor has a learning curve (it uses a new configuration language) – should we all get training?
Future Issues • We need 100,000 × 1 GHz machines: “… to scale up the computing power available by a factor of ten …” – Tony Doyle (GridPP summary of the All Hands meeting). • What are we learning now? gLite (aka EGEE1) may be completely different. • Can’t we get some cycle stealing? There are 20,000 “decent” machines in Oxford University alone!
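To put the cycle-stealing question in context, a rough scaling sketch; the average clock speed and usable idle fraction below are purely illustrative assumptions, not measurements:

```python
# Rough estimate of cycle-stealing potential against the stated target of
# 100,000 x 1 GHz machines. Clock speed and idle fraction are assumptions.
target_ghz = 100_000 * 1.0       # "100,000 x 1 GHz machines"
oxford_machines = 20_000         # "decent" machines quoted for Oxford alone
avg_clock_ghz = 2.0              # assumed typical desktop clock (illustrative)
usable_idle_fraction = 0.5       # assumed share of cycles actually harvested

harvest_ghz = oxford_machines * avg_clock_ghz * usable_idle_fraction
print(f"Oxford alone could supply ~{harvest_ghz:,.0f} GHz-equivalent, "
      f"i.e. {100 * harvest_ghz / target_ghz:.0f}% of the target")
```

With these illustrative numbers, Oxford desktops alone would cover roughly a fifth of the stated target, which is why the question is worth asking.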
LHC At Home!! (Thanks Mark) • http://lhcathome.cern.ch • Started 1st September. • Still in beta. • 1004 computers already. • How can we leverage this?