
RALPP Site Report

RALPP Site Report. HEP Sys Man, 11th May 2012, Rob Harper. This talk covers: where we're at now; our new stuff, including GridPP purchases and DRI networking kit; benchmarking and hyperthreads; virtual machine infrastructure; managing configuration (cfEngine vs Puppet); and future plans.


Presentation Transcript


  1. RALPP Site Report HEP Sys Man, 11th May 2012 Rob Harper

  2. My talk will be... • Where we’re at now • Our new stuff, including • GridPP purchases • DRI networking kit • Benchmarking and hyperthreads • Virtual machine infrastructure • Managing configuration and stuff: cfEngine vs Puppet • Future stuff

  3. RALPP For Dummies • Part of SouthGrid • Staff • Chris Brew (part) • Rob Harper (part) • One cluster serving Tier 2 (85%) and Tier 3 (15%), managed by Torque/Maui • dCache storage

  4. RALPP CPU

  5. RALPP CPU • Cluster is currently nominally: • 2,872 Job slots • 26,409 HS06 • Where available, hyperthreads used to get 150% of physical cores

  6. RALPP Storage • [chart: storage capacity in TB]

  7. RALPP Storage • 1,060 TB in production • Soon to be 1,260 TB

  8. New Stuff: GridPP Purchases • CPU: • 9 * Viglen/Supermicro Twin2 • Intel E5645 based • 48 GB / node • Using hyperthreads • => 648 job slots, 6208 HS06 • Disk: • 5 * Viglen/Supermicro 24 bay storage nodes • => 200 TB of disk pool
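As a sanity check on those figures (assuming each Twin² chassis holds four dual-socket nodes and each E5645 has six physical cores, the standard configuration for that hardware), the quoted slot count works out as:

```
9 chassis × 4 nodes/chassis       = 36 nodes
2 sockets × 6 cores × 1.5 (HT)    = 18 job slots/node
36 nodes × 18 slots/node          = 648 job slots
6,208 HS06 ÷ 648 slots            ≈ 9.6 HS06/slot
```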

  9. New Stuff: Networking • DRI money bought us: • 5 * Force10 s4810 switches • A heap of 10Gb NICs for older disk pool nodes • A heap of 10Gb cables • Coming soon: a much reconfigured network...

  10. New Network Layout

  11. Benchmarking & Hyperthreads • We ran HS06 benchmark on a heap of nodes with varying numbers of concurrent benchmark jobs • Going past # of physical cores did give us some gains

  12. Benchmarking & Hyperthreads • So we committed 1.5 * physical cores as job slots for some nodes and ran real jobs • No significant drop in efficiency • More work done • Many details on SouthGrid blog at http://bit.ly/Iu7BfS

  13. Virtual Machines • Current set-up: • Xen VMs spread between a couple of servers • Local storage, nothing clever • Currently in test: • Cluster running Hyper-V • Yes, we’ll be running Linux VMs on Windows • EqualLogic storage • iSCSI • Mirroring, etc.

  14. Configuration Management • Already much discussed yesterday, but here’s our perspective... • We currently rely on cfEngine v2 • This is not supported natively on SL6 (or at all) • Main options seem to be: • Crowbar in legacy cfEngine • cfEngine v3 – will need configs rewritten • Switch to Puppet – will need configs rewritten

  15. Puppet • Puppet seems to be a strong choice • Particularly as other Tier 2s are coming to the same decision • Not got far yet • We have a working Puppet Master with some basic manifests set up • We have an SL6 client for test purposes • Planning to use Puppet for SL6 hosts as we set them up – leaving SL5 kit on cfEngine
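To give a flavour of what "some basic manifests" means in practice, here is a hypothetical minimal sketch (not our actual config; the class, host name, and service are illustrative) of a Puppet class that keeps a package installed and its service running, applied to a test node:

```puppet
# Hypothetical sketch: keep ntp installed and running on an SL6 test host.
class ntp {
  package { 'ntp':
    ensure => installed,
  }

  service { 'ntpd':
    ensure  => running,
    enable  => true,
    require => Package['ntp'],
  }
}

# Node definition on the Puppet Master; host name is made up.
node 'sl6-test.example.com' {
  include ntp
}
```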

  16. Puppet • Our cfEngine config relies massively on EditFiles functionality • Puppet does not have this • Can run scripts to do edits • Can use modules (e.g. iptables) that do the work for you • We need to learn to think in a different way to take advantage of Puppet
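The two workarounds mentioned above look roughly like this (hypothetical snippets, not our config; the service entry and template path are made up): either guard a scripted edit with an idempotence check, or give up on line edits and let Puppet own the whole file:

```puppet
# Option 1: script the edit via exec, with an 'unless' guard so it
# only appends the line when it is not already present.
exec { 'add-gridftp-service':
  command => '/bin/echo "globus-gridftp 2811/tcp" >> /etc/services',
  unless  => '/bin/grep -q globus-gridftp /etc/services',
}

# Option 2: manage the entire file from a template, so Puppet (not a
# line editor) is the source of truth for its contents.
file { '/etc/services':
  ensure  => file,
  content => template('site/services.erb'),
}
```

The second approach is the bigger mental shift from EditFiles: rather than patching files in place, you describe the desired end state and let Puppet converge to it.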

  17. Things to come... • Getting network configuration updated • Start deploying VMs in Hyper-V • Getting Puppet configuration management running properly • Start using SL6 as a standard install for services where we have no reason not to • Improved monitoring
