
Jon Wakelin, Physics & ACRC

Learn about the server rooms, racks, and resources available at ACRC Bristol, including HPC, IBM, ClearSpeed, and storage facilities.


Presentation Transcript


  1. Jon Wakelin, Physics & ACRC Bristol

  2. ACRC
     • Server Rooms
       • PTR – 48 APC water-cooled racks (hot aisle / cold aisle)
       • MVB – 12 APC water-cooled racks (hot aisle / cold aisle)
     • HPC
       • IBM, ClusterVision, ClearSpeed
     • Storage
       • 2008–2011?
       • Petabyte-scale facility
     • 6 Staff
       • 1 Director, 2 HPC Admins, 1 Research Facilitator
       • 1 Visualization Specialist, 1 e-Research Specialist
       • (1 Storage Admin post?)

  3. ACRC Resources
     • Phase 1 – ~March 07
       • 384-core AMD Opteron 2.6 GHz dual-socket, dual-core system, 8 GB memory
       • MVB server room
       • CVOS and SL 4 on WNs; GPFS, Torque/Maui, QLogic InfiniPath
     • Phase 2 – ~May 08
       • 3328-core Intel Harpertown 2.8 GHz dual-socket, quad-core, 8 GB memory
       • PTR server room – ~600 metres from the MVB server room
       • CVOS and SL? on WNs; GPFS, Torque/Moab, QLogic InfiniPath
     • Storage Project (2008–2011)
       • Initial purchase of an additional 100 TB for the PP and Climate Modelling groups
       • PTR server room
       • Operational by ~Sep 08
       • GPFS will be installed on the initial 100 TB (a generic setup sketch follows below)
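Slide 3 notes that GPFS will be installed on the initial 100 TB. As background, a minimal sketch of bringing up a GPFS filesystem with the 3.x-era commands is given below; the node names, descriptor files and filesystem name are illustrative assumptions, not details of the ACRC installation.

     # Hypothetical GPFS 3.x-era setup; node, disk and filesystem names are invented.
     mmcrcluster -N nodefile -p nsd01 -s nsd02 -r /usr/bin/ssh -R /usr/bin/scp
     mmstartup -a                                     # start the GPFS daemons on all nodes
     mmcrnsd -F diskfile                              # turn the raw LUNs into Network Shared Disks
     mmcrfs /gpfs/data gpfs_data -F diskfile -B 1M    # create the filesystem
     mmmount gpfs_data -a                             # mount it cluster-wide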

  4. ACRC Resources
     • 184 registered users
     • 54 projects
     • 5 faculties
       • Engineering
       • Science
       • Social Sciences
       • Medicine & Dentistry
       • Medical & Veterinary Sciences

  5. PP Resources
     • Initial LCG/PP setup
       • SE (DPM), CE and a 16-core PP cluster, MON and UI
       • CE for HPC (and SE and GridFTP servers for use with ACRC facilities)
     • HPC Phase 1
       • PP have a 5% fair-share target and up to 32 concurrent jobs (see the sketch after this list)
       • New CE, but uses the existing SE – accessed via NAT (and slow)
       • Operational since end of Feb 08
     • HPC Phase 2
       • SL 5 will limit PP exploitation in the short term
       • Exploring virtualization – but this is a medium- to long-term solution
       • PP to negotiate a larger share of the Phase 1 system to compensate
     • Storage
       • 50 TB to arrive shortly, operational ~Sep 08
       • Additional networking necessary for short/medium-term access
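The 5% fair-share target and the 32-job cap on this slide are the kind of policy Maui expresses in maui.cfg. A sketch is given below; the group name "pp" and the fair-share window settings are assumptions rather than the actual ACRC configuration.

     # Illustrative maui.cfg fragment; group name and window settings are assumed.
     FSPOLICY        DEDICATEDPS       # measure fair-share in dedicated processor-seconds
     FSDEPTH         7                 # number of fair-share windows kept
     FSINTERVAL      24:00:00          # length of each window
     FSDECAY         0.80              # weight decay applied to older windows

     GROUPCFG[pp]    FSTARGET=5.0 MAXJOB=32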

  6. Storage
     • Storage cluster
       • Separate from the HPC cluster
       • Will run GPFS
       • Being installed and configured 'as we speak'
     • Running a 'test' StoRM SE
       • This is the second time
       • Due to changes in the underlying architecture
       • Passing simple SAM SE tests
       • But now removed from the BDII
     • Direct access between storage and WNs
       • Through multi-cluster GPFS (rather than NAT) – see the sketch below
     • Test and real systems may differ in the following ways…
       • Real system will have a separate GridFTP server
       • Possibly an NFS export for the Physics cluster
       • 10 Gb NICs (Myricom Myri-10G PCI-Express)
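The direct storage-to-WN access via multi-cluster GPFS mentioned above is normally configured with GPFS's remote-cluster commands. A rough sketch follows, with invented cluster, node and filesystem names; the real ACRC setup may differ.

     # On both clusters: generate and enable authentication keys
     mmauth genkey new
     mmauth update . -l AUTHONLY

     # On the storage (owning) cluster: let the compute cluster mount the filesystem
     mmauth add compute.example.ac.uk -k /tmp/compute_key.pub
     mmauth grant compute.example.ac.uk -f gpfs_pp

     # On the compute (accessing) cluster: define the remote cluster and filesystem, then mount
     mmremotecluster add storage.example.ac.uk -n nsd01,nsd02 -k /tmp/storage_key.pub
     mmremotefs add gpfs_pp -f gpfs_pp -C storage.example.ac.uk -T /gpfs/pp
     mmmount gpfs_pp -a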

  7. [Network diagram – PTR server room: HPC Phase 2 on Nortel 5510-48/5510-24 edge switches uplinked through 8683 modules to an 8648 GTR core, storage on four x3650 servers with Myri-10G NICs, and 5530 switches linking to HPC Phase 1 in the MVB server room. All components are Nortel.]

  8. [Network diagram – a second variant of the same PTR/MVB topology, again built entirely from Nortel 5510/5530/8600-series switches, with the x3650 + Myri-10G storage servers in the PTR server room and HPC Phase 1 in the MVB server room.]

  9. SoC – Separation of Concerns
     • Storage/compute managed independently of the grid interfaces
     • Storage/compute managed by dedicated HPC experts
     • Tap into storage/compute in the manner the 'electricity grid' analogy suggested
     • Provide PP with centrally managed compute and storage
       • Tarball WN install on HPC (see the sketch below)
       • StoRM writing files to a remote GPFS mount (developers and tests confirm this)
     • In theory this is a good idea – in practice it is hard to achieve
       • (Originally) an implicit assumption that the admin has full control over all components
       • Software now allows for (mainly) non-root installations
       • Depend on others for some aspects of support
       • Impact on turn-around times for resolving issues (SLAs?!)
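As a rough illustration of the tarball WN approach above, the relocatable WN software can be unpacked into a shared (GPFS) software area and its environment sourced inside each batch job. The paths, tarball name and environment script below are assumptions about a typical layout, not the actual Bristol install.

     # Non-root tarball WN install in a shared software area (hypothetical paths).
     WN_AREA=/gpfs/shared/gridsw/glite-wn
     mkdir -p "$WN_AREA"
     tar -xzf glite-WN-tarball.tar.gz -C "$WN_AREA"

     # Inside each job: point at a shared CA certificate directory and pick up the grid environment
     export X509_CERT_DIR=/gpfs/shared/gridsw/certificates
     source "$WN_AREA/etc/profile.d/grid-env.sh"   # script name/location varies by release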

  10. General Issues
      • Limit the number of tasks that we pass on to HPC admins (the main items are sketched below)
        • Set up users, 'admin' accounts (sudo) and shared software areas
        • Torque – allow a remote submission host (i.e. our CE)
        • Maui – ADMIN3 access for certain users (all users are A3 anyway)
        • NAT
      • Most other issues are solvable with fewer privileges
        • SSH keys
        • RPM or rsync for certificate updates
        • WN tarball for software
      • Other issues
        • APEL accounting assumes ExecutingCE == SubmitHost (bug report filed)
        • Workaround for the Maui client – key embedded in the binaries! (now changed)
        • Home directory path has to be exactly the same on the CE and the cluster
        • Static route into the HPC private network
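The Torque, Maui, certificate-sync and routing items above mostly reduce to one-line changes on the cluster side. A sketch is shown below; all hostnames, usernames and addresses are invented for illustration.

     # Torque: allow the grid CE to act as a remote submission host
     qmgr -c "set server submit_hosts += ce.example.ac.uk"

     # Maui: grant selected grid service accounts ADMIN3 (query/diagnostic-level) access, in maui.cfg:
     #   ADMIN3  edguser ppgrid

     # Keep CA certificates and CRLs in sync from the CE (an RPM repository works equally well)
     rsync -a --delete ce.example.ac.uk:/etc/grid-security/certificates/ /etc/grid-security/certificates/

     # Static route from the CE into the HPC private network via the NAT/gateway host
     ip route add 10.10.0.0/16 via 192.168.1.254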

  11. Q's?
      • Any questions…
      • https://webpp.phy.bris.ac.uk/wiki/index.php/Grid/HPC_Documentation
      • http://www.datadirectnet.com/s2a-storage-systems/capacity-optimized-configuration
      • http://www.datadirectnet.com/direct-raid/direct-raid
      • hepix.caspur.it/spring2006/TALKS/6apr.dellagnello.gpfs.ppt
