Jon Wakelin, Physics & ACRC, Bristol

Presentation Transcript

ACRC

  • Server Rooms

    • PTR – 48 APC water-cooled racks (hot aisle / cold aisle)

    • MVB – 12 APC water-cooled racks (hot aisle / cold aisle)

  • HPC

    • IBM, ClusterVision, ClearSpeed.

  • Storage

    • 2008-2011?

    • Petabyte scale facility

  • 6 Staff

    • 1 Director, 2 HPC Admins, 1 Research Facilitator

    • 1 Visualization Specialist, 1 e-Research Specialist

    • (1 Storage admin post?)


ACRC Resources

  • Phase 1 – ~ March 07

    • 384-core AMD Opteron 2.6 GHz dual-socket, dual-core system, 8 GB memory.

    • MVB server room

    • CVOS and Scientific Linux 4 on the worker nodes. GPFS, Torque/Maui, QLogic InfiniPath (see the submission sketch after this list).

  • Phase 2 - ~ May 08

    • 3328-core Intel Harpertown 2.8 GHz dual-socket, quad-core system, 8 GB memory.

    • PTR server room, ~600 metres from the MVB server room.

    • CVOS and SL (version?) on the worker nodes. GPFS, Torque/Moab, QLogic InfiniPath.

  • Storage Project (2008 - 2011)

    • Initial purchase of additional 100 TB for PP and Climate Modelling groups

    • PTR server room

    • Operational by ~Sep 08.

    • GPFS will be installed on the initial 100 TB.
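
Both phases sit behind Torque, scheduled by Maui (Phase 1) or Moab (Phase 2), so from a user's point of view the workflow is the same: write a PBS job script and hand it to qsub. Below is a minimal sketch of that workflow driven from Python; the queue name, resource requests and job contents are illustrative assumptions, not ACRC defaults.

```python
#!/usr/bin/env python
"""Minimal sketch: submit a batch job to a Torque cluster (Maui- or
Moab-scheduled) via qsub. Queue name, resources and job body are
illustrative assumptions, not ACRC defaults."""
import subprocess
import tempfile

JOB_SCRIPT = """#!/bin/bash
#PBS -N example_job
#PBS -q short                 # hypothetical queue name
#PBS -l nodes=1:ppn=4         # one node, four cores
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
echo "Running on $(hostname)"
"""

def submit(script_text):
    """Write the PBS script to a temporary file and submit it with qsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(script_text)
        path = f.name
    # qsub prints the new job ID (e.g. 12345.master) on success
    return subprocess.check_output(["qsub", path]).decode().strip()

if __name__ == "__main__":
    print("Submitted job", submit(JOB_SCRIPT))
```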


ACRC Resources

  • 184 Registered Users

  • 54 Projects

  • 5 Faculties

    • Engineering

    • Science

    • Social Sciences

    • Medicine & Dentistry

    • Medical & Veterinary Sciences


PP Resources

  • Initial LCG/PP setup

    • SE (DPM), CE and a 16-core PP cluster, MON and UI

    • CE for HPC (and SE and GridFTP servers for use with ACRC facilities)

  • HPC Phase 1

    • PP have a 5% target fair-share and up to 32 concurrent jobs (see the monitoring sketch after this list)

    • New CE, but uses the existing SE, accessed via NAT (and slow).

    • Operational since end of Feb 08

  • HPC Phase 2

    • SL 5 will limit PP exploitation in the short term.

    • Exploring Virtualization – but this is a medium- to long-term solution

    • PP to negotiate larger share of Phase 1 system to compensate

  • Storage

    • 50 TB to arrive shortly, operational ~Sep 08

    • Additional networking necessary for short/medium-term access.
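
Since the Phase 1 agreement caps PP at 32 concurrent jobs, it is useful to keep an eye on how close we run to that limit. The sketch below counts running jobs for a set of PP accounts by parsing default qstat output; the account names are hypothetical and the column layout can differ slightly between Torque versions.

```python
#!/usr/bin/env python
"""Sketch: count running jobs for a set of users by parsing default qstat
output (columns: Job ID, Name, User, Time Use, S, Queue). User names are
hypothetical; the column layout may vary slightly between Torque versions."""
import subprocess

PP_USERS = {"ppuser01", "ppuser02"}   # hypothetical PP pool accounts
CONCURRENT_LIMIT = 32                 # agreed cap on the Phase 1 system

def running_pp_jobs():
    """Return the number of jobs in state 'R' owned by PP accounts."""
    out = subprocess.check_output(["qstat"]).decode()
    count = 0
    for line in out.splitlines()[2:]:            # skip the two header lines
        fields = line.split()
        if len(fields) >= 6 and fields[2] in PP_USERS and fields[4] == "R":
            count += 1
    return count

if __name__ == "__main__":
    n = running_pp_jobs()
    print("PP jobs running: %d of %d allowed" % (n, CONCURRENT_LIMIT))
```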


Storage

  • Storage Cluster

    • Separate from the HPC cluster

    • Will run GPFS

    • Being installed and configured ‘as we speak’

  • Running a ‘test’ StoRM SE

    • This is the second time

      • Due to changes in the underlying architecture

    • Passing simple SAM SE tests

      • But, now removed from BDII

    • Direct access between storage and the WNs (see the sketch after this list)

      • Through multi-cluster GPFS (rather than NAT)

  • Test and real systems may differ in the following ways…

    • Real system will have a separate GridFTP server

    • Possibly NFS export for Physics Cluster

    • 10 Gb NICs (Myricom Myri-10G, PCI Express)
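
Because the worker nodes will reach the storage through a multi-cluster GPFS mount rather than through NAT, a quick sanity check is to write and re-read a test file on that mount from a WN and time it. The sketch below does exactly that; the mount point and test size are illustrative assumptions.

```python
#!/usr/bin/env python
"""Sketch: confirm a worker node can write to and read from the GPFS mount
directly and report rough throughput. Mount point and test size are
illustrative assumptions."""
import os
import time

GPFS_MOUNT = "/gpfs/storage"                    # hypothetical multi-cluster mount
TEST_FILE = os.path.join(GPFS_MOUNT, "wn_access_test.dat")
SIZE_MB = 256

def timed_io():
    """Write SIZE_MB of zeros, fsync, read it back; return (write, read) MB/s."""
    block = b"\0" * (1024 * 1024)

    start = time.time()
    with open(TEST_FILE, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                    # force the data out to GPFS
    write_rate = SIZE_MB / (time.time() - start)

    start = time.time()
    with open(TEST_FILE, "rb") as f:
        while f.read(1024 * 1024):
            pass
    # NB: the read may be partly served from page cache, so treat it as an upper bound
    read_rate = SIZE_MB / (time.time() - start)

    os.remove(TEST_FILE)
    return write_rate, read_rate

if __name__ == "__main__":
    w, r = timed_io()
    print("write %.1f MB/s, read %.1f MB/s" % (w, r))
```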


[Network diagram, slide 14: the PTR server room houses HPC Phase 2 and the Storage cluster (x3650 servers with Myri-10G NICs); the MVB server room houses HPC Phase 1. The two rooms are linked through an all-Nortel network of 5510-24/5510-48 and 5530 switches plus 8648 GTR and 8683 modules.]


[Network diagram, slide 15: the same PTR/MVB layout as slide 14, but with additional Nortel 5530 switches in the interconnect.]


SoC

  • Separation of Concerns

    • Storage/Compute managed independently of Grid Interfaces

    • Storage/Compute managed by dedicated HPC experts.

    • Tap into storage/compute in the manner the ‘electricity grid’ analogy suggested

  • Provide PP with centrally managed compute and storage

    • Tarball WN install on HPC (see the sketch after this list)

    • StoRM writing files to a remote GPFS mount (developers and tests confirm this)

  • In theory this is a good idea - in practice it is hard to achieve

    • (Originally) implicit assumption that admin has full control over all components

      • Software now allows for (mainly) non-root installations

    • Depend on others for some aspects of support

      • Impact on turn-around times for resolving issues (SLAs?!?!!)
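
The tarball WN install is what makes the non-root model workable: the grid middleware is unpacked into a shared, user-writable software area instead of being installed as RPMs by the HPC admins. The sketch below shows the general shape of such an install; the tarball location, install directory and setup-script name are assumptions, not the actual gLite layout.

```python
#!/usr/bin/env python
"""Sketch: unpack a worker-node middleware tarball into a shared software
area as an ordinary (non-root) user. Tarball path, install directory and
setup-script name are illustrative assumptions, not the real gLite layout."""
import os
import tarfile

TARBALL = "/tmp/glite-WN.tar.gz"               # hypothetical, pre-downloaded tarball
INSTALL_DIR = "/shared/software/glite-wn"      # hypothetical shared software area

def install_wn_tarball(tarball=TARBALL, dest=INSTALL_DIR):
    """Extract the WN tarball under the shared area; no root required."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(dest)
    return dest

if __name__ == "__main__":
    dest = install_wn_tarball()
    # Jobs (or the batch wrapper) would then source the tarball's own setup
    # script to pick up the grid environment, e.g. something like:
    #   . /shared/software/glite-wn/etc/profile.d/grid-env.sh   (name assumed)
    print("WN tarball unpacked under", dest)
```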


General Issues

  • Limit the number of tasks that we pass on to HPC admins

    • Set up user and ‘admin’ accounts (sudo) and shared software areas

    • Torque – allow a remote submission host (i.e. our CE)

    • Maui – ADMIN3 access for certain users (all users are ADMIN3 anyway)

    • NAT

  • Most other issues are solvable with fewer privileges

    • SSH Keys

    • RPM or rsync for certificate updates (as sketched after this list)

    • WN tarball for software

  • Other issues

    • APEL accounting assumes ExecutingCE == SubmitHost (Bug report)

    • Workaround for the Maui client – key embedded in the binaries!!! (now changed)

    • Home directory path has to be exactly the same on the CE and the cluster.

    • Static route into HPC private network
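
The static route into the HPC private network is easy to lose after a reboot or a network change, so it is worth checking routinely from the CE. A minimal sketch using ip route is below; the subnet shown is a placeholder, not the actual ACRC addressing.

```python
#!/usr/bin/env python
"""Sketch: check that this host still has a route towards the HPC private
network. The subnet below is a placeholder, not the real ACRC addressing."""
import subprocess
import sys

HPC_PRIVATE_SUBNET = "192.168.100.0/24"   # hypothetical private subnet

def route_present(subnet=HPC_PRIVATE_SUBNET):
    """Return True if `ip route show` lists an entry for the given subnet."""
    routes = subprocess.check_output(["ip", "route", "show"]).decode()
    return any(line.split()[0] == subnet
               for line in routes.splitlines() if line.strip())

if __name__ == "__main__":
    if route_present():
        print("static route to %s is present" % HPC_PRIVATE_SUBNET)
    else:
        print("WARNING: no route to %s" % HPC_PRIVATE_SUBNET)
        sys.exit(1)
```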

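Certificate updates are one of the tasks that can be pulled rather than pushed: each node can rsync the certificate directory from a host we control, without involving the HPC admins. A sketch is below; the source host and rsync module are assumptions, and on a non-root worker node the destination would sit in the shared software area rather than under /etc.

```python
#!/usr/bin/env python
"""Sketch: pull certificate updates onto a node with rsync instead of asking
the HPC admins to install RPMs. The source host/module is hypothetical;
/etc/grid-security/certificates is the usual grid CA directory, but a
non-root worker node would use a path in the shared software area instead."""
import subprocess

CERT_SOURCE = "certs.example.ac.uk::grid-certificates/"   # hypothetical rsync module
CERT_DEST = "/etc/grid-security/certificates/"

def sync_certs(src=CERT_SOURCE, dest=CERT_DEST):
    """Mirror the certificate directory; --delete drops withdrawn CAs."""
    subprocess.check_call(["rsync", "-a", "--delete", src, dest])

if __name__ == "__main__":
    sync_certs()
    print("certificates synced from", CERT_SOURCE)
```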

Q’s?

  • Any questions…

  • https://webpp.phy.bris.ac.uk/wiki/index.php/Grid/HPC_Documentation

  • http://www.datadirectnet.com/s2a-storage-systems/capacity-optimized-configuration

  • http://www.datadirectnet.com/direct-raid/direct-raid

  • hepix.caspur.it/spring2006/TALKS/6apr.dellagnello.gpfs.ppt

