Jon Wakelin, Physics & ACRC
Bristol
ACRC
  • Server Rooms
    • PTR – 48 APC water-cooled racks (hot aisle / cold aisle)
    • MVB – 12 APC water-cooled racks (hot aisle / cold aisle)
  • HPC
    • IBM, ClusterVision, ClearSpeed.
  • Storage
    • 2008-2011?
    • Petabyte-scale facility
  • 6 Staff
    • 1 Director, 2 HPC Admins, 1 Research Facilitator
    • 1 Visualization Specialist, 1 e-Research Specialist
    • (1 Storage admin post?)
ACRC Resources
  • Phase 1 – ~March 07
    • 384 cores – AMD Opteron 2.6 GHz dual-socket, dual-core nodes, 8 GB memory
    • MVB server room
    • CVOS, with SL 4 on the WNs. GPFS, Torque/Maui, QLogic InfiniPath (see the example job script after this list)
  • Phase 2 – ~May 08
    • 3328 cores – Intel Harpertown 2.8 GHz dual-socket, quad-core nodes, 8 GB memory
    • PTR server room – ~600 metres from the MVB server room
    • CVOS, with SL (version?) on the WNs. GPFS, Torque/Moab, QLogic InfiniPath
  • Storage Project (2008 – 2011)
    • Initial purchase of an additional 100 TB for the PP and Climate Modelling groups
    • PTR server room
    • Operational by ~Sep 08
    • GPFS will be installed on the initial 100 TB
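As a concrete illustration of the Torque/Maui batch stack listed above, a job on either phase would typically be submitted with a PBS script along the following lines. This is a sketch only: the job name, resource requests, walltime and the MPI application are assumptions, not taken from the slides.

    #!/bin/bash
    # Illustrative Torque/PBS submission script for the ACRC clusters.
    # Queue selection, resource requests and the application name are assumptions.
    #PBS -N pp_test
    #PBS -l nodes=2:ppn=4          # e.g. two 4-core Phase 1 nodes
    #PBS -l walltime=12:00:00
    #PBS -j oe                     # merge stdout and stderr into one file

    cd $PBS_O_WORKDIR              # run from the directory qsub was called in
    mpirun -np 8 ./my_mpi_app      # launch across the QLogic InfiniPath interconnect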
ACRC Resources
  • 184 Registered Users
  • 54 Projects
  • 5 Faculties
    • Engineering
    • Science
    • Social Science
    • Medicine & Dentistry
    • Medical & Veterinary Sciences
PP Resources
  • Initial LCG/PP setup
    • SE (DPM), CE and a 16-core PP cluster, MON and UI
    • CE for HPC (and SE and GridFTP servers for use with ACRC facilities)
  • HPC Phase 1
    • PP have a 5% target fair-share and up to 32 concurrent jobs (see the Maui sketch after this list)
    • New CE, but uses existing SE - accessed via NAT (and slow).
    • Operational since end of Feb 08
  • HPC Phase 2
    • SL 5 will limit PP exploitation in the short term.
    • Exploring Virtualization – but this is a medium- to long-term solution
    • PP to negotiate larger share of Phase 1 system to compensate
  • Storage
    • 50 TB to arrive shortly, operational ~Sep 08
    • Additional networking necessary for short/medium-term access.
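The 5% fair-share target and 32-job cap mentioned above map naturally onto Maui's group configuration. A minimal sketch, assuming the PP users share a Unix group called 'pp' and a one-week fair-share window; the group name and the fair-share window/decay settings are assumptions:

    # maui.cfg fragment (illustrative). FSTARGET is a percentage of delivered
    # processor-seconds; MAXJOB caps simultaneously running jobs for the group.
    FSPOLICY        DEDICATEDPS
    FSDEPTH         7
    FSINTERVAL      24:00:00
    FSDECAY         0.80

    GROUPCFG[pp]    FSTARGET=5 MAXJOB=32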
Storage
  • Storage Cluster
    • Separate from the HPC cluster
    • Will run GPFS
    • Being installed and configured ‘as we speak’
  • Running a ‘test’ StoRM SE
    • This is the second time
      • Due to changes in the underlying architecture
    • Passing simple SAM SE tests
      • But now removed from the BDII
    • Direct access between storage and WNs
      • Through multi-cluster GPFS (rather than NAT) – see the sketch after this list
  • Test and real systems may differ in the following ways…
    • Real system will have a separate GridFTP server
    • Possibly an NFS export for the Physics cluster
    • 10 Gb NICs (Myricom Myri-10G, PCI-Express)
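For reference, the multi-cluster GPFS access described above is normally configured with GPFS's remote-cluster commands, roughly as sketched below. The cluster names, contact nodes, key-file paths, filesystem device ('gpfs_store') and mount point are all assumptions, not the actual Bristol configuration.

    # On the storage cluster (which owns the filesystem): create an auth key
    # and authorise the HPC cluster to mount the filesystem.
    mmauth genkey new
    mmauth update . -l AUTHONLY
    mmauth add hpc.bris.ac.uk -k /tmp/hpc_cluster_key.pub
    mmauth grant hpc.bris.ac.uk -f gpfs_store

    # On the HPC cluster (which accesses it): register the remote cluster and
    # its filesystem, then mount it on all nodes (including the WNs).
    mmremotecluster add storage.bris.ac.uk -n nsd01,nsd02 -k /tmp/storage_cluster_key.pub
    mmremotefs add store -f gpfs_store -C storage.bris.ac.uk -T /gpfs/store
    mmmount store -a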
[Network diagram (slide 14): Nortel 5510-48 and 5510-24 edge switches, 5530 switches and an 8648 GTR chassis with 8683 modules link HPC Phase 2 and the storage servers (x3650 + Myri-10G) in the PTR server room to HPC Phase 1 in the MVB server room. NB: All components are Nortel.]
[Network diagram (slide 15): a second view of the same Nortel switching layout (5510-48, 5510-24, 5530, 8648 GTR with 8683 modules), again spanning HPC Phase 2 and the storage servers (x3650 + Myri-10G) in the PTR server room and HPC Phase 1 in the MVB server room.]
SoC
  • Separation of Concerns
    • Storage/Compute managed independently of Grid Interfaces
    • Storage/Compute managed by dedicated HPC experts.
    • Tap into storage/compute in the way the ‘electricity grid’ analogy suggests
  • Provide PP with centrally managed compute and storage
    • Tarball WN install on the HPC cluster (see the sketch after this list)
    • StoRM writing files to a remote GPFS mount (developers and tests confirm this)
  • In theory this is a good idea - in practice it is hard to achieve
    • (Originally) implicit assumption that admin has full control over all components
      • Software now allows for (mainly) non-root installations
    • Depend on others for some aspects of support
      • Impact on turn-around times for resolving issues (SLAs?!?!!)
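A minimal sketch of the tarball WN approach referred to above, assuming a shared software area visible on all worker nodes; the directory layout, tarball name and environment-script path are assumptions rather than the actual Bristol setup.

    # Unpack the relocatable WN middleware tarball into the shared software
    # area (no root access needed on the worker nodes).
    mkdir -p /shared/sw/glite-wn
    tar -xzf glite-WN-tarball.tar.gz -C /shared/sw/glite-wn

    # Jobs (or the job wrapper) then source the environment script shipped
    # with the tarball so the grid client tools appear on PATH.
    source /shared/sw/glite-wn/etc/profile.d/grid-env.sh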
General Issues
  • Limit the number of tasks that we pass on to the HPC admins (see the sketch after this list)
    • Set up user, ‘admin’ accounts (sudo) and shared software areas
    • Torque – allow a remote submission host (i.e. our CE)
    • Maui – ADMIN3 access for certain users (all users are ADMIN3 anyway)
    • NAT
  • Most other issues are solvable with fewer privileges
    • SSH Keys
    • RPM or rsync for Cert updates
    • WN tarball for software
  • Other issues
    • APEL accounting assumes ExecutingCE == SubmitHost (bug reported)
    • Workaround needed for the Maui client – the key is embedded in the binaries! (now changed)
    • Home directory path has to be exactly the same on the CE and the cluster
    • Static route into HPC private network
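For illustration, the privileged changes listed above amount to only a few lines on the cluster side. The hostnames, usernames, private network range and paths below are assumptions, not the real Bristol values.

    # Torque (run as root on the server): accept job submission from our CE.
    qmgr -c "set server submit_hosts += lcgce.phy.bris.ac.uk"

    # Maui (maui.cfg): grant selected grid admins ADMIN3 (read-only) access.
    ADMIN3  ppadmin1 ppadmin2

    # CE: static route into the HPC private network (range and gateway assumed).
    ip route add 10.141.0.0/16 via 172.20.1.1

    # Shared area: keep CA certificates/CRLs current without root RPM installs.
    rsync -a --delete /etc/grid-security/certificates/ /shared/grid-security/certificates/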
Q’s?
  • Any questions…
  • https://webpp.phy.bris.ac.uk/wiki/index.php/Grid/HPC_Documentation
  • http://www.datadirectnet.com/s2a-storage-systems/capacity-optimized-configuration
  • http://www.datadirectnet.com/direct-raid/direct-raid
  • hepix.caspur.it/spring2006/TALKS/6apr.dellagnello.gpfs.ppt