
JASMIN/CEMS and EMERALD


Presentation Transcript


  1. JASMIN/CEMS and EMERALD: Scientific Computing Developments at STFC. Peter Oliver, Martin Bly, Scientific Computing Department, Oct 2012. Presented at HEPiX Fall 2012, Beijing.

  2. Outline • STFC • Compute and Data • National and International Services • Summary

  3. Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire • UK Astronomy Technology Centre, Edinburgh • Polaris House, Swindon, Wiltshire • Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus • Chilbolton Observatory, Stockbridge, Hampshire • Joint Astronomy Centre, Hawaii • Isaac Newton Group of Telescopes, La Palma

  4. What we do… • The nuts and bolts that make it work • Enable scientists, engineers and researchers to develop world-class science, innovation and skills

  5. SCARF • Providing resources for STFC facilities, staff and their collaborators • ~2700 cores • Infiniband • Panasas file system • Managed as one entity • ~50 peer-reviewed publications/year • Additional capacity added each year for general use • Facilities such as CLF add capacity using their own funds • National Grid Service partner • Local access using MyProxy-SSO: users log in with their federal ID and password (see the login sketch below) • UK e-Science Certificate access
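In practice, the MyProxy-SSO route means a user retrieves a short-lived credential with their federal ID before accessing the cluster. A minimal sketch of that step is below; the MyProxy server hostname and proxy lifetime are placeholders, not SCARF's actual settings.

```python
# Hedged sketch of a MyProxy-SSO style credential retrieval.
# The server hostname and lifetime are placeholders, not the real SCARF configuration.
import subprocess

federal_id = input("Federal ID: ")
subprocess.run(
    [
        "myproxy-logon",
        "-s", "myproxy.example.ac.uk",  # placeholder MyProxy server name
        "-l", federal_id,               # log on with the federal ID
        "-t", "12",                     # request a 12-hour proxy credential
    ],
    check=True,                         # myproxy-logon itself prompts for the password
)
```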

  6. NSCCS (National Service for Computational Chemistry Software) • Providing national and international compute, training and support • EPSRC Mid-Range Service • SGI Altix UV SMP system, 512 CPUs, 2TB shared memory • A large-memory SMP was chosen over a traditional cluster as this best suits the computational chemistry applications • Supports over 100 active users • ~70 peer-reviewed papers per year • Over 40 applications installed • Authentication using NGS technologies • Portal to submit jobs, giving access for less computationally aware chemists

  7. Tier-1 Architecture • >8000 processor cores • >500 disk servers (10PB) • Tape robot (10PB) • >37 dedicated T10000 tape drives (A/B/C) • [Architecture diagram: OPN and SJ5 network links feeding CPU and storage pools, with separate CASTOR instances for CMS, ATLAS, LHCb and GEN]

  8. e-Infrastructure South • Consortium of UK universities: Oxford, Bristol, Southampton, UCL • Formed the Centre for Innovation, with STFC as a partner • Two new services (£3.7M) • IRIDIS – Southampton – x86-64 • EMERALD – STFC – GPGPU cluster • Part of a larger investment in e-infrastructure • A Midland Centre of Excellence (£1M), led by Loughborough University • West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde • E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester • MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick

  9. EMERALD • Providing resources to the consortium and partners • Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC • Largest production GPU facility in the UK • 372 Nvidia Tesla M2090 GPUs • Scientific applications still under discussion; computational chemistry front runners: AMBER, NAMD, GROMACS, LAMMPS • Eventually hundreds of applications covering all sciences

  10. EMERALD • 6 racks

  11. EMERALD Hardware I • 15 x SL6500 chassis: 4 x GPU compute nodes per chassis, each with 2 x CPUs and 3 x NVidia M2090 GPUs = 8 CPUs & 12 GPUs per chassis, power ~3.9kW • SL6500 scalable-line chassis: 4 x 1200W power supplies, 4 fans, 4 x 2U half-width SL390 servers • SL390s nodes: 2 x Intel E5649 (2.53GHz, 6-core, 80W), 3 x NVidia M2090 GP-GPUs (512 CUDA cores each), 48GB DDR3 memory, 1 x 146GB 15k SAS HDD, HP QDR Infiniband & 10GbE ports, dual 1Gb network ports

  12. EMERALD Hardware II • 12 x SL6500 chassis: 2 x GPU compute nodes per chassis, each with 2 x CPUs and 8 x NVidia M2090 GPUs = 4 CPUs & 16 GPUs per chassis, power ~4.6kW • SL6500 scalable-line chassis: 4 x 1200W power supplies, 4 fans, 2 x 4U half-width SL390 servers • SL390s nodes: 2 x Intel E5649 (2.53GHz, 6-core, 80W), 8 x NVidia M2090 GP-GPUs (512 CUDA cores each), 96GB DDR3 memory, 1 x 146GB 15k SAS HDD, HP QDR Infiniband & 10Gb Ethernet, dual 1Gb network ports (totals check below)
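Taken together, the two hardware tranches above account for the 372 GPUs quoted on slide 9. A minimal arithmetic check, assuming only the per-chassis figures stated above:

```python
# Totals check for the EMERALD hardware described in slides 11 and 12.
tranche_1 = {"chassis": 15, "nodes": 4, "cpus": 2, "gpus": 3}   # slide 11
tranche_2 = {"chassis": 12, "nodes": 2, "cpus": 2, "gpus": 8}   # slide 12

def totals(t):
    n = t["chassis"] * t["nodes"]
    return n, n * t["cpus"], n * t["gpus"]

nodes1, cpus1, gpus1 = totals(tranche_1)   # 60 nodes, 120 CPUs, 180 GPUs
nodes2, cpus2, gpus2 = totals(tranche_2)   # 24 nodes,  48 CPUs, 192 GPUs
print(gpus1 + gpus2)                       # 372 GPUs, matching slide 9
```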

  13. EMERALD • System applications: Red Hat Enterprise Linux 6.x, Platform LSF, CUDA toolkit, SDK and libraries, Intel and Portland compilers (a job-submission sketch follows below) • Scientific applications still under discussion; computational chemistry front runners: AMBER, NAMD, GROMACS, LAMMPS • Eventually hundreds of applications covering all sciences
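To illustrate how a GPU job might be handed to Platform LSF on a stack like this, here is a hedged sketch; the queue name, slot count and AMBER input file are illustrative assumptions rather than EMERALD's actual configuration.

```python
# Hypothetical sketch: submitting a GPU job to Platform LSF from Python.
# Queue name, resource sizes and input file are assumptions, not EMERALD settings.
import subprocess

bsub_cmd = [
    "bsub",
    "-q", "gpu",                   # assumed GPU queue name
    "-n", "12",                    # assumed number of slots
    "-o", "amber.%J.out",          # LSF writes output here; %J is the job ID
    "mpirun", "pmemd.cuda.MPI",    # AMBER's GPU-enabled MD engine
    "-i", "prod.in",               # assumed input file
]
subprocess.run(bsub_cmd, check=True)
```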

  14. EMERALD • Managing a GPU cluster • GPUs are more power-efficient and give more Gflops/Watt than x86_64 servers • Reality: true, but each 4U chassis draws ~1.2kW per U of rack space, so a full rack requires 40+ kW • Hard to cool: additional in-row coolers and cold-aisle containment needed • Uneven power demand stresses the air-conditioning and power infrastructure • A 240-GPU job takes the cluster from 31kW idle to 80kW almost instantly • Measured GPU parallel MPI job (HPL) using 368 GPUs: ~1.4 Gflops/W • Measured X5675 cluster parallel MPI job (HPL): ~0.5 Gflops/W (rough derived numbers below)
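Two rough numbers fall out of the figures above, under the assumptions that the whole 31 kW to 80 kW step is due to the 240-GPU job and that the two HPL efficiencies are directly comparable:

```python
# Back-of-envelope numbers implied by the power and HPL figures above.
idle_kw, loaded_kw, gpus_in_job = 31.0, 80.0, 240
watts_per_gpu = (loaded_kw - idle_kw) * 1000 / gpus_in_job
print(round(watts_per_gpu))        # ~204 W per GPU under load (rough estimate)

gpu_eff, cpu_eff = 1.4, 0.5        # measured HPL Gflops/W, GPU run vs X5675 cluster
print(gpu_eff / cpu_eff)           # ~2.8x better energy efficiency for the GPU run
```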

  15. JASMIN/CEMS • CEDA data storage & services • Curated data archive • Archive management services • Archive access services (HTTP, FTP, Helpdesk, ...) • Data intensive scientific computing • Global / regional datasets & models • High spatial, temporal resolution • Private cloud • Flexible access to high-volume & complex data for climate & earth observation communities • Online workspaces • Services for sharing & collaboration

  16. JASMIN/CEMS, Oct 2011 to 8-Mar-2012: BIS funds, tender, order, build, network complete • Deadline (or the funding is gone): 31st March 2012 for "doing science" • Government procurement: £5M tender to order in < 4 weeks • Machine-room upgrades and a large cluster competing for time • Bare floor to operation in 6 weeks • 6 hours from powered-off to 4.6PB of ActiveStor 11 mounted at RAL • "Doing science" by 14th March • 3 satellite-site installs in parallel (Leeds 100TB, Reading 500TB, ISIC 600TB)

  17. JASMIN/CEMS at RAL • 12 racks with mixed servers and storage • 15kW/rack peak (180kW total) • Enclosed cold aisle + in-aisle cooling • 600kg/rack (7.2 tonnes total) • Distributed 10Gb network (1 Terabit/s bandwidth) • Single 4.5PB global file system • Two VMware vSphere pools of servers with dedicated image storage • 6 weeks from bare floor to a working 4.6PB

  18. JASMIN/CEMS Infrastructure Configuration • Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3TB drives in total) • Computing: 'cloud' of hundreds of virtual machines hosted on 20 Dell R610 servers • Networking: 10Gb Gnodal throughout; "lightpath" dedicated links to UK and EU supercomputers • Physical: 12 racks, enclosed aisle, in-row chillers • Capacity: RAL 4.6PB usable (6.6PB raw), equivalent to 920,000 DVDs (a 1.47km-high tower of DVDs) • High performance: 1.03 Tb/s total storage bandwidth = copying 1500 DVDs per minute (capacity check below) • Single-namespace solution: one single file system, managed as one system • Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
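The headline capacity and bandwidth figures can be cross-checked from the component counts, assuming 3TB drives and one 10Gb uplink per shelf as stated on this slide and on slide 20:

```python
# Cross-check of the quoted JASMIN/CEMS storage figures.
drives, drive_tb = 2208, 3
raw_pb = drives * drive_tb / 1000              # 6.624 -> ~6.6 PB raw, as quoted

shelves, uplink_gbps = 103, 10
aggregate_tbps = shelves * uplink_gbps / 1000  # 1.03 Tb/s total storage bandwidth
print(raw_pb, aggregate_tbps)
```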

  19. JASMIN/CEMS Networking • Gnodal 10Gb networking: 160 x 10Gb ports in a 4 x GS4008 switch stack • Compute: 23 Dell servers for VM hosting (VMware vCenter + vCloud) and HPC access to storage • 8 Dell servers for compute • Dell EqualLogic iSCSI arrays (VM images) • All 10Gb connected • Already upgraded the 10Gb network to add 80 more Gnodal 10Gb ports for compute expansion

  20. What is Panasas Storage? Director blades and storage blades • "A complete hardware and software storage solution" • Ease of management: single management console for 4.6PB • Performance: parallel access via DirectFlow, NFS, CIFS; fast parallel reconstruction • ObjectRAID: all files stored as objects, RAID level per file, vertical, horizontal and network parity • Distributed parallel file system: parts (objects) of files on every blade, all blades transmit/receive in parallel (toy sketch below) • Global name space • Battery UPS: enough to shut down cleanly • 1 x 10Gb uplink per shelf • Performance scales with size
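To make the "parts of files on every blade" idea concrete, here is a toy Python sketch of striping a file's objects across blades; it is purely illustrative and not Panasas's actual ObjectRAID layout or parity scheme.

```python
# Toy illustration of per-file object striping across storage blades.
# Not the real ObjectRAID format; parity placement is omitted entirely.
def stripe(data: bytes, n_blades: int, obj_size: int = 4):
    objects = [data[i:i + obj_size] for i in range(0, len(data), obj_size)]
    placement = {blade: [] for blade in range(n_blades)}
    for idx, obj in enumerate(objects):
        placement[idx % n_blades].append(obj)   # round-robin objects over blades
    return placement

# Every blade ends up holding some objects of the file, so reads can be served in parallel.
print(stripe(b"climate-model-output-chunk", n_blades=5))
```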

  21. PanActive Manager

  22. Panasas in Operation • Performance: random IO 400MB/s per host, sequential IO 1GB/s per host • External performance: 10Gb connected, sustained 6Gb/s • Reliability: 1133 blades, 206 power supplies, 103 shelf network switches = 1442 components • Soak testing revealed 27 faults; in operation, 7 faults with no loss of service • ~0.6% failure per year, compared to ~5% per year for commodity storage (see the check below)
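The quoted reliability figure is consistent with the component counts, assuming the 7 in-operation faults correspond to roughly a year of service:

```python
# Failure-rate check for the Panasas deployment figures above.
components = 1133 + 206 + 103     # blades + power supplies + shelf switches = 1442
faults = 7                        # faults seen in operation
print(round(100 * faults / components, 2))   # ~0.49%, consistent with the quoted ~0.6%/year
```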

  23. Infrastructure Solutions: Systems Management • Backups: system and user data • SVN: codes and documentation • Monitoring: Ganglia, Cacti, power management • Alerting: Nagios (a minimal check sketch follows below) • Security: intrusion detection, patch monitoring • Deployment: Kickstart, LDAP, inventory database • VMware: server consolidation, extra resilience, 150+ virtual servers supporting all e-Science activities • Development cloud
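As an example of the kind of alerting probe that sits behind a Nagios setup like this, here is a minimal plugin-style sketch; the mount point and thresholds are illustrative assumptions, not the production checks.

```python
#!/usr/bin/env python3
# Minimal Nagios-plugin-style sketch: warn/critical on filesystem usage.
# Mount point and thresholds are illustrative assumptions only.
import shutil
import sys

MOUNT, WARN_PCT, CRIT_PCT = "/panfs", 80.0, 90.0   # assumed mount point and thresholds

usage = shutil.disk_usage(MOUNT)
pct = 100.0 * usage.used / usage.total

if pct >= CRIT_PCT:
    print(f"CRITICAL - {MOUNT} at {pct:.1f}% used")
    sys.exit(2)                                    # Nagios critical
if pct >= WARN_PCT:
    print(f"WARNING - {MOUNT} at {pct:.1f}% used")
    sys.exit(1)                                    # Nagios warning
print(f"OK - {MOUNT} at {pct:.1f}% used")
sys.exit(0)                                        # Nagios OK
```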

  24. e-Infrastructures • Lead role in national and international e-infrastructures • Authentication: lead and develop the UK e-Science Certificate Authority; ~30,000 certificates issued in total, ~3000 current; easy integration with the UK Access Management Federation • Authorisation: use existing EGI tools • Accounting: lead and develop EGI APEL accounting; 500M records, 400GB of data; ~282 sites publish records; ~12GB/day loaded into the main tables; usually 13 months of detail, but summary data since 2003 • Integrated into existing HPC-style services

  25. e-Infrastructures • Lead role in national and international e-infrastructures • User management: lead and develop the NGS UAS service, a common portal for project owners to manage project and user allocations, display trends and make decisions (policing) • Information (what services are available?): lead and develop the EGI information portal GOCDB • 2180 registered GOCDB users belonging to 40 registered NGIs • 1073 registered sites hosting a total of 4372 services • 12663 downtime entries entered via GOCDB (query sketch below) • Training & support: Training Marketplace, a tool developed to promote training opportunities, resources and materials • SeIUCCR Summer Schools: supporting 30 students for a 1-week course (120 applicants)
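GOCDB also exposes the registered-site information through its programmatic interface; a hedged query sketch is below, assuming the public endpoint and the get_site_list method (the exact URL and XML layout may differ from the 2012 deployment).

```python
# Hedged sketch: counting registered sites via GOCDB's public programmatic interface.
# The endpoint, method name and XML element names are assumptions, not verified here.
import urllib.request
import xml.etree.ElementTree as ET

URL = "https://goc.egi.eu/gocdbpi/public/?method=get_site_list"   # assumed public PI endpoint

with urllib.request.urlopen(URL) as response:
    tree = ET.parse(response)

sites = tree.getroot().findall(".//SITE")   # assumed element name for a registered site
print(f"{len(sites)} registered sites returned")
```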

  26. Summary • High Performance Computing and Data • SCARF • NSCCS • JASMIN • EMERALD • GridPP – Tier1 • Managing e-Infrastructures • Authentication, Authorisation, Accounting • Resource discovery • User Management, help and Training

  27. Information • Website: http://www.stfc.ac.uk/SCD • Contact: Pete Oliver, peter.oliver at stfc.ac.uk • Questions?
