
Presentation Transcript


  1. Computational Science at STFC and the Exploitation of Novel HPC Architectures • Mike Ashworth • Scientific Computing Department and STFC Hartree Centre • STFC Daresbury Laboratory • mike.ashworth@stfc.ac.uk

  2. STFC’s Scientific Computing Department • STFC’s Hartree Centre • Exploitation of Novel HPC Architectures

  3. STFC’s Scientific Computing Department • STFC’s Hartree Centre • Exploitation of Novel HPC Architectures

  4. Organisation • [Organisation chart: HM Government (& HM Treasury); RCUK Executive Group]

  5. STFC’s Sites • Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire • UK Astronomy Technology Centre, Edinburgh, Scotland • Polaris House, Swindon, Wiltshire • Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire • Chilbolton Observatory, Stockbridge, Hampshire • Isaac Newton Group of Telescopes, La Palma • Joint Astronomy Centre, Hawaii

  6. Understanding our Universe – STFC’s Science Programme • Particle Physics: Large Hadron Collider (LHC), CERN – the structure and forces of nature • Ground-based Astronomy: European Southern Observatory (ESO), Chile – Very Large Telescope (VLT), Atacama Large Millimeter Array (ALMA), European Extremely Large Telescope (E-ELT), Square Kilometre Array (SKA) • Space-based Astronomy: European Space Agency (ESA) – Herschel, Planck, GAIA, James Webb Space Telescope (JWST); bilaterals with NASA, JAXA, etc.; STFC Space Science Technology Department • Nuclear Physics: Facility for Antiproton and Ion Research (FAIR), Germany; nuclear skills for medicine (isotopes and radiation applications), energy (nuclear power plants) and the environment (nuclear waste disposal)

  7. STFC’s Facilities • Neutron Sources: ISIS – pulsed neutron and muon source – and the Institut Laue-Langevin (ILL), Grenoble, providing powerful insights into key areas of energy, biomedical research, climate, environment and security • High Power Lasers: Central Laser Facility – providing applications in bioscience and nanotechnology; HiPER – demonstrating laser-driven fusion as a future source of sustainable, clean energy • Light Sources: Diamond Light Source Limited (86%) – providing new breakthroughs in medicine, environmental and materials science, engineering, electronics and cultural heritage; European Synchrotron Radiation Facility (ESRF), Grenoble

  8. Scientific Computing Department • Major funded activities: • 190 staff supporting over 7500 users • Applications development and support • Compute and data facilities and services • Research: over 100 publications per annum • Delivery of over 3500 training days per annum • Systems administration, data services, high-performance computing, numerical analysis & software engineering • Major science themes and capabilities: expertise across the length and time scales, from processes occurring inside atoms to environmental modelling • Director: Adrian Wander, appointed 24th July 2012

  9. The UK National Supercomputing Facilities • The UK national supercomputing services are managed by EPSRC on behalf of the UK academic communities • HPCx ran from 2002 to 2009 using IBM POWER4 and POWER5 • HECToR is the current service (2007-2014), located at Edinburgh and operated jointly by STFC and EPCC; HECToR Phase 3 is a 90,112-core Cray XE6 (660 Tflop/s Linpack) • ARCHER is the new service: early access now, service starts 16th Dec '13, operated by STFC and EPCC (1.37 Pflop/s Linpack, #19 in the TOP500)

  10. PRACE • PRACE launched 9th June 2010 • 25 member countries; seat in Belgium • France, Germany, Italy and Spain have each committed €100M over 5 years • EC funding of €70M for infrastructure • Tier-0 infrastructure providing a free-of-charge service for European scientific communities based on peer review • Four projects to date, overlapping: PP, 1IP, 2IP, 3IP; 1IP finished, 2IP extended, 3IP about half way through • EPSRC represents the UK on the PRACE Council; STFC and EPCC carry out technical work in the PRACE projects • STFC focus is on application optimization & benchmarking, technology evaluation, procurement procedures and training • STFC contributes 2.5% of its BlueGene/Q to the PRACE DECI calls

  11. Scientific Highlights • Journal of Materials Chemistry 16, no. 20 (May 2006) – issue devoted to HPC in materials chemistry (esp. use of HPCx) • Phys. Stat. Sol. (b) 243, no. 11 (Sept 2006) – issue featuring scientific highlights of the Psi-k Network (the European network on the electronic structure of condensed matter coordinated by our Band Theory Group) • Molecular Simulation 32, no. 12-13 (Oct-Nov 2006) – special issue on applications of the DL_POLY MD program written & developed by Bill Smith (the 2nd special edition of Mol Sim on DL_POLY; the 1st was about 5 years earlier) • Acta Crystallographica Section D 63, part 1 (Jan 2007) – proceedings of the CCP4 Study Weekend on protein crystallography • The Aeronautical Journal 111, no. 1117 (March 2007) – UK Applied Aerodynamics Consortium special edition • Proc. Roy. Soc. A 467, no. 2131 (July 2011) – HPC in the Chemistry and Physics of Materials • Metrics for the last 5 years: 67 grants, of order £13M; 422 refereed papers and 275 presentations; three senior staff have joint appointments with universities; seven staff have visiting professorships; six staff awarded Senior Fellowships or Fellowships under the Research Councils’ individual merit scheme; five staff are Fellows of senior learned societies

  12. STFC’s Scientific Computing Department • STFC’s Hartree Centre • Exploitation of Novel HPC Architectures

  13. Opportunities

  14. Tildesley Report • BIS commissioned a report on the strategic vision for a UK e-Infrastructure for Science and Business • Prof Dominic Tildesley led the team, which included representatives from universities, Research Councils, industry and JANET; the scope included compute, software, data, networks, training and security • Mike Ashworth, Richard Blake and John Bancroft from STFC provided input • Published in December 2011 – Google the title to download the report from the BIS website

  15. Government Investment in e-Infrastructure – 2011 • 17th Aug 2011: Prime Minister David Cameron confirmed £10M investment into STFC's Daresbury Laboratory; £7.5M for computing infrastructure • 3rd Oct 2011: Chancellor George Osborne announced £145M for e-infrastructure at the Conservative Party Conference • 4th Oct 2011: Science Minister David Willetts indicated £30M investment in the Hartree Centre • 30th Mar 2012: John Womersley, CEO STFC, and Simon Pendlebury, IBM, signed a major collaboration at the Hartree Centre • (Photos clockwise from top left)

  16. Intel collaboration • STFC and Intel have signed an MOU to develop and test technology that will be required to power the supercomputers of tomorrow • Karl Solchenbach, Director of European Exascale Computing at Intel, said: "We will use STFC's leading expertise in scalable applications to address the challenges of exascale computing in a co-design approach."

  17. Collaboration with Unilever • 1st Feb 2013: Also announced was a key partnership with Unilever on the development of Computer Aided Formulation (CAF) • Months of laboratory bench work can be completed within minutes by a tool designed to run as an ‘App’ on a tablet or laptop connected remotely to the Blue Joule supercomputer at Daresbury • The tool predicts the behaviour and structure of different concentrations of liquid compounds, both in the bottle and in use, and helps researchers plan fewer, more focussed experiments • The aggregation of surfactant molecules into micelles is an important process in product formulation • Pictured: John Womersley, CEO STFC, and Jim Crilly, Senior Vice President, Strategic Science Group at Unilever

  18. Hartree Centre IBM BG/Q Blue Joule • TOP500: #13 in the Jun 2012 list; #23 in the Nov 2013 list; #8 in Europe; #2 system in the UK • 6 racks, 6144 nodes, 98,304 cores • 16 cores & 16 GB per node • 1.25 Pflop/s peak • 1 rack to be configured as BGAS (Blue Gene Advanced Storage): 16,384 cores and up to 1 PB of Flash memory
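A quick, hedged cross-check of these figures (using the standard BG/Q per-core numbers, which are not stated on the slide): 6144 nodes × 16 cores/node = 98,304 cores, and a BG/Q core at 1.6 GHz issuing up to 8 flops per cycle delivers 12.8 Gflop/s, so 98,304 × 12.8 Gflop/s ≈ 1.26 Pflop/s, consistent with the 1.25 Pflop/s peak quoted.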

  19. Hartree Centre IBM iDataPlex Blue Wonder • TOP500: #114 in the Jun 2012 list; #283 in the Nov 2013 list • 8192 cores, 170 Tflop/s peak • Each node has 16 cores in 2 sockets, Intel Sandy Bridge (AVX etc.) • 252 nodes with 32 GB • 4 nodes with 256 GB • 12 nodes with X3090 GPUs • 256 nodes with 128 GB • ScaleMP virtualization software providing up to 4 TB of virtual shared memory

  20. Hartree Centre Datastore • Storage: 5.76 PB usable disk storage and 15 PB tape store

  21. Hartree Centre Visualization • Four major facilities: • Hartree Vis-1 – a large visualization “wall” supporting stereo • Hartree Vis-2 – a large surround and immersive visualization system • Hartree ISIC – a large visualization “wall” supporting stereo at ISIC • Hartree Atlas – a large visualization “wall” supporting stereo in the Atlas Building at RAL, part of the Harwell Imaging Partnership (HIP) • Virtalis is the hardware supplier

  22. Home of the 2nd most powerful supercomputer in the UK

  23. Douglas Rayner Hartree • Father of Computational Science • Hartree–Fock method • Appleton–Hartree equation • Differential Analyser • Numerical Analysis • Douglas Rayner Hartree PhD, FRS (1897–1958) • “It may well be that the high-speed digital computer will have as great an influence on civilization as the advent of nuclear power” (1946) • Pictured: Douglas Hartree with Phyllis Nicolson at the Hartree Differential Analyser at Manchester University

  24. Hartree Centre Official Opening • 1st Feb 2013: Chancellor George Osborne and Science Minister David Willetts opened the Hartree Centre and announced a further £185M of funding for e-Infrastructure: £19M for the Hartree Centre for power-efficient computing technologies and £11M for the UK’s participation in the Square Kilometre Array • This investment forms part of the £600 million investment for science announced by the Chancellor in the Autumn Statement 2012 • “By putting our money into science we are supporting the economy of tomorrow.” • Pictured: George Osborne opens the Hartree Centre, 1st February 2013

  25. Work with us to: • Sharpen your innovation • Improve your global competitiveness • Reduce research and development costs • Reduce costs for certification • Speed up your time to market

  26. Power-efficient Technologies ‘Shopping List’ • £19M investment in power-efficient technologies • System with latest NVIDIA Kepler GPUs • System based on Intel Xeon Phi • System based on ARM processors • Active storage project using IBM BGAS • Dataflow architecture based on FPGAs • Instrumented machine room • Systems will be made available for development and evaluation projects with Hartree Centre partners from industry, government and academia

  27. STFC’s Scientific Computing Department • STFC’s Hartree Centre • Exploitation of Novel HPC Architectures

  28. Accelerators in the TOP500

  29. Where are the accelerators? • Accelerator-based systems in the TOP500 top 10 (accelerator cores in brackets): • #1 Tianhe-2, 33.8 Pflop/s, Intel Xeon Phi, 3120k (2736k) cores • #2 Titan, 17.6 Pflop/s, Nvidia K20x, 560k (262k) • #6 Piz Daint, 6.27 Pflop/s, Nvidia K20x, 116k (74k) • #7 Stampede, 5.17 Pflop/s, Intel Xeon Phi, 462k (366k) • UK national resources: the STFC Hartree Centre IBM iDataPlex has 48 Nvidia Fermi GPUs • UK regional resources: e-Infrastructure South’s EMERALD consists of 372 Fermi GPUs • Local resources: university and departmental clusters • Under your desk?

  30. The Power Wall • Transistor density is still increasing • Clock frequency is not, due to power density constraints • Cores per chip are increasing: multi-core CPUs (currently 8-16 cores) and GPUs (~500) • Little further scope for instruction-level parallelism • Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)

  31. Processor Comparison • [Comparison table not reproduced] • * single precision cores; double precision is 1/4

  32. Software Challenges • The fundamental challenge is extracting additional parallelism from your application • Fortran/C + MPI • OpenMP for multi-core • CUDA for Nvidia GPUs • OpenCL • OpenACC – directive-based, from Cray (& PGI) • OpenMP 4 for accelerators (see the sketch below)
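As an illustration of the directive-based options listed above, here is a minimal, hedged sketch (a toy loop, not taken from any of the applications in this deck) showing the same computation annotated with OpenMP for multi-core CPUs and with OpenACC for an accelerator; compiled without the corresponding flags, the directives are simply ignored as comments.

    ! Hedged illustration only: a toy triad-style loop, not code from any
    ! application mentioned on this slide.
    program directive_sketch
      implicit none
      integer, parameter :: n = 1000000
      real(kind=8) :: a(n), b(n), c(n)
      integer :: i

      b = 1.0d0
      c = 2.0d0

      ! Multi-core CPU: an OpenMP thread team shares the iterations
      !$omp parallel do
      do i = 1, n
         a(i) = b(i) + 2.0d0*c(i)
      end do
      !$omp end parallel do

      ! Accelerator: OpenACC offloads the loop and manages the data movement
      !$acc parallel loop copyin(b, c) copyout(a)
      do i = 1, n
         a(i) = b(i) + 2.0d0*c(i)
      end do
      !$acc end parallel loop

      print *, 'a(1) = ', a(1)
    end program directive_sketch

Whichever model is chosen, the point is the one made in the first bullet: the parallelism has to be found in the application first; the directives or kernels only express it.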

  33. Gung-Ho – a new atmospheric dynamical core • NEMO on GPUs • DL_POLY on GPUs • LBM on GPUs

  34. Current Unified Model • Met Office Unified Model: ‘Unified’ in the sense of using the same code for weather forecasting and for climate research • Combines dynamics on a lat/long grid with physics (radiation, clouds, precipitation, convection etc.) • Also couples to other models (ocean, sea-ice, land surface, chemistry/aerosols etc.) for improved forecasting and earth system modelling • “New Dynamics” Davies et al (2005) • “ENDGame” to be operational in 2013 • [Chart, 2003-2011: performance of the UM (dark blue) versus a basket of models, measured by 3-day surface pressure errors] • From Nigel Wood, Met Office

  35. Limits to Scalability of the UM • The current version (New Dynamics) has limited scalability • The latest ENDGame code improves this, but a more radical solution is required for Petascale and beyond • The problem lies with the spacing of the lat/long grid at the poles • At 25km resolution, grid spacing near the poles is 75m • At 10km this reduces to 12m! • [Chart: UM scaling (17km resolution) against number of POWER7 nodes, compared with perfect scaling] • From Nigel Wood, Met Office
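A hedged note on the arithmetic behind these figures (the exact values depend on the UM’s grid staggering, which is not given here): on a regular lat/long grid the east-west spacing at latitude φ is Δx(φ) = Δλ · R · cos(φ), where R ≈ 6371 km is the Earth’s radius and Δλ the angular grid spacing. A 25 km spacing at the equator corresponds to Δλ ≈ 0.22°, and for the row of points nearest the pole cos(φ) is of order 10^-3, so Δx collapses to tens of metres, in line with the 75 m and 12 m quoted above. Such tiny cells force a very small timestep (or heavy polar filtering) and concentrate communication at the poles, which is what limits scalability.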

  36. Challenging Solutions • GUNG-HO: Globally Uniform Next Generation Highly Optimized (“working together harmoniously”) • GUNG-HO targets a brand new dynamical core • Scalability – choose a globally uniform grid which has no poles (candidate grids below) • Speed – maintain performance at high & low resolution and for high & low core counts • Accuracy – need to maintain standing of the model • Space weather implies a 600km deep model • Five year project 2011-2015; operational weather forecasts around 2020! • Candidate grids: triangles, cube-sphere, yin-yang • From Nigel Wood, Met Office

  37. Design considerations • Ford et al, “Gung Ho: A code design for weather and climate prediction on exascale machines”, EASC 2013, to appear in a special edition of Advances in Engineering Software • Fortran 2003, MPI, OpenMP and OpenACC • Other models, e.g. PGAS, CAF, are not excluded • Indirect addressing in the horizontal to support a wide range of possible grids • Direct addressing in the vertical • Vertical index innermost is optimal for cache re-use on CPUs, and can also achieve coalesced memory access on GPUs (see the sketch after this slide)
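The sketch below illustrates the addressing scheme described above (hedged: the array and map names are invented for illustration and are not Gung-Ho/LFRic's actual data structures). Horizontal cells are reached indirectly through a dofmap, so any unstructured horizontal grid can be supported, while the vertical level index k is addressed directly and sits innermost, so successive accesses run contiguously down a column.

    ! Hedged sketch: invented names, not the real Gung-Ho/LFRic data structures.
    ! Horizontal addressing is indirect (via a dofmap); vertical addressing is
    ! direct, with the level index k innermost so that memory access is
    ! contiguous down each column (good for CPU caches and GPU coalescing).
    subroutine column_update(ncells, nlayers, ndofs, ndata, dofmap, field, rhs)
      implicit none
      integer, intent(in) :: ncells, nlayers, ndofs, ndata
      integer, intent(in) :: dofmap(ndofs, ncells)          ! horizontal indirection
      real(kind=8), intent(inout) :: field(nlayers, ndata)  ! (k, dof) layout: k fastest
      real(kind=8), intent(in)    :: rhs(nlayers, ndata)
      integer :: cell, df, k

      do cell = 1, ncells          ! loop over horizontal cells (any grid)
         do df = 1, ndofs          ! degrees of freedom of this cell, via the map
            do k = 1, nlayers      ! vertical: direct, innermost, contiguous
               field(k, dofmap(df, cell)) = field(k, dofmap(df, cell)) &
                                            + rhs(k, dofmap(df, cell))
            end do
         end do
      end do
    end subroutine column_update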

  38. Software architecture • The Gung-Ho software architecture is structured into layers communicating via a defined API: • the driver layer (control for one or more models) • the algorithm layer (high-level specification) • the parallelisation system (PSy) (inter-node and intra-node parallelism, parsing, transformations, hardware-specific code generation) • the kernel layer (toolkit of algorithm building blocks with directives) • the infrastructure layer (generic library to support parallelisation, communications, coupling, I/O etc.)

  39. Gung-Ho Single Model architecture • The arrows represent the APIs connecting the layers • The direction shows the flow of control • A code generator will parse the algorithm layer source code

  40. Structure • [Diagram of the code-generation structure: the Alg Code is read by the Parser; the Algorithm Generator emits the transformed Alg Code and the PSy Generator emits the PSy Code, which calls the Kernel Codes]

  41. Generator

    ...
    # Parse the algorithm-layer source and extract the invoke information
    ast, invokeInfo = parse(filename, invoke_name=invokeName)
    # Generate the transformed algorithm-layer code
    alg = algGen(ast, invokeInfo, psyName=psyName, invokeName=invokeName)
    # Generate the PSy-layer code
    psy = psyGen(invokeInfo, psyName=psyName)
    ...

  42. Invoking the generator

    rupert@ubuntu:~/proj/GungHoSVN/LFRIC/src/generator$ python generator.py
    usage: generator.py [-h] [-oalg OALG] [-opsy OPSY] filename
    generator.py: error: too few arguments

    integrate_one_generate:
        python ../generator/generator.py -oalg integrate_one_alg.F90 -opsy integrate_one_psy.F90 integrate_one.F90

    make integrate_one_generated

  43. Example (integrate_one)

    program main
      ...
      use integrate_one_module, only : integrate_one_kernel
      ...
      call invoke(integrate_one_kernel(x, integral))
      ...
    end program main

    PROGRAM main
      ...
      USE psy, ONLY: invoke_integrate_one_kernel
      ...
      CALL invoke_integrate_one_kernel(x, integral)
      ...
    END PROGRAM main

  44. Example (integrate_one)

    module integrate_one_module
      use kernel_mod
      implicit none
      private
      public integrate_one_kernel
      public integrate_one_code

      type, extends(kernel_type) :: integrate_one_kernel
        type(arg) :: meta_args(2) = (/ &
             arg(READ, (CG(1)*CG(1))**3, FE), &
             arg(SUM, R, FE) /)
        integer :: ITERATES_OVER = CELLS
      contains
        procedure, nopass :: code => integrate_one_code
      end type integrate_one_kernel

    contains

      subroutine integrate_one_code(layers, p1dofm, X, R)
      ...

  45. Example (integrate_one)

    MODULE psy
      USE integrate_one_module, ONLY: integrate_one_code
      USE lfric
      IMPLICIT NONE
    CONTAINS
      SUBROUTINE invoke_integrate_one_kernel(x, integral)
        ...
        SELECT TYPE ( x_space => x%function_space )
        TYPE IS ( FunctionSpace_type )
          topology => x_space%topology
          nlayers = topology%layer_count()
          p1dofmap => x_space%dof_map(cells, fe)
        END SELECT
        DO column = 1, topology%entity_counts(cells)
          CALL integrate_one_code(nlayers, p1dofmap(:, column), x%data, integral%data(1))
        END DO
      END SUBROUTINE invoke_integrate_one_kernel
    END MODULE psy

  46. Timetable • Further development and testing of the horizontal discretization [2013] • Testing of proposals for code architecture [2013] • Vertical discretization [2013] • 3D prototype development [2014-2015] • Operational around 2020…?

  47. NEMO Acceleration on GPUs using OpenACC • Maxim Milakov, Peter Messmer, Thomas Bradley, NVIDIA • Flat profile, so the code was converted using OpenACC directives • GYRE configuration only – more realistic test cases, with ice and land, are being looked at • Tesla M2090 GPUs, Westmere CPUs • Milakov, Messmer and Bradley, GPU Technology Conference, 18th-22nd March 2013
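As a hedged illustration of what "converted using OpenACC directives" typically means in practice (this is an invented, NEMO-style loop nest, not the actual port by Milakov, Messmer and Bradley), the sketch below uses a data region to keep the fields resident on the GPU across timesteps, so that only the loop nests, not the arrays, cross the PCIe bus each step:

    ! Hedged sketch only: an invented, NEMO-like 3D loop nest, not code from
    ! the actual NEMO GPU port. A data region keeps the arrays on the GPU for
    ! the whole time loop; each loop nest is offloaded with parallel loop.
    subroutine tracer_step(jpi, jpj, jpk, nsteps, tb, tn, ta)
      implicit none
      integer, intent(in) :: jpi, jpj, jpk, nsteps
      real(kind=8), intent(inout) :: tb(jpi,jpj,jpk), tn(jpi,jpj,jpk), ta(jpi,jpj,jpk)
      integer :: ji, jj, jk, step
      real(kind=8), parameter :: rdt = 0.1d0

      !$acc data copy(tb, tn, ta)            ! arrays stay resident on the GPU
      do step = 1, nsteps
         !$acc parallel loop collapse(3)
         do jk = 1, jpk
            do jj = 2, jpj-1
               do ji = 2, jpi-1
                  ! toy update: diffusion-like combination of neighbouring points
                  ta(ji,jj,jk) = tb(ji,jj,jk) + rdt * &
                       (tn(ji+1,jj,jk) + tn(ji-1,jj,jk) + tn(ji,jj+1,jk) &
                        + tn(ji,jj-1,jk) - 4.0d0*tn(ji,jj,jk))
               end do
            end do
         end do
         !$acc end parallel loop
         !$acc parallel loop collapse(3)
         do jk = 1, jpk
            do jj = 1, jpj
               do ji = 1, jpi
                  tb(ji,jj,jk) = tn(ji,jj,jk)   ! time-level swap, also on the GPU
                  tn(ji,jj,jk) = ta(ji,jj,jk)
               end do
            end do
         end do
         !$acc end parallel loop
      end do
      !$acc end data
    end subroutine tracer_step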

  48. DL_POLY Acceleration on GPUs • Christos Kartsaklis and Ruairi Nestor, ICHEC; Ilian Todorov and Bill Smith, STFC • CUDA implementation of key DL_POLY features: constraints (SHAKE), link-cell pairs, two-body forces, Ewald SPME forces • DMPC (dimyristoylphosphatidylcholine) in water, 413,896 atoms (test case 4)

  49. DL_POLY Acceleration on GPUs • “Benchmarking and Analysis of DL_POLY 4 on GPU Clusters”, Lysaght et al, PRACE report • Significant 8x speed-up using cuFFT • The MPI code scales to 10^8 atoms on >10^5 cores • Pure MPI vs. 2 GPUs on the ICHEC Stokes GPU cluster • Sodium chloride, 216,000 ions (test case 2)

  50. 3D LBM on Kepler GPUs (1) • Mark Mawson & Alistair Revell, Manchester • Lattice Boltzmann Method (LBM) for solving fluid flow • Focus on memory transfer issues
