Presentation Transcript


  1. The DEISA HPC Grid for Astrophysical Applications. Claudio Gheller, CINECA (c.gheller@cineca.it)

  2. Disclaimer. My background: computer science in astrophysics. My involvement in DEISA: support to scientific extreme computing projects (DECI). I am not: a systems expert, nor a networking expert.

  3. Conclusions: DEISA is not Grid computing; it is (super) supercomputing.

  4. The DEISA project: overview. What it is: DEISA (Distributed European Infrastructure for Supercomputing Applications) is a consortium of leading national EU supercomputing centres. Goals: deploy and operate a persistent, production-quality, distributed supercomputing environment with continental scope. When: the project was funded by the European Commission from May 2004 to April 2008; it has been re-funded (DEISA2) for May 2008 to April 2010.

  5. The DEISA project: drivers • Support High Performance Computing. • Integrate Europe's most powerful supercomputing systems. • Enable scientific discovery across a broad spectrum of science and technology. • Exploit the resources as well as possible, both at site level and at European level. • Promote openness and the usage of standards.

  6. The DEISA project: what it is NOT • DEISA is not a middleware development project. • DEISA, actually, is not a Grid: it does not support Grid computing. Rather, it supports cooperative computing.

  7. The DEISA project: core partners
  • BSC, Barcelona Supercomputing Centre, Spain
  • CINECA, Consorzio Interuniversitario, Italy
  • CSC, Finnish Information Technology Centre for Science, Finland
  • EPCC/HPCx, University of Edinburgh and CCLRC, UK
  • ECMWF, European Centre for Medium-Range Weather Forecasts, UK
  • FZJ, Research Centre Juelich, Germany
  • HLRS, High Performance Computing Centre Stuttgart, Germany
  • LRZ, Leibniz Rechenzentrum Munich, Germany
  • RZG, Rechenzentrum Garching of the Max Planck Society, Germany
  • IDRIS, Institut du Développement et des Ressources en Informatique Scientifique, CNRS, France
  • SARA, Dutch National High Performance Computing, Netherlands

  8. The DEISA project: organization. Three activity areas: • Networking: management, coordination and dissemination. • Service Activities: running the infrastructure. • Joint Research Activities: porting and running scientific applications on the DEISA infrastructure.

  9. DEISA activities, some (maybe too many…) details (1). Service Activities: • Network Operation and Support (FZJ leader): deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform. • Data Management with Global File Systems (RZG leader): deployment and operation of global distributed file systems, as basic building blocks of the "inner" super-cluster, and as a way of implementing global data management in a heterogeneous Grid. • Resource Management (CINECA leader): deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension. • Applications and User Support (IDRIS leader): enabling the adoption by the scientific community of the distributed supercomputing infrastructure, as an efficient instrument for the production of leading computational science. • Security (SARA leader): providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.

  10. DEISA activities, some (maybe too many…) details (2). Scientific Applications Activities: • JRA1 – Material Science (RZG leader) • JRA2 – Cosmology (EPCC leader) • JRA3 – Plasma Physics (RZG leader) • JRA4 – Life Science (IDRIS leader) • JRA5 – Industry (CINECA leader) • JRA6 – Coupled Applications (IDRIS leader) • JRA7 – Access to Resources in Heterogeneous Environments (EPCC leader). Plus the DEISA Extreme Computing Initiative (DECI). See http://www.deisa.org/applications

  11. JRA2: Cosmological Applications. Goals: to give the Virgo Consortium access to the most advanced features of Grid computing by porting their production applications GADGET and FLASH; to make effective use of the DEISA infrastructure; to lay the foundations of a Theoretical Virtual Observatory. Led by EPCC, which works in close partnership with the Virgo Consortium; JRA2 is managed jointly by Gavin Pringle (EPCC/DEISA) and Carlos Frenk (co-PI of both Virgo and VirtU). Work progressed after gathering clear user requirements from the Virgo Consortium; requirements and results are published as public DEISA deliverables.

  12. Current DEISA status • a variety of systems connected via GEANT/GEANT2 (Premium IP) • centres contribute 5% to 10% of their CPU cycles to DEISA • running projects selected from the DEISA Extreme Computing Initiative (DECI) calls. Premium IP is a service that gives traffic priority over all other services on GÉANT.

  13. DEISA HPC systems: IDRIS (IBM P4), FZJ (IBM P4), HLRS (NEC SX8), ECMWF (IBM P4), HPCX (IBM P5), CSC (IBM P4), LRZ (SGI Altix), CINECA (IBM P5), RZG (IBM P4), SARA (SGI Altix), BSC (IBM PPC).

  14. DEISA technical hints: software stack • UNICORE is the grid "glue"; it is not built on Globus; EPCC is developing a UNICORE command-line interface. • Other components: IBM's General Parallel File System (GPFS): multicluster GPFS can span different systems over a WAN, with recent developments for Linux as well as AIX; IBM's LoadLeveler for job scheduling: multicluster LoadLeveler can re-route batch jobs to different machines and is also available on Linux (a hedged job-script sketch follows).
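To make the "same scripts, same commands" point concrete, here is a minimal sketch of a LoadLeveler job command file; the class name, sizes and executable are hypothetical, and site policies differ:

      #!/bin/bash
      # Minimal LoadLeveler command file (sketch; class and executable are hypothetical)
      # @ job_name = deisa_test
      # @ job_type = parallel
      # @ node = 4
      # @ total_tasks = 64
      # @ wall_clock_limit = 01:00:00
      # @ output = $(job_name).$(jobid).out
      # @ error = $(job_name).$(jobid).err
      # @ class = deisa
      # @ queue
      poe ./my_mpi_application

Submitted as usual with llsubmit; in a multicluster LoadLeveler configuration the scheduler may then route the job to another machine of the homogeneous super-cluster.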

  15. DEISA model • large parallel jobs run on a single supercomputer, so network latency between machines is not a significant issue • jobs are submitted, ideally, via UNICORE; in practice via LoadLeveler, re-routed where appropriate to remote resources • single sign-on access via GSI-SSH (a sketch follows this list) • GPFS is absolutely crucial to this model: jobs have access to their data no matter where they run, with no source-code changes required, via standard fread/fwrite (or READ/WRITE) calls to Unix files • there is also a Common Production Environment, which defines a common set of environment variables, defined locally to map to the appropriate resources; e.g. $DEISA_WORK will point to the local workspace.
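The usual GSI single-sign-on pattern is a short-lived proxy credential plus a GSI-enabled ssh; a minimal sketch, in which the user and host names are hypothetical:

      grid-proxy-init                  # create a short-lived proxy from your Grid certificate
      gsissh user@hpc.example-site.eu  # GSI-enabled ssh: the proxy authenticates you, no password needed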

  16. Running ideally on DEISA • fill all the gaps • restart/continue jobs on any machine from file checkpoints • no need to recompile the application program • no need to manually stage data • multi-step jobs running on multiple machines • easy access to data for post-processing after a run. A sketch of checkpointing on the shared file system follows.
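A minimal sketch of checkpoint/restart through the shared GPFS workspace, assuming the DCPE variable $DEISA_WORK is set; the file name and state layout are hypothetical, and the I/O is ordinary fread/fwrite, exactly as the model requires:

      /* checkpoint.c: restart from, then update, a checkpoint on the shared workspace */
      #include <stdio.h>
      #include <stdlib.h>

      typedef struct { long step; double field[1024]; } State;   /* hypothetical state */

      int main(void) {
          const char *work = getenv("DEISA_WORK");   /* same path semantics on every site */
          char path[4096];
          State s = {0};

          if (!work) { fprintf(stderr, "DEISA_WORK not set\n"); return 1; }
          snprintf(path, sizeof path, "%s/checkpoint.bin", work);

          FILE *f = fopen(path, "rb");               /* resume if a checkpoint exists */
          if (f) {
              if (fread(&s, sizeof s, 1, f) != 1) s = (State){0};
              fclose(f);
          }

          s.step += 1;                               /* one more unit of (hypothetical) work */

          f = fopen(path, "wb");                     /* checkpoint again: any machine can resume */
          if (f) { fwrite(&s, sizeof s, 1, f); fclose(f); }
          printf("reached step %ld\n", s.step);
          return 0;
      }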

  17. Running on DEISA: LoadLeveler. [Slide diagram: a job is routed among the DEISA sites by LoadLeveler; the local batch environments shown include AIX LoadLeveler and LoadLeveler multicluster on the IBM systems, NQS II on the Super-UX NEC SX8 at HLRS, and PBS Pro, LSF and LoadLeveler on the Linux systems.]

  18. Running ideally on DEISA: UNICORE. [Slide diagram: each site (BSC, CINECA, CSC, ECMWF, FZJ, HLRS, HPCX, IDRIS, LRZ, RZG, SARA) runs a UNICORE Gateway in front of an NJS (Network Job Supervisor) with its IDB (Incarnation Database) and UUDB (UNICORE User Database), on top of the local batch system; jobs enter through a Gateway and can be passed between sites.]

  19. [Slide diagram: the DEISA network: sites connect through their national research networks (RENATER, UKERNA, FUNET, GARR, SURFnet, RedIris) to GÉANT2 at 1 Gb/s and 10 Gb/s; old 1 Gb/s LSPs are being removed in favour of dedicated 10 Gb/s wavelengths, some still in preparation.] GPFS multicluster: the HPC systems mount /deisa/sitename (e.g. /deisa/idr, /deisa/cne, /deisa/rzg, /deisa/fzj, /deisa/csc) and users read/write directly from/to these file systems.

  20. DEISA Common Production Environment (DCPE). What is it? Both a set of software (the software stack) and a generic interface to access that software (based on the Modules tool). • Required both to offer a common interface to the users and to hide the differences between local installations. • An essential feature for job migration inside homogeneous super-clusters. The DCPE includes: • shells (Bash and Tcsh), • compilers (C, C++, Fortran and Java), • libraries (for numerical analysis, data formatting, etc.), • tools (debuggers, profilers, editors, development tools), • applications.

  21. Modules framework • the Modules tool was chosen because it was well known by many sites and many users • public-domain software; the Tcl implementation is used. Modules serve: • to offer a common interface to different software components on different computers, hiding different names and configurations • to manage each piece of software individually and load only those required into the user environment • to let each user change the version of each piece of software independently of the others • to let each user switch independently between the current default version of a piece of software and another one (older or newer). A sample session follows.
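In practice a DCPE session might look like this; the commands are the standard Modules tool interface, while the module names and versions are hypothetical:

      module avail                  # list the software the DCPE provides on this site
      module load fftw              # add one package to the environment
      module switch fftw fftw/3.1   # change its version, independently of everything else
      module list                   # show what is currently loaded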

  22. The HPC users' vision. Initial vision: "full" distributed computing. [Slide diagram: one application split into Task1, Task2 and Task3, each running on a different DEISA supercomputer.]

  23. The HPC users' vision. Initial vision: "full" distributed computing. Impossible! [Same diagram, crossed out.]

  24. The HPC users' vision. Jump computing. [Slide diagram: a whole task jumps from one DEISA machine to another.]

  25. The HPC users' vision. Jump computing is difficult: HPC applications are… HPC applications! They are finely tuned to their architectures. [Same diagram.]

  26. So… what? Jump computing is useful to reduce queue waiting times: find the gap and fill it. It can work, and works better on homogeneous systems. [Slide diagram: LoadLeveler routing a job across the sites' batch systems.]

  27. So… what? A single-image file system is a great solution! (even if it means moving data…) [Slide diagram: the DEISA GPFS shared file system spanning all the sites.]

  28. So… what? The usual Grid solution requires learning new stuff, and scientists are often not willing to. DEISA relies on LoadLeveler (or other common scheduling systems): the same scripts and the same commands you are used to! However, only IBM systems support LL… The Common Production Environment offers a shared (and friendly) set of tools to the users. However, compromises must be accepted…

  29. Summing up… [Slide chart: infrastructures arranged along two axes, latency (low to high) and integration (high to low): capability supercomputers and distributed supercomputing (DEISA) sit at the low-latency, high-integration end alongside the enabling-computing HPC centres; capacity supercomputers, capacity clusters and distributed computing and data grids (EGEE, Internet Grid) sit at the high-latency, low-integration end.] Growing up, DEISA is moving away from being a Grid: in order to fulfil the needs of HPC users, it is trying to become one huge supercomputer. On the other hand, DEISA2 must lead to a service infrastructure, and users' expectations MUST be matched (no more time for experiments…).

  30. DECI: enabling science on DEISA. Identification, deployment and operation of a number of "flagship" applications requiring the infrastructure services, in selected areas of science and technology. A European call for proposals runs in May and June every year. Applications are selected on the basis of scientific excellence, innovation potential and relevance, in collaboration with the national HPC evaluation committees. DECI users are supported by the Applications Task Force (ATASKF), whose objective is to enable and deploy the Extreme Computing applications.

  31. LFI-SIM DECI project (2006). Planck (useless) overview: Planck is the third-generation space mission for the mapping and analysis of the microwave sky: its unprecedented combination of sky and frequency coverage, accuracy, stability and sensitivity is designed to achieve the most efficient detection of the Cosmic Microwave Background (CMB) in both temperature and polarisation. To achieve the ambitious goals of the mission, unanimously acknowledged by the scientific community to be of the highest importance, data processing of extreme accuracy is needed.

  32. The need for simulations in Planck. NOT the typical DECI HPC project! Simulations are used to: • assess likely science outcomes; • set requirements on instruments in order to achieve the expected scientific results; • test the performance of data analysis algorithms and infrastructure; • help understand the instrument and its noise properties; • analyze known and unforeseen systematic effects; • deal with known physics and new physics. Predicting the data is fundamental to understanding them.

  33. Simulation pipeline. [Slide diagram: cosmological parameters feed the generation of a CMB sky; foregrounds are added to produce reference sky maps; the sky is "observed" with LFI using the instrument parameters, producing Time-Ordered Data; data reduction yields frequency sky maps, which are merged (Freq. merge) and separated into components (Comp. sep.); C(l) evaluation and parameter evaluation close the loop back to the cosmological parameters.] Knowledge and details increase over time, so the whole computational chain must be iterated many times. THIS NEEDS HUGE COMPUTATIONAL RESOURCES. A Grid can be a solution! A stub of the iterated chain follows.
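A hedged stub of the iterated chain, only to show its feedback structure; every function is a hypothetical stand-in for a full pipeline stage, not Planck software:

      /* pipeline.c: the slide's simulation chain as a feedback loop (stubs only) */
      #include <stdio.h>

      typedef struct { double params[6]; } Cosmology;      /* hypothetical parameter set */

      static void generate_cmb_sky(const Cosmology *c) { (void)c; }
      static void add_foregrounds(void)     {}             /* -> reference sky maps */
      static void observe_with_lfi(void)    {}             /* -> time-ordered data */
      static void reduce_and_separate(void) {}             /* -> frequency/component maps */
      static Cosmology evaluate_parameters(void) { Cosmology c = {{0}}; return c; }

      int main(void) {
          Cosmology c = {{0}};
          for (int i = 0; i < 3; ++i) {                    /* chain iterated as knowledge grows */
              generate_cmb_sky(&c);
              add_foregrounds();
              observe_with_lfi();
              reduce_and_separate();
              c = evaluate_parameters();                   /* C(l) + parameters feed back */
              printf("iteration %d done\n", i);
          }
          return 0;
      }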

  34. Planck & DEISA. DEISA was expected to be used to: simulate the whole mission of Planck's LFI instrument many times, on the basis of different scientific and instrumental hypotheses; and to reduce, calibrate and analyse the simulated data down to the production of the final products of the mission, in order to evaluate the impact of possible LFI instrumental effects on the quality of the scientific results, and consequently to refine the data processing algorithms appropriately. [Slide diagram: Model 1, Model 2, Model 3, … Model N fed through the chain.]

  35. Outcomes. Planck simulations are essential to get the best possible understanding of the mission and to have a "conscious expectation of the unexpected". They also allow proper planning of Data Processing Centre resources. The EGEE grid turned out to be more suitable for such a project, since it provides fast access to small/medium computing resources, and most of the Planck pipeline is happy with such resources! However, DEISA was useful to produce massive sets of simulated data and to perform and test the data processing steps which require large computing resources (lots of coupled processors, large memories, large bandwidth…). Interoperation between the two grid infrastructures (possibly based on the gLite middleware) is expected in the next years.
