
Exploring Distributed Computing Techniques with Cactus and Globus


Presentation Transcript


  1. Exploring Distributed Computing Techniques with Cactus and Globus
Thomas Dramlitsch, Albert-Einstein-Institut, MPI-Gravitationsphysik (and AEI-ANL-NCSA-LBL team)
• Solving Einstein’s Equations, Black Holes, and Gravitational Wave Astronomy
• Cactus, a new community simulation code framework: Grid-enabling capabilities
• Previous metacomputing experiments
• What we learned from those
• Current work, improvements
• The present state
• Future development, goals
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  2. What is Cactus? A new concept in community-developed simulation code infrastructure
• Numerical/computational infrastructure to solve PDEs
• Freely available, open community source code: spirit of GNU/Linux
• Developed as a response to the needs of these projects
• It is production software
• Cactus is divided into the “Flesh” (core) and “Thorns” (modules or collections of subroutines); see the illustrative sketch after this slide
• User choice between Fortran, C, C++; automated interface between them
• Parallelism largely automatic and hidden (if desired) from the user
• Checkpointing / restart capabilities
• Many parallel utilities / features enabled by Cactus
• Parallel I/O: FlexIO, HDF5; data streaming, remote visualization/steering
• Elliptic solvers: PETSc
• And of course metacomputing
• A vision: any application can plug into Cactus to be Grid-enabled
• Demo tomorrow night at HPDC
Albert-Einstein-Institut www.aei-potsdam.mpg.de
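As a picture of the Flesh/Thorn split described above, here is a minimal C sketch in which a small core drives a schedule of routines that modules register at startup. This is illustrative only and not the real Cactus API; all names (register_routine, thorn_evolve, thorn_output) are invented for the example.

```c
/* Illustrative-only sketch of the "Flesh + Thorns" idea: a small core
 * keeps a schedule of routines that modules plug in at startup.
 * This is NOT the actual Cactus API, just a picture of the concept. */
#include <stdio.h>

#define MAX_ROUTINES 16

typedef void (*routine_t)(void);

static routine_t schedule[MAX_ROUTINES];   /* the "Flesh" schedule */
static int n_routines = 0;

/* A "thorn" calls this to plug its routine into the schedule. */
static void register_routine(routine_t fn)
{
    if (n_routines < MAX_ROUTINES)
        schedule[n_routines++] = fn;
}

/* Two toy "thorns": one evolves data, one does I/O. */
static void thorn_evolve(void) { printf("evolve one time step\n"); }
static void thorn_output(void) { printf("write output\n"); }

int main(void)
{
    int step, i;

    /* Activation: the user selects which thorns to plug in. */
    register_routine(thorn_evolve);
    register_routine(thorn_output);

    /* The Flesh drives the schedule; it knows nothing about what
     * the thorns actually compute. */
    for (step = 0; step < 3; step++)
        for (i = 0; i < n_routines; i++)
            schedule[i]();

    return 0;
}
```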

  3. Modularity of Cactus
[Diagram: applications, sub-applications and user code plug into the Cactus Flesh; the user selects the desired functionality from interchangeable layers such as an MPI layer, an I/O layer, AMR (GrACE, etc.) and Remote Steering, built on Globus Metacomputing Services]
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  4. Metacomputing: harnessing power when and where it is needed
• Einstein equations are typical of apps that require extreme memory and speed
• Many Flops per grid zone (~10^3 - 10^4)
• Finite differences on regular grids
• Communication of variables through derivatives: ghost zones (see the sketch after this slide)
• Largest supercomputers too small!
• Networks very fast!
• OC-12 and higher very common in US
• G-WiN: 622 Mbit/s Potsdam-Berlin-Garching, connect multiple supercomputers
• Gigabit networking to US possible
• “Seamless computing and visualization from anywhere”
• Many metacomputing experiments in progress
Albert-Einstein-Institut www.aei-potsdam.mpg.de
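To make the ghost-zone idea concrete, below is a minimal C/MPI sketch of a one-dimensional ghost-zone exchange on a regular grid, assuming one ghost cell per side and a simple left/right decomposition over ranks. It is illustrative only and not taken from the Cactus communication layer; NLOCAL and the variable names are invented for the example.

```c
/* Minimal sketch of a 1-D ghost-zone exchange on a regular grid,
 * assuming one ghost cell on each side and a left/right decomposition
 * over MPI ranks (illustrative; not Cactus code). */
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 64               /* interior points owned by this rank   */

int main(int argc, char **argv)
{
    double u[NLOCAL + 2];       /* u[0] and u[NLOCAL+1] are ghost zones */
    int rank, size, left, right, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (i = 0; i <= NLOCAL + 1; i++)
        u[i] = rank;            /* dummy initial data */

    /* Exchange ghost zones with both neighbours; MPI_PROC_NULL turns
     * the boundary ranks' sends/receives into no-ops. */
    MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 0,
                 &u[0],      1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                 &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* A centred finite-difference stencil can now be applied to
     * u[1..NLOCAL] without further communication. */
    if (rank == 0)
        printf("ghost exchange done on %d ranks\n", size);

    MPI_Finalize();
    return 0;
}
```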

  5. High performance: full 3D Einstein equations solved on NCSA NT Supercluster, Origin 2000, T3E
• Excellent scaling on many architectures
• Origin up to 256 processors
• T3E up to 1024
• NCSA NT cluster up to 128 processors
• Achieved 142 Gflops on a 1024-node T3E-1200 (benchmarked for the NASA NS Grand Challenge)
• But, of course, we want much more… metacomputing, meaning connected computers...
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  6. Metacomputing the Einstein Equations: connecting T3Es in Berlin, Garching, San Diego
Want to migrate this technology to the generic user...
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  7. Scaling of Cactus on two T3Es on different continents
[Plots: San Diego & Berlin; Berlin & Munich]
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  8. Scaling of Cactus on Multiple SGIs at Remote Sites
[Plot: Argonne & NCSA]
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  9. Analysis of previous metacomputing experiments
• It worked! (That’s the main thing we wanted at SC98…)
• Cactus was not optimized for metacomputing: messages too small, latency, etc.
• MPICH-G could perform better, e.g. intra-machine communication was one order of magnitude slower than native MPI
• MPICH-G2 improves this...
• Communication is non-trivial (not “embarrassingly parallel”) and very intensive
• Experiments showed:
  • For some problems, this is feasible
  • We could improve performance significantly with work on the optimization of Cactus and MPICH-G
• That’s what we did!
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  10. Optimizing Cactus Communication Layers for Metacomputing
• Made the communication layer(s) much more flexible:
  • Can specify size and number of messages, in order to achieve the best performance with the underlying network (bandwidth, latency)
  • Reduced communication to a bare minimum
  • Overlapping of communication with other CPUs
  • Overlapping of communication and computation (see the sketch after this slide)
• Made the load balancing of Cactus more flexible (Matei Ripeanu):
  • Cactus now allows the total problem to be decomposed into pieces of different size, according to CPU power, number of CPUs used on one machine, etc.
• Cactus compiles (out of the box) with Globus and MPICH on most common architectures (T3E, Irix, SP-2, …)
Albert-Einstein-Institut www.aei-potsdam.mpg.de
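The overlap of communication and computation mentioned above can be sketched with non-blocking MPI: start the ghost-zone exchange, update every interior point that does not need ghost data while the messages are in flight, then wait and finish the boundary points. This is a hedged illustration of the general technique, not the actual Cactus code; N and the simple averaging update are placeholders.

```c
/* Sketch of overlapping communication with computation using
 * non-blocking MPI (illustrative; not the Cactus communication layer). */
#include <mpi.h>

#define N 256                   /* interior points per rank (example) */

int main(int argc, char **argv)
{
    double u[N + 2], unew[N + 2];
    int rank, size, left, right, i;
    MPI_Request req[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (i = 0; i <= N + 1; i++) u[i] = rank;

    /* 1. Start the ghost-zone exchange, but do not wait for it yet. */
    MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&u[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&u[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

    /* 2. Update all points that do not depend on ghost zones while the
     *    messages are in flight (the bulk of the work). */
    for (i = 2; i <= N - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* 3. Wait for the exchange, then update the two boundary points. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    unew[1] = 0.5 * (u[0] + u[2]);
    unew[N] = 0.5 * (u[N - 1] + u[N + 1]);

    MPI_Finalize();
    return 0;
}
```

In a production code the interior update is a full finite-difference stencil over a 3D grid, so the useful work available for hiding wide-area latency is far larger than in this toy loop.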

  11. Optimizing MPICH-G: Used MPICH-G2
• MPICH-G2 is a completely rewritten communication layer
• Can distinguish between inter- and intra-machine communication
• It uses the vendor-supplied MPI for intra-machine communication
• Uses TCP/IP between machines
• This means optimal performance in a metacomputing environment
• Works with Cactus and Globus on all major Unix systems
[Diagram: a single MPI_COMM_WORLD spanning several machines connected via TCP/IP]
Albert-Einstein-Institut www.aei-potsdam.mpg.de
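MPICH-G2 makes the inter/intra-machine distinction internally, so application code normally does not need to. Purely for illustration, the generic MPI sketch below shows one way an application could build per-machine communicators itself by splitting MPI_COMM_WORLD on the processor name; the hostname_color helper is invented for this example and is not part of MPICH-G2 or Cactus.

```c
/* Sketch: split MPI_COMM_WORLD into per-machine communicators so an
 * application can treat fast intra-machine traffic differently from
 * wide-area TCP/IP traffic. Generic MPI code, not MPICH-G2 internals. */
#include <mpi.h>
#include <stdio.h>

/* Hash the processor name so ranks on the same machine get the same
 * non-negative "color" for MPI_Comm_split (simplified: hash collisions
 * between different hosts are ignored here). */
static int hostname_color(void)
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int len, i;
    unsigned int h = 0;

    MPI_Get_processor_name(name, &len);
    for (i = 0; i < len; i++)
        h = h * 31u + (unsigned char) name[i];
    return (int) (h & 0x7fffffff);
}

int main(int argc, char **argv)
{
    int world_rank, local_rank;
    MPI_Comm local;             /* ranks sharing one machine */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    MPI_Comm_split(MPI_COMM_WORLD, hostname_color(), world_rank, &local);
    MPI_Comm_rank(local, &local_rank);

    printf("world rank %d is local rank %d on its machine\n",
           world_rank, local_rank);

    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}
```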

  12. Current experiments and future plans
• Current experiment
  • Complete testing and production of tightly coupled simulations between different sites in the USA (NCSA, NERSC, ANL, SDSC and others)
  • Want to use advanced software (Portal, co-scheduling systems, etc.)
  • Want to run across as many sites and nodes as possible
• More general Grid Computing problems
  • Distribution of multiple grids
  • Dynamic resource acquisition
  • Acquiring more memory when needed (AMR)
  • Spawning off connected jobs on remote machines
  • Cactus thorn would have access to MDS
  • …
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  13. A Portal to Computational Science: The Cactus Collaboratory
Cactus Computational Toolkit: Science, Autopilot, AMR, PETSc, HDF, MPI, GrACE, Globus, Remote Steering...
1. User has science idea...
2. Composes/builds code components with interface...
3. Selects appropriate resources...
4. Steers simulation, monitors performance...
5. Collaborators log in to monitor...
Want to integrate and migrate this technology to the generic user...
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  14. German Gigabit Project supported by DFN-Verein
• Developing techniques to exploit high-speed networks
• Focus on remote steering and visualization
• OC-12 testbed between AEI, ZIB, RZG with built-in application groups ready to use it!
• Already closely connected to ANL, NCSA, KDI projects
Albert-Einstein-Institut www.aei-potsdam.mpg.de

  15. Metacomputing Experiments, Production
• SC93: remote CM-5 simulation with live viz in CAVE
• SC95: heroic I-Way experiments led to development of Globus. Cornell SP-2, Power Challenge, with live viz in San Diego CAVE
• SC97: Garching 512-node T3E, launched, controlled, visualized in San Jose
• SC98: HPC Challenge. SDSC, ZIB, and Garching T3Es compute collision of 2 neutron stars, controlled from Orlando
• SC99: colliding black holes using Garching and ZIB T3Es, with remote collaborative interaction and viz at ANL and NCSA booths
• April 2000: attempting to use LANL, NCSA, NERSC, SDSC, ZIB, Garching, NASA-Ames, Maui?, +…? for a single simulation!
• All this technology is available in the main production code for different applications!
Albert-Einstein-Institut www.aei-potsdam.mpg.de
