
Cactus and Grid Computing


Presentation Transcript


  1. Cactus and Grid Computing
  • Cactus, a new community simulation code framework
  • Toolkit for any PDE system, ray tracing, etc.
  • Suite of solvers for Einstein and astrophysics systems (CarlK: but Cactus is not an astro app)
  • Grid Computing and remote collaborative tools: what a scientist really wants and needs (but may not yet realize...)
  The Cactus Team. Here today: Ed Seidel, Gabrielle Allen, Albert Einstein Institute, cactus@cactuscode.org
  Albert-Einstein-Institut, www.aei-potsdam.mpg.de

  2. Computational Needs for 3D Numerical Relativity: can't fulfill them now, but about to change...
  • Explicit finite-difference codes
  • ~10^4 Flops/zone/time step
  • ~100 3D arrays
  • Require 1000^3 zones or more: ~1000 GBytes (a back-of-the-envelope check is sketched below)
  • Double resolution: 8x memory, 16x Flops
  • Initial data: 4 coupled nonlinear elliptic equations
  • Evolution: hyperbolic evolution coupled with elliptic equations
  • Parallel AMR and I/O essential; a multi-TFlop, TByte machine essential
  • A code that can do this could be useful to other projects (we said this in all our grant proposals)!
  • Last few years devoted to making this useful across disciplines...
  • All tools used for these complex simulations available for other branches of science and engineering...
  • The scientist/engineer wants to know only that!
  • But what algorithm? What architecture? What parallelism? Etc.
  [Figure: simulation snapshots at t=0 and t=100]
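
  The resource estimate in the bullets above is simple arithmetic. A minimal back-of-the-envelope sketch in C, assuming double precision and taking the zone count, array count, and flops-per-zone figures from the slide:

      #include <stdio.h>

      int main(void)
      {
          const double zones      = 1000.0 * 1000.0 * 1000.0;  /* 1000^3 grid zones       */
          const double arrays     = 100.0;                      /* ~100 3D grid functions  */
          const double bytes_per  = 8.0;                        /* double precision        */
          const double flops_zone = 1.0e4;                      /* ~10^4 flops/zone/step   */

          double mem_gbytes = zones * arrays * bytes_per / 1.0e9;  /* total memory          */
          double flops_step = zones * flops_zone;                  /* flops per time step   */

          printf("memory   : ~%.0f GBytes\n", mem_gbytes);  /* ~800 GBytes, i.e. the ~1000 GBytes above */
          printf("per step : ~%.1e Flops\n", flops_step);   /* ~10^13 Flops per time step               */

          /* Doubling the resolution: 2^3 = 8x the zones (and memory), and the
             explicit time step halves as well, hence 16x the Flops.           */
          return 0;
      }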

  3. Cactus: a new concept in community-developed simulation code infrastructure
  • Developed in response to the needs of big community projects
  • NSF Black Hole Grand Challenge, NASA Neutron Star Grand Challenge, etc. ... maybe a Geophysics Grand Challenge...
  • New: EU Network about to be Grid enabled!
  • Numerical/computational infrastructure to solve PDEs
  • Freely available, open source community framework: the spirit of GNU/Linux
  • Many communities contributing to Cactus
  • Cactus divided into "Flesh" (core) and "Thorns" (modules, or collections of subroutines)
  • The Flesh, written in C, glues together the various components
  • Multilingual: user applications can be Fortran, C, C++; automated interfaces between them
  • Abstraction: the Cactus Flesh provides an API for virtually all CS-type operations
  • Driver functions (storage, communication between processors, etc.)
  • Interpolation, reduction, etc.
  • I/O (traditional, socket based, remote viz and steering...)
  • Checkpointing, coordinates
  • Etc., etc.
  • Cactus is a Grid-enabling application middleware (a sketch of what a thorn routine looks like follows below)
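
  To make the "Flesh plus thorns" idea concrete, here is a minimal sketch of a C thorn's compute routine, following the usual Cactus conventions (CCTK_ARGUMENTS, DECLARE_CCTK_ARGUMENTS, cctk_lsh, CCTK_GFINDEX3D). The thorn name WaveDemo and the grid functions phi/phi_old are hypothetical and would be declared in the thorn's own interface and schedule files; treat this as an illustration of the shape of thorn code, not a verified working thorn.

      #include "cctk.h"
      #include "cctk_Arguments.h"
      #include "cctk_Parameters.h"

      /* Called by the Flesh at each evolution step.  The driver thorn has
         already decomposed the grid across processors and filled the ghost
         zones, so this routine only loops over the local patch (cctk_lsh is
         the local grid size). */
      void WaveDemo_Evolve(CCTK_ARGUMENTS)
      {
          DECLARE_CCTK_ARGUMENTS;
          DECLARE_CCTK_PARAMETERS;

          for (int k = 1; k < cctk_lsh[2] - 1; k++)
              for (int j = 1; j < cctk_lsh[1] - 1; j++)
                  for (int i = 1; i < cctk_lsh[0] - 1; i++)
                  {
                      const int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);

                      /* Simple 6-point average of the neighbouring values: a
                         stand-in for the real finite-difference update. */
                      phi[idx] = (phi_old[CCTK_GFINDEX3D(cctkGH, i+1, j, k)] +
                                  phi_old[CCTK_GFINDEX3D(cctkGH, i-1, j, k)] +
                                  phi_old[CCTK_GFINDEX3D(cctkGH, i, j+1, k)] +
                                  phi_old[CCTK_GFINDEX3D(cctkGH, i, j-1, k)] +
                                  phi_old[CCTK_GFINDEX3D(cctkGH, i, j, k+1)] +
                                  phi_old[CCTK_GFINDEX3D(cctkGH, i, j, k-1)]) / 6.0;
                  }
      }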

  4. How to use Cactus Features
  • The application scientist usually concentrates on the application...
  • Performance
  • Algorithms
  • Logically: operations on a grid (structured, or unstructured (coming...))
  • ...then takes advantage of the parallel API features enabled by Cactus
  • I/O, data streaming, remote visualization/steering, AMR, MPI, checkpointing, Grid Computing, etc.
  • Abstraction allows one to switch between different MPI and PVM layers, different I/O layers, etc., with no or minimal changes to the application!
  • (Nearly) all architectures supported and autoconfigured
  • Common to develop on a laptop (no MPI required); run on anything
  • Compaq, SGI Origin 2000, T3E, Linux clusters and laptops, Hitachi, NEC, HP, Windows NT, SP2, Sun
  • Metacode concept (a configuration-file sketch follows below)
  • Very, very lightweight, not a huge framework (not Microsoft Office)
  • User specifies the desired code modules in configuration files
  • Desired code is generated, with automatic routine calling sequences, syntax checking, etc.
  • You can actually read the code it creates...
  • http://www.cactuscode.org
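
  Concretely, "user specifies desired code modules in configuration files" looks roughly like the run-time parameter file below, in the usual Cactus ActiveThorns plus thorn::parameter style (a thorn list plays the same role at build time). The particular thorn and parameter names are illustrative rather than a verified working example:

      # Cactus parameter file (sketch): activate the desired thorns and set
      # their parameters; nothing in the application source changes.
      ActiveThorns = "CartGrid3D PUGH WaveToyC IOUtil IOBasic"

      driver::global_nsize   = 100          # 100^3 grid points on this run
      cactus::cctk_itlast    = 200          # number of evolution steps
      iobasic::outInfo_every = 10           # screen output every 10 iterations
      IO::out_dir            = "wave_run"   # where output files are written

  Swapping the driver, I/O, or AMR layer amounts to activating a different set of thorns in this file; the application thorn itself is untouched.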

  5. Modularity of Cactus...
  [Architecture diagram: applications and legacy codes (Application 1, Application 2, sub-apps, Legacy App 2) sit on top of the Cactus Flesh; the user selects the desired functionality and the code is created. Beneath the Flesh, interchangeable abstraction layers plug in: unstructured meshes, AMR (GrACE, etc.), MPI layers, I/O layers, remote steering, MDS/remote spawning, all resting on Globus metacomputing services.]

  6. Computational Toolkit: provides parallel utilities (thorns) for the computational scientist
  • Cactus is a framework or middleware for unifying and incorporating code from thorns developed by the community
  • Choice of parallel library layers (native MPI, MPICH, MPICH-G(2), LAM, WMPI, PACX, HPVM)
  • Various AMR schemes: nested boxes, GrACE/DAGH; coming: HLL, Chombo, SAMRAI, ...
  • Parallel I/O (Panda, FlexIO, HDF5, etc.)
  • Parameter parsing
  • Elliptic solvers (PETSc, multigrid, SOR, etc.)
  • Visualization tools, remote steering tools, etc.
  • Globus (metacomputing/resource management)
  • Performance analysis tools (Autopilot, PAPI, etc.)
  • Remote visualization and steering
  • INSERT YOUR CS MODULE HERE... (a sketch of what plugging in a new module looks like follows below)
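
  The "insert your CS module here" point is literal: a new capability arrives as a thorn that registers callbacks with the Flesh at startup. The sketch below shows the shape of such a registration routine for a new I/O method; the calls follow the pattern of the Cactus I/O registration interface, but the exact function names and signatures here should be treated as illustrative assumptions rather than a verified example.

      #include "cctk.h"

      /* Callback invoked by the Flesh whenever output for this method is due.
         A real thorn would walk the requested variables and write them out
         (to disk, a socket, a viz client, ...). */
      static int MyIO_OutputGH(const cGH *cctkGH)
      {
          CCTK_INFO("MyIO: output requested for this iteration");
          return 0;
      }

      /* Startup routine, listed in the thorn's schedule file so that the
         Flesh calls it once at registration time. */
      int MyIO_Startup(void)
      {
          int handle = CCTK_RegisterIOMethod("MyIO");     /* register a new I/O method */
          CCTK_RegisterIOMethodOutputGH(handle, MyIO_OutputGH);
          return 0;
      }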

  7. Cactus Community Development Projects
  [Diagram of collaborating projects and groups: Numerical Relativity Community, AEI Cactus Group (Allen), EU Network (Seidel), NSF KDI (Suen), NASA Neutron Star Grand Challenge, Astrophysics (Zeus), Geophysics (Bosl), Crack propagation (San Diego, GMD, Cornell), ChemEng (Bishop), SDSS (Szalay), DFN Gigabit (Seidel), "Egrid", US Grid Forum, "GRADS" (Kennedy et al.), DLR, NCSA, ANL, SDSC, Clemson, Cornell, Berkeley, Livermore, Microsoft, Intel, and other computational science applications]

  8. Some fun simulations...
  • 3D gravitational waves forming black holes: 384^3, 100 GB simulation, the largest production relativity simulation; run on a 256-processor Origin 2000 at NCSA, ~500 GB output data
  • 3D colliding black holes
  • Grid future: stream data, monitor, steer, distribute, farm out tasks, etc.

  9. Future view of computational science: much of it here already...
  • Scale of computations much larger
  • Complexity approaching that of Nature
  • Simulations of the Universe and its constituents
  • Black holes, neutron stars, supernovae
  • Airflow around advanced planes and spacecraft
  • Human genome, human behavior
  • Teams of computational scientists working together
  • Must support efficient, high-level problem description
  • Must support collaborative computational science
  • Must support all different languages
  • Ubiquitous Grid Computing
  • Very dynamic simulations, deciding their own future
  • Apps find the resources themselves: distributed, spawned, etc.
  • Must be tolerant of dynamic infrastructure (variable networks, processor availability, etc.)
  • Monitored, visualized, controlled from anywhere, with colleagues anywhere else...

  10. Our Team Requires Grid Technologies, Big Machines for Big Runs
  [Map of collaborating sites: AEI, ZIB, NCSA, WashU, Paris, Thessaloniki, Hong Kong]
  • How do we:
  • Maintain and develop the code?
  • Manage computer resources?
  • Carry out and monitor the simulation?

  11. What we need and want in simulation science: a higher-level Portal to provide the following...
  • Got an idea? Configuration manager: write a Cactus module, link it to other modules, and...
  • Find resources
  • Where? NCSA, SDSC, Garching...
  • How many computers? Distribute the simulation?
  • Big jobs: a "Fermilab" at our disposal: must get it right while the beam is on!
  • Launch the simulation
  • How do we get the executable there?
  • How do we store the data?
  • What are the local queue structures and OS idiosyncrasies?
  • Monitor the simulation
  • Remote visualization, live while running
  • Limited bandwidth: compute the viz inline with the simulation
  • High bandwidth: ship data to be visualized locally
  • Visualization server: all privileged users can log in and check status, adjust if necessary
  • Are the parameters screwed up? Very complex! Call in an expert colleague... let her watch it too
  • Performance: how efficient is my simulation? Should something be adjusted?
  • Steer the simulation
  • Is memory running low? AMR! What to do? Refine selectively, or acquire additional resources via Globus? Delete unnecessary grids? Performance steering...
  • Postprocessing and analysis
  • 1 TByte of output at NCSA, research groups in St. Louis and Berlin... how do we deal with this?
  • Cactus Portal and VMR under development by Michael Russell, Jason Novotny, et al.

  12. Grid-Enabled Cactus (static version)
  • Cactus and its ancestor codes have been using Grid infrastructure since 1993 (part of the famous I-Way of SC'95)
  • Support for Grid computing was part of the design requirements
  • Cactus compiles "out of the box" with Globus (using the globus device of MPICH-G(2))
  • The design of Cactus means that applications are unaware of the underlying machine(s) the simulation is running on... applications become trivially Grid-enabled (see the MPI sketch below)
  • Infrastructure thorns (I/O, driver layers) can be enhanced to make the most effective use of the underlying Grid architecture
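
  The point that the application is unaware of where it runs can be made concrete with plain MPI: the code below runs unchanged whether all processes live on one laptop, one Origin 2000, or are split across sites by MPICH-G(2)/Globus; only the way the job is launched changes. This is a generic illustration, not Cactus source (inside Cactus the driver thorn issues the MPI calls on the application's behalf).

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char **argv)
      {
          int rank, size;
          double local, global;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* Each process contributes a piece of some global quantity,
             e.g. the norm of a grid function on its local patch. */
          local = 1.0 / (rank + 1);

          /* The reduction works the same whether the ranks share a node or
             are spread over machines on different continents; only latency
             and bandwidth differ, not the application source code. */
          MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

          if (rank == 0)
              printf("sum over %d processes = %g\n", size, global);

          MPI_Finalize();
          return 0;
      }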

  13. Grid Computing Scenarios: old stuff, but still not used by the community: let's fix this!
  • Simple: sit here, compute there...
  • But very complex: users in my community still don't like to do it!
  • Actually still very hard!
  • A portal is essential, and a good implementation is still very hard to achieve
  • Manage the application configuration
  • Choose resources (which one? how?)
  • Manage batch jobs, files, and results afterwards
  • Compute there, monitor and steer...
  • Visualization
  • Performance
  • Science/engineering output improved...
  • Choose multiple sites in advance
  • Need more than any site could provide (simulate the universe or human behavior...)
  • Need more than any site can provide NOW
  • Must wait a week to get 512 processors at NCSA, but could get 256 at NCSA and 256 at ANL now, even if it runs at 50% efficiency!

  14. Remote Visualization and Steering
  [Diagram: the running simulation sends remote viz data over HTTP and streaming HDF5 to any viz client, e.g. OpenDX or Amira]
  • Isosurfaces and geodesics computed inline with the simulation; only the geometry is sent across the network
  • Arbitrary grid functions streamed via HDF5
  • Steering: changing any steerable parameter
  • Physics and algorithm parameters
  • Performance parameters
  (A sketch of how a simulation can poll for steered parameter values is given below.)
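
  A minimal sketch of the steering idea: at the end of each time step the simulation checks a control channel and applies any new value of a steerable parameter. Here the channel is modeled as a tiny text file purely for illustration; in Cactus the same role is played by the HTTP control thorn listening on a socket. The parameter name out3d_every and the file name are hypothetical.

      #include <stdio.h>
      #include <string.h>

      /* Steerable parameter: how often (in iterations) to write 3D output. */
      static int out3d_every = 10;

      /* Poll the control channel for steering requests such as
         "out3d_every 5", apply them, and consume the request. */
      static void poll_steering(const char *control_file)
      {
          char name[64];
          int value;
          FILE *fp = fopen(control_file, "r");
          if (!fp)
              return;                          /* nothing steered this step */
          while (fscanf(fp, "%63s %d", name, &value) == 2)
          {
              if (strcmp(name, "out3d_every") == 0 && value > 0)
              {
                  out3d_every = value;         /* apply the steered value   */
                  printf("steered: out3d_every -> %d\n", value);
              }
          }
          fclose(fp);
          remove(control_file);                /* consume the request       */
      }

      int main(void)
      {
          for (int it = 0; it < 100; it++)
          {
              /* ... evolve one time step ... */
              poll_steering("steer.txt");      /* check for steering input  */
              if (it % out3d_every == 0)
                  printf("iteration %d: writing 3D output\n", it);
          }
          return 0;
      }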

  15. Remote Offline Visualization
  [Diagram: a visualization client (Amira, viz in Berlin) reads through an HDF5 VFD that speaks DataGrid (Globus), HTTP, DPSS, and FTP to remote data servers (web server, FTP server, DPSS server) holding 4 TB distributed across NCSA/ANL/Garching]
  • Downsampling and hyperslabs: only what is needed crosses the network (a hyperslab-read sketch follows below)
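
  A minimal sketch of the "only what is needed" idea using the standard HDF5 hyperslab API (HDF5 1.8+ call signatures): the client reads every 4th point of a 3D dataset instead of pulling the full-resolution array. The file and dataset names are illustrative, and the remote virtual file driver from the slide is assumed to be selected via the file-access property list instead of H5P_DEFAULT.

      #include <stdio.h>
      #include "hdf5.h"

      int main(void)
      {
          hsize_t start[3]  = {0, 0, 0};
          hsize_t stride[3] = {4, 4, 4};        /* downsampling factor             */
          hsize_t count[3]  = {64, 64, 64};     /* 64^3 coarse points; assumes the
                                                   dataset has >= 253 points per
                                                   dimension, e.g. a 384^3 run     */
          static double buf[64][64][64];

          hid_t file   = H5Fopen("gw_output.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
          hid_t dset   = H5Dopen(file, "phi", H5P_DEFAULT);
          hid_t fspace = H5Dget_space(dset);

          /* Select the strided hyperslab in the file... */
          H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, NULL);

          /* ...and a matching contiguous selection in memory. */
          hid_t mspace = H5Screate_simple(3, count, NULL);

          /* Only the selected coarse points travel across the network. */
          H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
          printf("first coarse value: %g\n", buf[0][0][0]);

          H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset); H5Fclose(file);
          return 0;
      }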

  16. Dynamic Grid Computing Scenarios: new stuff. Must make apps able to respond to a dynamic, changing Grid environment...
  • Managing intelligent parameter surveys (Condor does this)
  • Distributing multiple grids across different machines (climate)
  • Outsourcing: spawning off independent jobs to new machines, e.g. analysis tasks (see the spawning sketch after this list)
  • "Grid Vector": master code "outsources" slave simulations at every timestep
  • "Grid Pipeline": slave processes "outsource" tasks, which outsource...
  • Elliptic solve taking too long: stream the matrix to Dongarra's NetSolve for help...
  • Dynamic staging: seeking out and moving to faster/larger/cheaper machines as they become available
  • Scripting capabilities (management, launching new jobs, checking out new code, etc.)
  • Dynamic load balancing (e.g. inhomogeneous loads, multiple grids), based on performance...
  • Etc. ... many new computing paradigms: preparing papers...
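
  As a minimal illustration of the "outsourcing" idea, the sketch below spawns an independent analysis task on freshly written output and then carries on with the next time step without waiting for it. In a Grid setting the fork/exec pair would become a remote job submission (e.g. through Globus), but the control flow inside the simulation is the same. The analysis program name and file names are hypothetical.

      #include <stdio.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/wait.h>

      /* Spawn an independent analysis job on an output file and return
         immediately; the main simulation does not wait for it. */
      static void spawn_analysis(const char *datafile)
      {
          pid_t pid = fork();
          if (pid == 0)
          {
              /* Child: run the (hypothetical) analysis tool on the file. */
              execlp("analyse_horizon", "analyse_horizon", datafile, (char *)NULL);
              _exit(1);                        /* only reached if exec failed */
          }
          /* Parent: keep computing; reap any finished children without blocking. */
          while (waitpid(-1, NULL, WNOHANG) > 0)
              ;
      }

      int main(void)
      {
          for (int it = 0; it <= 100; it++)
          {
              /* ... evolve one time step ... */
              if (it > 0 && it % 20 == 0)
              {
                  printf("iteration %d: output written, spawning analysis\n", it);
                  spawn_analysis("output_latest.h5");   /* hypothetical file name */
              }
          }
          return 0;
      }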

  17. Application Code as Information Server/Gatherer
  • Code should be aware of its environment
  • What resources are out there? What is their current state?
  • What is my allocation?
  • What is the bandwidth and latency between sites?
  • How can I adjust myself to take advantage of the current state?
  • Code should be able to make decisions on its own (a sketch of such decision logic follows below)
  • A slow part of my simulation can run asynchronously... spawn it off!
  • New, more powerful resources just became available... migrate there!
  • An unexpected event occurred: check out, compile, and run a new Cactus and stream data on a newly discovered resource...
  • Python and Perl scripting thorns driven by events... send email, ask for help, etc.
  • Etc.
  • Code should be able to publish this information to a central server for tracking, monitoring, steering...
  • We will have entire hierarchies of related simulations... need to track everything...
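
  A sketch of the kind of per-iteration decision logic such a self-aware code might run. The helpers query_free_processors() and request_migration() are placeholders for whatever information service and migration mechanism is actually available (e.g. MDS queries plus checkpoint/restart); only the control flow is the point.

      #include <stdio.h>
      #include <time.h>

      /* Placeholder: ask an information service how many processors a
         faster machine could give us right now.  Hypothetical. */
      static int query_free_processors(void) { return 256; }

      /* Placeholder: trigger checkpoint, remote startup, local shutdown.
         Hypothetical. */
      static void request_migration(int procs)
      {
          printf("requesting migration to %d processors\n", procs);
      }

      int main(void)
      {
          double expected_seconds_per_step = 5.0;   /* from earlier benchmarking */

          for (int it = 0; it < 100; it++)
          {
              clock_t t0 = clock();
              /* ... evolve one time step ... */
              /* CPU time via clock() keeps this standard C; a real code
                 would use a wall-clock timer. */
              double elapsed = (double)(clock() - t0) / CLOCKS_PER_SEC;

              /* Running much slower than expected (contention, bad node,
                 shrinking allocation)?  Look for a better home. */
              if (elapsed > 2.0 * expected_seconds_per_step)
              {
                  int procs = query_free_processors();
                  if (procs > 0)
                      request_migration(procs);
              }
          }
          return 0;
      }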

  18. Cactus Worm: illustration of the basic scenario
  • Cactus simulation starts, launched from a portal
  • Queries the MDS, finds available resources
  • Migrates itself to the next site (a control-flow sketch of one such hop is given below)
  • Uses some logic to choose the next resource
  • Starts up the remote simulation (passes the proxy...)
  • Transfers memory contents to the remote simulation (using streaming HDF5, scp, GASS, whatever...)
  • Registers the new location with the Cactus GRIS, terminates the previous simulation
  • User tracks and monitors with continuous remote viz and control using the HTTP thorn, streaming data, etc.
  • Continues around Europe, and so on...
  • Fun Grid game: find and trap the Cactus Worm!
  • If we can do this, much of what we want can be done!
  • Want to build a GADK, a Grid Application Development Toolkit: bring users onto the Grid.
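
  A control-flow sketch of a single Worm hop. Every function below is a placeholder for the real service the scenario relies on (MDS queries, Globus job startup, streamed HDF5 checkpoints, the Cactus GRIS registry); the sequence of steps, not the implementations, is what the slide describes.

      #include <stdio.h>

      /* Hypothetical stand-ins for the real Grid services. */
      static const char *query_mds_for_next_site(void)      { return "origin.example.edu"; }
      static int  start_remote_simulation(const char *site) { printf("starting on %s\n", site); return 0; }
      static int  stream_checkpoint_to(const char *site)    { printf("streaming state to %s\n", site); return 0; }
      static void register_with_gris(const char *site)      { printf("registered new location %s\n", site); }

      int main(void)
      {
          /* One hop of the Worm: find a site, clone ourselves there,
             hand over the state, advertise the new location, and die. */
          const char *next = query_mds_for_next_site();

          if (start_remote_simulation(next) != 0 ||
              stream_checkpoint_to(next)    != 0)
          {
              fprintf(stderr, "migration failed, continuing here\n");
              return 1;
          }

          register_with_gris(next);
          printf("terminating local simulation\n");
          return 0;
      }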

  19. Grand Picture
  [Diagram: Grid-enabled Cactus runs on distributed machines (T3E at Garching, Origin at NCSA), launched from the Cactus Portal via Globus; remote viz and steering from Berlin, remote viz in St. Louis, remote steering and monitoring from an airport, a Mathematica login from AEI, and viz of data from previous simulations in an SF café, connected via HTTP, streaming HDF5, isosurfaces, and DataGrid/DPSS with downsampling]

  20. Further details...
  • Cactus
  • http://www.cactuscode.org
  • http://www.computer.org/computer/articles/einstein_1299_1.htm
  • Movies, research overview (needs major updating)
  • http://jean-luc.ncsa.uiuc.edu
  • Simulation collaboratory/portal work
  • http://wugrav.wustl.edu/ASC/mainFrame.html
  • Remote steering, high-speed networking
  • http://www.zib.de/Visual/projects/TIKSL/
  • http://jean-luc.ncsa.uiuc.edu/Projects/Gigabit/
  • EU Astrophysics Network
  • http://www.aei-potsdam.mpg.de/research/astro/eu_network/index.html
