Supercomputing: the past, present and future Ben Moore Institute for Theoretical Physics, University of Zurich + Oscar Agertz, Juerg Diemand, Tobias Kaufmann, Michela Mapelli, Joachim Stadel, Romain Teyssier & Marcel Zemp
Scaling of cpus, parallel computers and algorithms: • Now here, not much increase in cpu speed or transistors, but higher number of operations per clock cycle
1957: IBM Vacuum tube, 5000 flops 2007: Intel Core 2 Duo, 10 billion flops Mare Nostrum: ~100 teraflops Seti@home is running at ~300 teraflops Top500 #1, (2007) IBM BlueGene, 212,992 processors, 478 teraflops Folding@home, Seti@home more than a petaflop (10^15 flops) The first N-body calculations with N^2 algorithms in 1960 used 30 particles Today we can now follow 10^10 particles and the resolution has increased ~ mean interparticle separation = 1000 times higher. Moore’s law + N^2 would allow us to do ~10^6 particles, we are orders of magnitude above this thanks to algorithmic development. Grid & treecodes are order (0)NLog(N) Dehnen’s hierarchical mesh code is order (0)N Stadel’s multipole expansion code PKDGRAV is also order (0)N and parallel.
The N-body technique Solving these equations using finite difference techniques is impractical. It’s a 6-D equation… Analytic solutions are known mainly for a few equilibrium systems. Solve equations of motion of N-Bodies…
The equivalence of light and gravity was a beautiful idea Here are Holmberg’s initial conditions. (Hmmmm, initial conditions, that’s a whole separate talk.)
This experiment was recently repeated by John Dubinski – note how Holmberg cheated with the second ring of six particles!
Example 1: The secular and dynamical evolution of disks A physically simple, but computationally demanding problem. N-body simulations have long been used to study galactic dynamics. But only in the last few years has the resolution been sufficient to suppress numerical heating of disks. Still, we are forced to construct “idealised models” using quasi-equilibrium spherical structures. The disk heating problem has still not been properly solved…self consistently
A typical disk galaxy simulation with < 10^6 star particles. Look at those smooth disks with 10^8 stars! Spiral patterns need perturbations! (John Dubinski)
So, how do those CDM substructures perturb the disk? These simulations are from John Dubinski who attempts to follow all the most massive perturbers. See also Kazantzidis et al (2007). CDM substructure provides a nice means to generate spiral patterns and thick disks
The Origin of Giant Low Surface Brightness Galaxies Mapelli et al (2007) Malin1 Malin2 UGC6614
Moore & Parker 2007 Malin1 has hardly had time to rotate once! How can stars form ~100kpc from the center?
T=100Myr=Cartwheel 200kpc T=1Gyr=Malin1
Example 2: The non-linear formation of dark matter structures. • A physically simple, but computationally highly demanding problem. • First cold collapse simulations in the 1980’s – gave “Einasto” profiles. • CDM hierarchical collapses in the 1990’s – gave “Einasto” profiles. • There are lots of empirical “universal” correlations but we don’t understand their physical origin: density and phase space with radius relations, beta-gamma relation… • Observations of galaxies and dwarf spheroidals tell us about the mass distribution on ~100 parsec scales. Simulations have yet to reach that. Until today… • CDM hierarchical collapses in 2007 still smooth in the inner Galactic region. Do we have dark satellites at 8 kpc? • Indirect detection experiments need the phase space distribution of particles flowing throw this room. In fact, this is impossible to determine computationally. But we can make some bold extrapolations…
30 years ago…. N=700 (White 1976) This overmerging problem was part of the motivation for White & Rees (1978) and the beginning of semi-analytic galaxy formation models.
Historically the first n-body simulations were carried out to study the collapse of a cloud of stars. (van Albada 1968) With 100 stars these authors found structures similar to elliptical galaxies. However very cold collapses lead to a radial orbit instability and the system forms a prolate bar. Numerical simulations show that k.e./p.e.>0.2 in order to suppress this instability.
The origin of universal density profiles, universal phase space density profiles, universal velocity distribution functions, universal correlations between anisotropy and local density slope – are unknown. T=10 crossing times T=0
Increasing resolution doesnt just give you more of the same – you find a new Physical understanding Cluster Resolved (1980-1997) 67,500 Galaxy Halos Resoved (1998) 1,300,000 • It took Joachim 3 years to write the first version of the mpi treecode PKDGRAV • It took six months to do this simulation on the 128 processor KSR • It finished in 1996 and we stared at it for a year before developing the tools to analyse it (Ghigna etal 1998).
The first structures in the universe Diemand etal (2005) earth mass, parsec scale Each particle has the mass of the moon…higher resolution than groups simulating the origin of the solar system.
Diemand etal (2005) This is the one of the first objects to form in the Universe at z=60. The halo is smooth, has a cuspy density profile, has an earth mass 10^-6Mo and a size of the solar system. Can be detected as high proper motion gamma-ray sources (Moore et al 2005).
Stable for a long long time, but no structure is stable forever… see talk of Krauss Agreement with galaxy properties/abundance/radial distribution in clusters is very strong evidence for a hierarchical universe
Confrontation of rotation curves with predictions from numerical simulations Swaters, Madore, van den Bosch & Balcells (2002) DeBlok, McGaugh, Rubin & Bosma (2002) Bars heat core? Not all galaxies are barred i.e. M33. Bars are long lived. Triaxility? CDM haloes are round, should observe 50% galaxies with steeper cusps.
Goerdt et al 2006 • Dynamical friction fails in constant density systems • Reason is that dm/stars have the same orbital frequency – all particles that can scatter off the sinking mass do so immediately • Consequence is rapid sinking and then stalling • Even if halo is slightly cuspy, will result in a core and stalling
Example 3: The internal kinematics of the ISM A physically complex problem that has remained unsolved for decades. What determines the kinematics of the ISM and regulates star-formation? Supernovae? Large scale gravity modes? Molecular Cloud – Molecular Cloud encounters?
Dibb & Burkert: “Gravity alone can’t be responsible for the floor in observed ISM dispersions” “Need energy input from supernovae” Previous unresolved ISM simulations
AMR Oscar Agertz et al (2007) SPH
Agertz Mayer Lake Moore Teyssier ISM kinematics using the RAMSES AMR code +/- star formation +/- SN feedback
Agertz et al ( 2008) “The observed dispersion velocity is generated through molecular cloud formation and subsequent gravitational encounters”
Scaling of cpus, parallel computers and algorithms: Given the exponential scaling of supercompters, it is difficult for special purpose hardware to keep up at a reasonable cost. Also want a general purpose code that can scale to 10000’s of processors.
Scaling of cpus, parallel computers and algorithms: ksr zBox1 It’s a tough commercial business! Clusters started to appear ~2000, but you couldn’t buy a system that wouldn’t melt down. So we built our own….
zBox2 (2006) – 5 Tflops 500 quad Opteron 852’s, 580Gb memory, 65Tb disk, 3d-SCI low latency network. zBox1: (Stadel & Moore 2002) 288 AMD MP2200+ processors, 144 Gigs ram Compact, easy to cool and maintain Very fast Dolphin/SCI interconnects - 4 Gbit/s, microsecond latency A teraflop supercomputer for $500,000 Roughly one cubic meter, one ton and requires 40 kilowatts of power 2002 0.5 Tflops 2007 6 Tflops, 200Tb disk
With fixed timesteps most codes all scale very well. However, this is no-longer the only measure since the scaling of a very "deep" multistepping run is always a lot worse. How do we do multistepping now and why does it still have problems?
Choice of Timestep • Want a criterion that commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities. Local Non-local, based on max acceleration in moderate densities and can take the minimum of any or all of these criteria
Zemp et al (2007) Here we show the timesteps versus radius, for a sperical NFW halo S = 0.2 D = 0.03 Dynamical Time Wasted CPU time Standard Innacurate timesteps