Timestepping and Parallel Computing in Highly Dynamic N-body Systems. Joachim Stadel email@example.com University of Zürich Institute for Theoretical Physics. Astrophysical N-body Simulations. Physics. Hydro. Gravity. Collisions. Near Integrable. 12. 11. 10. 9. 8. 7. 6. 5.
Timestepping and Parallel Computing in Highly Dynamic N-body Systems
University of Zürich
Institute for Theoretical Physics
Solar System Formation
The initial conditions for structure formation.
The Universe is completely smooth to one part in 1,000 at z=1000.
WMAP Satellite 2003
Fluctuations in the Microwave Background Radiation
At z=0 and on the very largest scales the distribution of galaxies is in fact homogeneous.
Greenbank radio galaxy survey (1990)
On ´smaller´ scales: redshift surveys
From the microwave background fluctuations to the present day structure seen in galaxy redshift surveys.
Typically Nsimulation << Nreal so
the equation below is NOT the
one we should be solving.
∂ƒ/∂t + [ƒ,H] = 0; ƒdz = 1
the Collisionless Boltzman Equation (CBE)
CBE is 1st order non-linear PDE. These can be solved by the method of characteristics. The characteristics are the path along which information propagates; for CBE defined by:
dx/dt = v ; dv/dt = -Φ
But these are the equations of motion we had above! ƒ is constant along the characterisics, thus each particle carries a piece of ƒ in its trajectory.
In terms of the distribution function,
Φ(x) = -GM∫dz´ƒ(z´)/|x-x´|
Monte Carlo: for any reasonable function g(z),
∫dz g(z) = lim 1/N ∑ g(zi)/ƒs(zi)
zi are randomly chosen with sampling probability density ƒs
Apply this to the Poisson Integral
Φ(x) ≈ -GM/N ∑ƒ(zi)/ƒs(zi) 1/|x-x´|
So in a conventional N-body simulation ƒs(z) = ƒ(z), so the particle
density represents the underlying phase space density.
The singularity at x = x´ in the Poisson integral causes very large scatter in the estimation of Φ.
This results in a fluctuation in the potential, δΦ, which has 2 effects.
1. Change in the particle‘s energy along its orbit:
dE/dt = xi ∂E/∂xi + vi ∂E/∂vi + ∂E/∂t = ∂Φ/∂t
Fluctuations in Φ due to discrere sampling will cause a random walk in enegy for the particle: this is two-body relaxation.
2. Mass segregation: if more and less massive particles are present, the less massive ones will typically recoil from an encounter with more velocity than a massive particle.
Softening, either explicitly introduced or as part of the numerical method, lessens these effects.
This is even more important for cosmological simulations where all structures formed from smaller initial objects.
All particles experienced a large relative degree of relaxation in the past.
Diemand and Moore 2002
Dwarf Galaxy Halos
zBox: (Stadel & Moore) 2002
288 AMD MP2200+ processors, 144 Gigs ram, 10 Terabyte disk
Compact, easy to cool and maintain
Very fast Dolphin/SCI interconnects - 4 Gbit/s, microsecond latency
A teraflop computer for $500,000 ($250,000 with MBit)
Roughly one cubic meter, one ton and requires 40kilowatts of power
500 CPUs/640 GB RAM
~100 TB of Disk
A parallel computer is currently still mostly wiring.
The human brain (Gary Kasparov) is no exception.
However, wireless CPUs are now under development
which will revolutionize parallel computer construction.
spatial binary with squeeze
Forces are calculated using a 4th order multipoles.
Ewald summation technique used to introduce periodic boundary conditions (also based on a 4th order expansion).
Work is tracked and fed back into domain decomp.
Compute time vs. Accuracy
(1/r^2) This means it is benificial to divide space in order to achieve load balance. Minimizes communication with other processors.
Example division of space for 8 processors
Other decomposition strategies...
Low latency message passing
Local cache of remote data elements
PKDGRAV does not attempt to determine in advance which data elements are going to be required in a step (LET).
The hit rate in the cache is very good with as little as 10 MB.
On the T3E it was possible to obtain 80% of linear scaling on 512 processors.
GASOLINE: Wadsley, Stadel & Quinn NewA 2003
SPH is very well matched to a particle based gravity code like PKDGRAV since all the core data structures and many of the same algorithms can be used. For example, the neighbor searching can simply use the parallel distrinuted tree structure.
Hernquist & Katz 89
Fairly standard SPH formulation is used in GASOLINE
Evrard 88, Benz 89
The Large Magellanic Cloud (LMC) in gas and stars
With fully dynamical Milky Way Halo (dark matter and hot gas and stellar disk and bulge) which are not shown here.
Both tidal and ram-pressure stripping of gas is taking place.
(University of Zürich)
Derek C. Richardson
Gravity with hard spheres including surface friction, coefficient of restitution and aggregates; the Euler equations for solid bodies.
Part of an asteroid disk, where the outcomes of the asteroid impact simulations are included.
The power spectrum of density fluctuations in three different dark matter models
Large scales (galaxy clusters)
Small scales (dwarf galaxies)
Andrea Maccio et al
Andrea Maccio et al
Andrea Maccio et al
1kev WDM ~10 satellites
CDM ~500 satellites
Very strong constraint on the lowest mass WDM candidate – need to form at least one Draco sized substructure halo
Halo density profiles unchanged – Liouvilles constraint gives cores ~< 50pc
How do we do multistepping now and why does it have problems?
Drift-Kick-Drift Multistepping Leapfrog
Note that none of the Kick tick marks align, meaning that gravity is calculated for a single rung at a time, despite the fact that the tree is built for all particles.
The select operators are performed top-down until all particles end up on appropriate timestep rungs. 0:DSKD, 1:DS(DSKDDSKD)D, 2:DS(DS(DSKD...
Kick-Drift-Kick Multistepping Leapfrog
This method is more efficient since it performs half the number of tree build operations.
It also exhibits somewhat lower errors than the standard DKD integrator
It is the only scheme used in production at present.
Non-local, based on max acceleration in moderate densities
and can take the minimum of any or all of these criteria
Want all algorithms of the simulation code to scale as O(Nactive log N)!
Everything that isn't introduces a fixed cost which limits the speed-up attainable from multistepping
Tree repair instead of rebuild.
Use O(N^2) for very active regions.
Hybrid Methods: Block+Symba
Don't drift all particles, only drift terms that appear on the interaction list!
Do smart updates of local cache information instead of flushing at each timestep.
Use some local form of achieving load balancing, perhaps scheduling? Remote walks?
Allow different parts of the simulation to get somewhat out-of-sync?