
Timestepping and Parallel Computing in Highly Dynamic N-body Systems

Joachim Stadel

[email protected]

University of Zürich

Institute for Theoretical Physics



Astrophysical N-body Simulations

Figure: the landscape of astrophysical N-body simulations, relating the relevant physics (hydro, gravity, collisions, near-integrable dynamics) and the applications (LSS surveys, solar system formation, solar system stability, galaxy formation) to particle numbers spanning roughly 10^2 to 10^12.



Outline

  • Collisionless Simulations and Resolution
  • Parallel Computers
  • Tree Codes – Tree Codes on Parallel Computers
  • PKDGRAV (and Gasoline)
  • Applications – Various Movies
  • Warm Dark Matter
  • Multistepping Part 1
  • New Parallelization Problems
  • Multistepping Part 2
  • Initial Conditions – Shells
  • Blackhole "Mergers"
  • Fast Multipole Method
  • PKDGRAV2
  • Cosmo Initial Conditions
  • GHALO Simulation
  • GHALO Prelim. Results
    • Density Profile
    • Phase-Space Density
    • Subhalos & Reionization
  • What next?



The initial conditions for structure formation.

The Universe is completely smooth to one part in 1,000 at z=1000.

WMAP Satellite 2003

Fluctuations in the Microwave Background Radiation



At z=0 and on the very largest scales the distribution of galaxies is in fact homogeneous.

Greenbank radio galaxy survey (1990)

31,000 galaxies



On 'smaller' scales: redshift surveys



Numerical Simulation

From the microwave background fluctuations to the present day structure seen in galaxy redshift surveys.



N-body simulations as models of stellar systems

Typically Nsimulation << Nreal, so the equation below is NOT the one we should be solving:

ẍi = -∑j≠i ∇Φ(xi,xj)

What we should be solving instead is the Collisionless Boltzmann Equation (CBE):

∂ƒ/∂t + [ƒ,H] = 0 ; ∫ƒ dz = 1

The CBE is a 1st-order non-linear PDE. These can be solved by the method of characteristics. The characteristics are the paths along which information propagates; for the CBE they are defined by:

dx/dt = v ; dv/dt = -∇Φ

But these are the equations of motion we had above! ƒ is constant along the characteristics, thus each particle carries a piece of ƒ along its trajectory.


The only difficulty is in evaluating Φ

In terms of the distribution function,

Φ(x) = -GM ∫dz´ ƒ(z´)/|x-x´|

Monte Carlo: for any reasonable function g(z),

∫dz g(z) = lim N→∞ 1/N ∑i=1..N g(zi)/ƒs(zi)

where the zi are randomly chosen with sampling probability density ƒs.

Apply this to the Poisson integral:

Φ(x) ≈ -GM/N ∑i=1..N ƒ(zi)/ƒs(zi) · 1/|x-xi|

So in a conventional N-body simulation ƒs(z) = ƒ(z), and the particle density represents the underlying phase-space density.
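As a minimal sketch of the Monte Carlo estimate above, assuming conventional sampling with ƒs = ƒ (so all the weights ƒ(zi)/ƒs(zi) are unity); the function name and the Gaussian test cloud are illustrative only, not part of the original presentation:

```python
import numpy as np

def potential_estimate(x, positions, G=1.0, M=1.0):
    """Monte Carlo estimate of Phi(x): with sampling density fs = f the
    weights f(z_i)/fs(z_i) are all unity, so Phi(x) ~ -GM/N * sum 1/|x - x_i|."""
    r = np.linalg.norm(positions - x, axis=1)        # distances |x - x_i|
    return -G * M / len(positions) * np.sum(1.0 / r)

# Illustrative use: estimate the potential at the center of a random particle cloud.
rng = np.random.default_rng(42)
cloud = rng.normal(size=(10_000, 3))
print(potential_estimate(np.zeros(3), cloud))
```

Note that if x happens to land very close to one of the sample points, the 1/|x-xi| term blows up; that is exactly the scatter problem that softening (next slide) addresses.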



Softening

The singularity at x = x´ in the Poisson integral causes very large scatter in the estimation of Φ.

This results in a fluctuation in the potential, δΦ, which has 2 effects.

1. Change in the particle's energy along its orbit:

dE/dt = ẋ·∂E/∂x + v̇·∂E/∂v + ∂E/∂t = ∂Φ/∂t

Fluctuations in Φ due to discrete sampling will cause a random walk in energy for the particle: this is two-body relaxation.

2. Mass segregation: if more and less massive particles are present, the less massive ones will typically recoil from an encounter with more velocity than a massive particle.

Softening, either explicitly introduced or as part of the numerical method, lessens these effects.
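To make the role of softening concrete, here is a minimal direct-summation sketch with Plummer softening; the O(N^2) loop and this particular softening form are textbook choices for illustration, not the method used in PKDGRAV:

```python
import numpy as np

def softened_accelerations(pos, mass, eps=0.01, G=1.0):
    """Direct-sum accelerations with Plummer softening:
    a_i = -G * sum_{j != i} m_j (x_i - x_j) / (|x_i - x_j|^2 + eps^2)^(3/2).
    The eps term removes the singularity at x = x', suppressing the
    potential fluctuations that drive two-body relaxation."""
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        dx = pos[i] - pos                      # x_i - x_j for every j
        r2 = np.sum(dx * dx, axis=1) + eps**2
        r2[i] = np.inf                         # drop the self-interaction
        acc[i] = -G * np.sum((mass / r2**1.5)[:, None] * dx, axis=0)
    return acc
```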



All N-body simulations of the CBE suffer from 2-body relaxation!

This is even more important for cosmological simulations where all structures formed from smaller initial objects.

All particles experienced a large relative degree of relaxation in the past.

Diemand and Moore 2002



Increasing Resolution

Cluster resolved: 67,500

Galaxy halos resolved: 1,300,000

Dwarf galaxy halos resolved: 10,500,000



zBox (Stadel & Moore, 2002)

288 AMD MP2200+ processors, 144 GB RAM, 10 TB of disk

Compact, easy to cool and maintain

Very fast Dolphin/SCI interconnects - 4 Gbit/s, microsecond latency

A teraflop computer for $500,000 ($250,000 with MBit)

Roughly one cubic meter, one ton, and requires 40 kilowatts of power



Parallel Supercomputing



500 CPUs/640 GB RAM

~100 TB of Disk

A parallel computer is currently still mostly wiring.

The human brain (Gary Kasparov) is no exception.

However, wireless CPUs are now under development, which will revolutionize parallel computer construction.



Spatial Binary Tree

k-D Tree: a spatial binary tree with squeeze.
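A minimal sketch of building such a spatial binary (k-D) tree, with each cell's bounds "squeezed" to the particles it actually contains; the Cell class, bucket size, and median split are illustrative assumptions, not PKDGRAV's actual data structures:

```python
import numpy as np

class Cell:
    """A node of the spatial binary tree: particle indices plus bounds
    squeezed to the particles inside."""
    def __init__(self, idx, lo, hi):
        self.idx, self.lo, self.hi = idx, lo, hi
        self.left = self.right = None

def build_tree(pos, idx, bucket=16):
    lo, hi = pos[idx].min(axis=0), pos[idx].max(axis=0)   # the "squeeze"
    cell = Cell(idx, lo, hi)
    if len(idx) <= bucket:
        return cell                                       # leaf bucket
    axis = int(np.argmax(hi - lo))                        # bisect the longest side
    order = idx[np.argsort(pos[idx, axis])]
    half = len(order) // 2
    cell.left = build_tree(pos, order[:half], bucket)
    cell.right = build_tree(pos, order[half:], bucket)
    return cell

pos = np.random.default_rng(1).random((1000, 3))
root = build_tree(pos, np.arange(len(pos)))
```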



Forces are calculated using 4th-order multipole expansions.

The Ewald summation technique is used to introduce periodic boundary conditions (also based on a 4th-order expansion).

Work is tracked and fed back into the domain decomposition.



Compute time vs. Accuracy



Parallelizing Gravity (PKDGRAV)

  • Spatial Locality = Computational Locality

    (1/r^2) This means it is beneficial to divide space in order to achieve load balance; it also minimizes communication with other processors.

  • But... we must add a constraint on the number of particles per processor: memory is limited!

  • Domain Decomposition is a global optimization of these requirements which is solved dynamically with every step.

Example division of space for 8 processors
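A minimal sketch of one way to produce such a division of space among 8 processors: orthogonal recursive bisection driven by per-particle work weights (standing in for the tracked work fed back into the domain decomposition). The function and the weighting are hypothetical, not the global optimization PKDGRAV actually solves:

```python
import numpy as np

def orb_decompose(pos, work, n_domains):
    """Orthogonal recursive bisection: repeatedly split the longest axis of a
    domain at the median of the cumulative work, so both halves carry roughly
    equal load."""
    domains = [np.arange(len(pos))]
    while len(domains) < n_domains:
        idx = domains.pop(0)
        axis = int(np.argmax(pos[idx].max(axis=0) - pos[idx].min(axis=0)))
        order = idx[np.argsort(pos[idx, axis])]
        cum = np.cumsum(work[order])
        cut = int(np.searchsorted(cum, 0.5 * cum[-1]))    # balance work, not count
        domains += [order[:cut], order[cut:]]
    return domains

pos = np.random.default_rng(0).random((8000, 3))
work = np.ones(len(pos))            # uniform work -> equal particle counts
print([len(d) for d in orb_decompose(pos, work, 8)])
```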



Other decomposition strategies...



How are non-local parts of the tree walked by PKDGRAV?

Low-latency message passing and a local cache of remote data elements (between CPU i and CPU j).

PKDGRAV does not attempt to determine in advance which data elements are going to be required in a step (as a Locally Essential Tree, LET, approach would).

The hit rate in the cache is very good with as little as 10 MB.
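A minimal sketch of the idea of a local software cache of remote data elements, keyed by owning CPU and cell; the fetch callback, capacity, and eviction policy are illustrative stand-ins for the real low-latency messaging layer:

```python
class RemoteCellCache:
    """Read-only software cache of remote tree cells, keyed by (cpu, cell_id).
    A hit costs no communication; a miss sends a request to the owning CPU
    via the supplied fetch callback."""

    def __init__(self, fetch, capacity=10_000):
        self.fetch = fetch              # callable(cpu, cell_id) -> cell data
        self.capacity = capacity
        self.store = {}
        self.hits = self.misses = 0

    def get(self, cpu, cell_id):
        key = (cpu, cell_id)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        if len(self.store) >= self.capacity:          # crude eviction: drop oldest
            self.store.pop(next(iter(self.store)))
        self.store[key] = self.fetch(cpu, cell_id)    # remote request
        return self.store[key]

    def flush(self):
        """Invalidate all cached remote data, e.g. at a synchronization point."""
        self.store.clear()
```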



PKDGRAV Scaling

On the T3E it was possible to obtain 80% of linear scaling on 512 processors.

PKDGRAV: Joachim Stadel, Thomas Quinn



GASOLINE: Wadsley, Stadel & Quinn NewA 2003

SPH is very well matched to a particle-based gravity code like PKDGRAV since all the core data structures and many of the same algorithms can be used. For example, the neighbor searching can simply use the parallel distributed tree structure.

A fairly standard SPH formulation is used in GASOLINE (Evrard 1988; Benz 1989; Hernquist & Katz 1989; Monaghan 1992).



Algorithms within GASOLINE

  • We perform 2 nearest-neighbor (NN) operations:

    • Find the 32 NN and calculate densities.

    • Calculate forces in a second pass.

  • For active particles we do a gather on the k-NN, and a scatter from the k-inverse NN. We never store the nearest neighbors. (Springel 2001 is similar.) A sketch of the density pass is shown below.

  • Cooling, heating, and ionization are quite efficient.
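A minimal sketch of the first pass (gather ~32 nearest neighbours and accumulate densities), assuming a standard cubic-spline kernel and using scipy's cKDTree in place of the parallel distributed tree; the smoothing-length choice is a simplification, not GASOLINE's actual scheme:

```python
import numpy as np
from scipy.spatial import cKDTree

def w_cubic(r, h):
    """Standard 3D cubic-spline (M4) kernel with support radius 2h."""
    q = r / h
    w = np.where(q <= 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q <= 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return w / (np.pi * h**3)

def sph_density(pos, mass, n_neigh=32):
    """First SPH pass: gather the k nearest neighbours of every particle and
    accumulate rho_i = sum_j m_j W(|x_i - x_j|, h_i)."""
    tree = cKDTree(pos)
    dist, idx = tree.query(pos, k=n_neigh)          # neighbour lists (incl. self)
    h = dist[:, -1] / 2.0                           # put the k-th NN at the kernel edge
    rho = np.sum(mass[idx] * w_cubic(dist, h[:, None]), axis=1)
    return rho, h

pos = np.random.default_rng(3).random((2000, 3))
mass = np.full(len(pos), 1.0 / len(pos))
rho, h = sph_density(pos, mass)
```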



The Large Magellanic Cloud (LMC) in gas and stars

With a fully dynamical Milky Way halo (dark matter, hot gas, stellar disk, and bulge), which is not shown here.

Both tidal and ram-pressure stripping of the gas are taking place.

Chiara Mastropietro

(University of Zürich)



Collisional Physics

Derek C. Richardson

Gravity with hard spheres including surface friction, coefficient of restitution and aggregates; the Euler equations for solid bodies.



Asteroid Collisions



Part of an asteroid disk, where the outcomes of the asteroid impact simulations are included.



Movies of 1000 years of evolution.



The power spectrum of density fluctuations in three different dark matter models, from the horizon scale and the CMB down through large scales (galaxy clusters) to small scales (dwarf galaxies).



CDM (T = GeV), 40 Mpc box, N = 10^7. Andrea Maccio et al.



WDM (T = 2 keV), 40 Mpc box, N = 10^7. Andrea Maccio et al.



WDM (T = 0.5 keV), 40 Mpc box, N = 10^7. Andrea Maccio et al.



1 keV WDM: ~10 satellites

CDM: ~500 satellites

Very strong constraint on the lowest-mass WDM candidate: we need to form at least one Draco-sized substructure halo.

Halo density profiles are unchanged: Liouville's constraint gives cores ≲ 50 pc.



CDM: n(M) ∝ M^-2

WDM: n(M) ∝ M^-1

Data: n(L) ∝ L^-1



  • With fixed timesteps these codes all scale very well.

  • However, this is no longer the only measure, since the scaling of a very "deep" multistepping run can be a lot worse.

    How do we do multistepping now and why does it have problems?



Drift-Kick-Drift Multistepping Leapfrog

Diagram: timestep rungs 0, 1, and 2 along the time axis, showing where the Select, Kick, and Drift operations occur on each rung.

Note that none of the Kick tick marks align, meaning that gravity is calculated for a single rung at a time, despite the fact that the tree is built for all particles.

The select operators are performed top-down until all particles end up on appropriate timestep rungs. 0:DSKD, 1:DS(DSKDDSKD)D, 2:DS(DS(DSKD...



Kick-Drift-Kick Multistepping Leapfrog

Diagram: the Kick-Drift-Kick rung hierarchy, with a Select operation at each rung boundary.

This method is more efficient since it performs half the number of tree build operations.

It also exhibits somewhat lower errors than the standard DKD integrator.

It is the only scheme used in production at present.
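A minimal sketch of a recursive block-timestep KDK leapfrog of this kind, with the rung assignment ("Select") assumed fixed over the big step and a toy central-potential acceleration; the rung bookkeeping and per-rung force handling are simplified relative to the production scheme:

```python
import numpy as np

def kdk_block_step(pos, vel, mass, rung, accel, dt, level=0):
    """One KDK block step of length dt for the rung hierarchy.

    Particles with rung == r take substeps of dt0 / 2**r.  The opening and
    closing half-kicks at each level touch only that level's particles, while
    the drift (performed at the deepest occupied level) moves everybody."""
    active = rung == level
    if np.any(active):                                   # opening half-kick
        vel[active] += 0.5 * dt * accel(pos, mass)[active]
    if np.any(rung > level):
        kdk_block_step(pos, vel, mass, rung, accel, dt / 2, level + 1)
        kdk_block_step(pos, vel, mass, rung, accel, dt / 2, level + 1)
    else:
        pos += dt * vel                                  # drift all particles
    if np.any(active):                                   # closing half-kick
        vel[active] += 0.5 * dt * accel(pos, mass)[active]

# Toy usage: two test particles orbiting a unit central point mass.
def accel(pos, mass):
    r = np.linalg.norm(pos, axis=1, keepdims=True)
    return -pos / r**3                                   # G * M_central = 1

pos = np.array([[1.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
vel = np.array([[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]])
mass = np.ones(2)
rung = np.array([0, 3])           # inner particle takes 2**3 times finer steps
for _ in range(100):
    kdk_block_step(pos, vel, mass, rung, accel, dt=0.05)
```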



Choice of Timestep

  • Want a criterion which commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities.

  • A local criterion; a non-local criterion based on the maximum acceleration in moderate densities; the code can take the minimum of any or all of these criteria (see the sketch below).
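The slide's formulas are not reproduced in the transcript. As a hedged sketch, two commonly used velocity-independent criteria of this general type are an acceleration-based local one, Δt = η √(ε/|a|), and a density-based one, Δt = η_ρ/√(Gρ), with each particle then mapped onto a power-of-two rung; the η values and exact forms below are illustrative assumptions, not necessarily the criteria used here:

```python
import numpy as np

def choose_timesteps(acc, eps, rho, eta=0.2, eta_rho=0.03, G=1.0):
    """Velocity-independent timestep criteria (illustrative forms):
    a local, acceleration-based one, dt = eta * sqrt(eps / |a|), and a
    density-based one, dt = eta_rho / sqrt(G * rho); take the minimum."""
    a = np.maximum(np.linalg.norm(acc, axis=1), 1e-30)
    dt_acc = eta * np.sqrt(eps / a)
    dt_rho = eta_rho / np.sqrt(G * np.maximum(rho, 1e-30))
    return np.minimum(dt_acc, dt_rho)

def assign_rungs(dt, dt0):
    """Map each particle timestep onto a power-of-two rung: dt0 / 2**rung <= dt."""
    return np.maximum(0, np.ceil(np.log2(dt0 / dt))).astype(int)
```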



Multistepping: The real parallel computing challenge.

  • Timesteps scale as T ~ 1/sqrt(Gρ); the spread in timesteps is even more dramatic in SPH.

  • This implies Nactive << N.

  • A global approach to load balancing fails.

  • There is less computation per unit of communication.

  • There are too many synchronization points between all processors.

We want all algorithms of the simulation code to scale as O(Nactive log N)!

Everything that isn't O(Nactive log N) introduces a fixed cost which limits the speed-up attainable from multistepping.



The Trends

  • Parallel computers have ever more independent computing elements, e.g. BlueGene (100,000s of processors) and multicore CPUs.

  • Our simulations are always increasing in resolution and hence we need many more timesteps than were required in the past.

  • Multistepping methods have ever more potential to speed up calculations, but introduce new complexities into codes, particularly for large parallel machines.



What can be done?

  • Tree repair instead of rebuild.

  • Use O(N^2) for very active regions.

  • Hybrid methods: Block + SyMBA.

  • Don't drift all particles; only drift terms that appear on the interaction list!

  • Do smart updates of local cache information instead of flushing at each timestep.

  • Use some local form of achieving load balancing, perhaps scheduling? Remote walks?

  • Allow different parts of the simulation to get somewhat out of sync?

