Supercomputing: the past, present and future

Ben Moore

Institute for Theoretical Physics, University of Zurich

+ Oscar Agertz, Juerg Diemand, Tobias Kaufmann, Michela Mapelli, Joachim Stadel,

Romain Teyssier & Marcel Zemp


Scaling of CPUs, parallel computers and algorithms:

  • We are now here: not much further increase in CPU speed or transistor counts, but a higher number of operations per clock cycle

1957: IBM vacuum-tube machine, 5000 flops

2007: Intel Core 2 Duo, 10 billion flops

Mare Nostrum: ~100 teraflops

Seti@home is running at ~300 teraflops

Top500 #1 (2007): IBM BlueGene/L, 212,992 processors, 478 teraflops

Folding@home and Seti@home together exceed a petaflop (10^15 flops)

The first N-body calculations with N^2 algorithms in 1960 used 30 particles

Today we can follow 10^10 particles, and the resolution (~ the mean interparticle separation) is ~1000 times higher.

Moore’s law alone with N^2 algorithms would allow ~10^6 particles; we are orders of magnitude beyond this thanks to algorithmic development.

Grid and tree codes are O(N log N).

Dehnen’s hierarchical mesh code is O(N).

Stadel’s multipole expansion code PKDGRAV is also O(N), and parallel.
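The scaling argument above can be made concrete with a toy cost model (illustrative only: the constant prefactors of each method are ignored, and the real constants matter in practice):

```python
import math

def ops_direct(n):
    # Direct summation: every particle interacts with every other,
    # so the cost per step grows as N^2.
    return n * n

def ops_tree(n):
    # Grid/tree codes group distant particles into multipoles,
    # reducing the cost per step to O(N log N).
    return n * math.log2(n)

def ops_fmm(n):
    # Fast-multipole-style codes (Dehnen's, PKDGRAV's expansions)
    # scale linearly, O(N).
    return n

for n in (10**6, 10**10):
    print(f"N={n:.0e}: direct {ops_direct(n):.1e}, "
          f"tree {ops_tree(n):.1e}, fmm {ops_fmm(n):.1e}")
```

At N = 10^10 the direct sum needs ~10^20 pair interactions per step, while a tree code needs ~3×10^11: that gap, not hardware, is what made today's simulations possible.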


The N-body technique

Solving these equations with finite-difference techniques is impractical: it is a 6-D equation… Analytic solutions are known only for a few equilibrium systems. Instead, we solve the equations of motion of N bodies…
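A minimal sketch of the N-body technique: direct-summation forces plus a kick-drift-kick leapfrog, in 2-D with G = 1 and an assumed softening length to keep close encounters finite. This is a toy version for illustration, not PKDGRAV:

```python
import math

def accelerations(pos, mass, eps=1e-3):
    """Direct-sum gravitational accelerations (G = 1) with softening eps."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps * eps
            inv_r3 = r2 ** -1.5
            acc[i][0] += mass[j] * dx * inv_r3
            acc[i][1] += mass[j] * dy * inv_r3
    return acc

def leapfrog_step(pos, vel, mass, dt):
    """One kick-drift-kick leapfrog step (symplectic, time-reversible)."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):            # half kick
        vel[i][0] += 0.5 * dt * acc[i][0]
        vel[i][1] += 0.5 * dt * acc[i][1]
    for i in range(len(pos)):            # drift
        pos[i][0] += dt * vel[i][0]
        pos[i][1] += dt * vel[i][1]
    acc = accelerations(pos, mass)
    for i in range(len(pos)):            # half kick
        vel[i][0] += 0.5 * dt * acc[i][0]
        vel[i][1] += 0.5 * dt * acc[i][1]
```

For two unit masses on a circular orbit of separation 1 (each moving at sqrt(0.5)), the separation stays close to 1 over many steps, which is the symplectic property that makes leapfrog the workhorse of the field.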


The equivalence of light and gravity was a beautiful idea

Here are Holmberg’s initial conditions.

(Hmmmm, initial conditions, that’s a whole separate talk.)


This experiment was recently repeated by John Dubinski – note how Holmberg cheated with the second ring of six particles!


Example 1: The secular and dynamical evolution of disks

A physically simple, but computationally demanding problem.

N-body simulations have long been used to study galactic dynamics. But only in the last few years has the resolution been sufficient to suppress numerical heating of disks.

Still, we are forced to construct “idealised models” using quasi-equilibrium spherical structures.

The disk heating problem has still not been properly solved self-consistently…


A typical disk galaxy simulation with < 10^6 star particles.

Look at those smooth disks with 10^8 stars! Spiral patterns need perturbations!

(John Dubinski)


So, how do those CDM substructures perturb the disk?

These simulations are from John Dubinski who attempts to follow all the most massive perturbers.

See also Kazantzidis et al (2007).

CDM substructure provides a nice means to generate spiral patterns and thick disks


The Origin of Giant Low Surface Brightness Galaxies

Mapelli et al (2007)





Moore & Parker 2007

Malin1 has hardly had time to rotate once! How can stars form ~100kpc from the center?






Example 2: The non-linear formation of dark matter structures.

  • A physically simple, but computationally highly demanding problem.
  • First cold collapse simulations in the 1980’s – gave “Einasto” profiles.
  • CDM hierarchical collapses in the 1990’s – gave “Einasto” profiles.
  • There are lots of empirical “universal” correlations but we don’t understand their physical origin: density and phase space with radius relations, beta-gamma relation…
  • Observations of galaxies and dwarf spheroidals tell us about the mass distribution on ~100 parsec scales. Simulations have yet to reach that. Until today…
  • CDM hierarchical collapses in 2007 still smooth in the inner Galactic region. Do we have dark satellites at 8 kpc?
  • Indirect detection experiments need the phase-space distribution of particles flowing through this room. This is impossible to determine computationally, but we can make some bold extrapolations…
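The “Einasto” profiles mentioned above can be written down and checked numerically. The shape parameter alpha ≈ 0.17 used here is a typical value fitted to CDM haloes, assumed for illustration:

```python
import math

def einasto_density(r, rho_2=1.0, r_2=1.0, alpha=0.17):
    """Einasto profile: the logarithmic slope d ln(rho)/d ln(r)
    equals -2 at r = r_2; alpha controls how fast it rolls over."""
    return rho_2 * math.exp(-(2.0 / alpha) * ((r / r_2) ** alpha - 1.0))

def log_slope(r, h=1e-5, **kw):
    """Numerical logarithmic slope d ln(rho) / d ln(r)."""
    lo = math.log(einasto_density(r * math.exp(-h), **kw))
    hi = math.log(einasto_density(r * math.exp(h), **kw))
    return (hi - lo) / (2.0 * h)
```

Unlike a pure power law, the slope keeps flattening toward the centre (about -0.9 at r = 0.01 r_2 for alpha = 0.17), which is one reason the profile fits cold collapses and hierarchical collapses alike.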

30 years ago….

N=700 (White 1976)

This overmerging problem was part of the motivation for White & Rees (1978) and the beginning of semi-analytic galaxy formation models.


Historically, the first N-body simulations were carried out to study the collapse of a cloud of stars (van Albada 1968).

With ~100 stars, these authors found structures similar to elliptical galaxies.

However, very cold collapses lead to a radial-orbit instability, and the system forms a prolate bar.

Numerical simulations show that k.e./p.e. > 0.2 is required to suppress this instability.


The origin of universal density profiles, universal phase-space density profiles, universal velocity distribution functions, and universal correlations between anisotropy and local density slope remains unknown.

T=10 crossing times


Increasing resolution doesn't just give you more of the same: you find a new physical understanding.

Cluster Resolved



Galaxy Halos




  • It took Joachim three years to write the first version of the MPI treecode PKDGRAV
  • It took six months to run this simulation on the 128-processor KSR
  • It finished in 1996, and we stared at it for a year before developing the tools to analyse it (Ghigna et al. 1998)

The first structures in the universe

Diemand et al. (2005): Earth-mass, parsec-scale haloes

Each particle has the mass of the Moon… higher resolution than the groups simulating the origin of the solar system.


Diemand et al. (2005)

This is one of the first objects to form in the Universe, at z=60. The halo is smooth, has a cuspy density profile, an Earth mass (10^-6 M_sun) and the size of the solar system.

These haloes can be detected as high-proper-motion gamma-ray sources (Moore et al. 2005).


Stable for a long long time, but no structure is stable forever… see talk of Krauss

Agreement with galaxy properties/abundance/radial distribution in clusters is very strong evidence for a hierarchical universe


Confrontation of rotation curves with predictions from numerical simulations

Swaters, Madore, van den Bosch & Balcells (2002)

DeBlok, McGaugh, Rubin & Bosma (2002)

Do bars heat the core? Not all galaxies are barred, e.g. M33, and bars are long-lived.

Triaxiality? CDM haloes are round; we should observe ~50% of galaxies with steeper cusps.


Goerdt et al 2006

  • Dynamical friction fails in constant-density systems
  • The reason is that the dark matter and stars have the same orbital frequency: all particles that can scatter off the sinking mass do so immediately
  • The consequence is rapid sinking followed by stalling
  • Even if the halo is slightly cuspy, the result is a core and stalling
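The sinking argument above is usually framed with Chandrasekhar's dynamical-friction formula, which assumes a Maxwellian background of field particles. A sketch (G = 1, with an assumed Coulomb logarithm) shows the dependence on background density and satellite speed; note that its assumptions are exactly what break down in a constant-density core, where the background corotates with the sinker:

```python
import math

def chandrasekhar_drag(v, M, rho, sigma, ln_lambda=10.0, G=1.0):
    """Deceleration |dv/dt| of a mass M moving at speed v through a
    Maxwellian background of density rho and 1-D velocity dispersion
    sigma (Chandrasekhar's formula; ln Lambda is an assumed constant)."""
    X = v / (math.sqrt(2.0) * sigma)
    # Fraction of background particles moving slower than the satellite.
    frac = math.erf(X) - 2.0 * X / math.sqrt(math.pi) * math.exp(-X * X)
    return 4.0 * math.pi * G * G * M * rho * ln_lambda * frac / (v * v)
```

The drag is linear in the background density rho and falls as 1/v^2 for fast satellites, so the formula predicts steady sinking in a cusp; the stalling seen by Goerdt et al. in cored haloes is precisely the regime where this local, Maxwellian picture no longer applies.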

Example 3: The internal kinematics of the ISM

A physically complex problem that has remained unsolved for decades.

What determines the kinematics of the ISM and regulates star-formation? Supernovae? Large scale gravity modes? Molecular Cloud – Molecular Cloud encounters?


Dibb & Burkert: “Gravity alone can’t be responsible for the floor in observed ISM dispersions”

“Need energy input from supernovae”

Previous unresolved ISM simulations



Oscar Agertz et al (2007)








ISM kinematics using the RAMSES AMR code +/- star formation +/- SN feedback


Agertz et al ( 2008)

“The observed dispersion velocity is generated through molecular cloud formation and subsequent gravitational encounters”


Scaling of cpus, parallel computers and algorithms:

Given the exponential scaling of supercomputers, it is difficult for special-purpose hardware to keep up at a reasonable cost.

We also want a general-purpose code that can scale to 10,000s of processors.


Scaling of cpus, parallel computers and algorithms:



It’s a tough commercial business!

Clusters started to appear ~2000, but you couldn’t buy a system that wouldn’t melt down. So we built our own….


zBox2 (2006) – 5 Tflops

500 quad Opteron 852s, 580 GB memory, 65 TB disk, 3D-SCI low-latency network.

zBox1: (Stadel & Moore 2002)

288 AMD MP2200+ processors, 144 GB RAM

Compact, easy to cool and maintain

Very fast Dolphin/SCI interconnects - 4 Gbit/s, microsecond latency

A teraflop supercomputer for $500,000

Roughly one cubic meter and one ton, requiring 40 kilowatts of power

2002: 0.5 Tflops

2007: 6 Tflops, 200 TB disk

With fixed timesteps, most codes scale very well.

However, this is no longer the only measure, since the scaling of a very "deep" multistepping run is always a lot worse.

How do we do multistepping now, and why does it still have problems?

Choice of Timestep
  • Want a criterion that commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities.


A non-local criterion, based on the maximum acceleration in regions of moderate density,

and we can take the minimum of any or all of these criteria.
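A sketch of these criteria, with the coefficients S = 0.2 and D = 0.03 taken from the slide; the functional forms follow the standard dynamical-time and acceleration criteria (G = 1 units assumed):

```python
import math

def dt_dynamical(rho_enc, eta=0.03, G=1.0):
    """Dynamical-time criterion, dt = eta / sqrt(G * rho), with rho the
    mean density enclosed at the particle's radius. It does not depend
    on velocity, so it is Galilean invariant and commutes with the
    Kick operator. eta = 0.03 matches the D on the slide."""
    return eta / math.sqrt(G * rho_enc)

def dt_acceleration(eps, a, eta=0.2):
    """Classic acceleration criterion, dt = eta * sqrt(eps / |a|),
    with softening length eps. eta = 0.2 matches the S on the slide."""
    return eta * math.sqrt(eps / a)

def choose_dt(*candidates):
    """Take the minimum over all active criteria."""
    return min(candidates)
```

Because dt_dynamical shrinks as the enclosed density grows, it automatically resolves the cuspy centre of an NFW halo without wasting CPU time in the outskirts, which is the point of the figure that follows.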


Zemp et al (2007)

Here we show the timestep versus radius for a spherical NFW halo

S = 0.2

D = 0.03

Dynamical Time

Wasted CPU time


Inaccurate timesteps


This mildly eccentric two-body orbit can't be followed with the standard leap-frog integrator: the orbits shrink and precess.


But look at this highly eccentric, unequal mass orbit of a star around a black hole using the error correction:

(Note the axes are not on the same scale)

Marcel Zemp: a collisionless and collisional MPI-parallel code.

Can simulate central BH kinematics within a CDM halo for example. Or the core collapse of a GC.

Multistepping: the real parallel computing challenge.
  • Δt ~ 1/sqrt(Gρ), even more dramatic in SPH
  • This implies Nactive << N
  • A global approach to load balancing fails
  • The ratio of computation to communication drops
  • There are too many synchronization points between all processors

We want all algorithms in the simulation code to scale as O(Nactive log N)!

Everything that doesn't introduces a fixed cost, which limits the speed-up attainable from multistepping.
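Multistepping is usually organised into power-of-two "rungs" of the largest timestep. A minimal sketch of rung assignment and of which rungs are active at each sub-step (a generic block-timestep scheme, not the PKDGRAV implementation):

```python
def rung(dt, dt_max):
    """Assign a power-of-two rung: a particle on rung r steps with
    dt_max / 2**r, the largest block step not exceeding its desired dt."""
    r = 0
    while dt_max / 2 ** r > dt:
        r += 1
    return r

def active_rungs(step, max_rung):
    """Rungs active at integer sub-step `step` of the top-level step:
    rung r is kicked every 2**(max_rung - r) sub-steps."""
    return [r for r in range(max_rung + 1)
            if step % 2 ** (max_rung - r) == 0]
```

All rungs synchronize only at sub-step 0, and most sub-steps touch only the deepest rungs: that is exactly why Nactive << N, and why any O(N) bookkeeping per sub-step caps the attainable speed-up.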


The Future

In seven years our laptops will have a teraflop of processing power, equivalent to our first zBox1 supercomputer!

What is the performance of the ultimate laptop?


Information theory


Quantum mechanics



Seth Lloyd, Nature (2000)


Extrapolating the exponential Moore’s law into the future we find that it will take just 250 years to achieve those 40 orders of magnitude gain.
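A quick arithmetic check of that extrapolation, assuming a Moore's-law doubling time of 18 to 24 months: 40 orders of magnitude is 40·log2(10) ≈ 133 doublings, i.e. roughly 200 to 270 years, of the same order as the ~250 quoted:

```python
import math

# A gain of 40 orders of magnitude requires 40 * log2(10) doublings.
doublings = 40 * math.log2(10)

for months in (18, 24):  # assumed range of doubling times
    years = doublings * months / 12
    print(f"doubling every {months} months -> {years:.0f} years")
```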



…and you still won’t be able to simulate a CDM halo with the correct number of particles!