
Gravitational N-body Simulation


pierce


  1. Gravitational N-body Simulation
     Major design goals:
     - Efficiency
     - Versatility (ability to use different numerical methods)
     - Scalability
     Lesser design goals:
     - Flexibility (control parameters must be configurable)
     - Persistence (pause and continue)
     - Visualization

  2. Hardware
     - Average desktop: ~6 GFlops
     - Top-line server: ~256 GFlops
     Single-computer configuration:
     - 1-4 CPUs, 1-4 cores, 3-4 GHz
     - 2-4 32-bit FP operations per cycle (IPC); 1-2 64-bit FP IPC
     - OS: Windows
     Cluster configuration (http://gears.aset.psu.edu/hpc/systems/):
     - LION-XO: 80 x (2 x Opteron, 8 GB) + 40 x (4 x Opteron, 16 GB); 2.4 GHz
     - 1.6 TFlops (32-bit); 800 GFlops (64-bit); single-core CPUs assumed
     - Gigabit Ethernet
     - OS: GNU/Linux
     - Open questions: single- or dual-core CPUs? Which CPU model?
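The peak figures above follow from multiplying CPUs x cores x clock x FP operations per cycle. A minimal Python sketch of that arithmetic, assuming the slide's "IPC" means floating-point operations retired per core per cycle (the function name `peak_gflops` is illustrative):

```python
def peak_gflops(cpus: int, cores: int, ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in GFlops: every core retires flops_per_cycle FP ops each cycle."""
    return cpus * cores * ghz * flops_per_cycle

# Top-line server: 4 CPUs x 4 cores x 4 GHz x 4 (32-bit FP ops per cycle)
server = peak_gflops(cpus=4, cores=4, ghz=4.0, flops_per_cycle=4)

# LION-XO, single-core assumed: 80 dual-CPU + 40 quad-CPU nodes at 2.4 GHz
lion_cpus = 80 * 2 + 40 * 4
lion_32 = peak_gflops(lion_cpus, 1, 2.4, 2)  # assumes 2 x 32-bit FP ops per cycle
lion_64 = peak_gflops(lion_cpus, 1, 2.4, 1)  # assumes 1 x 64-bit FP op per cycle

print(f"server: {server:.0f} GFlops")
print(f"LION-XO: {lion_32:.0f} GFlops (32-bit), {lion_64:.0f} GFlops (64-bit)")
```

This gives 256 GFlops for the server and 1,536 / 768 GFlops for LION-XO's 320 CPUs, which the slide rounds to 1.6 TFlops and 800 GFlops.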

  3. Algorithms
     Direct methods: O(N^2)
       + very simple
       + scalable
       - inefficient (~30,000 particles max @ 256 GFlops)
     Treecode / multipole: O(N log N)
       - more difficult to implement
       - scalability harder to achieve
       + efficient (10^6 to 10^10 particles)
     Field methods: O(N log N) or O(N)
       Involve solving Poisson's equation; an area of active research
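A direct method simply sums the pull of every particle on every other. The sketch below vectorizes that O(N^2) sum with NumPy; the function name, the Plummer softening length `eps`, and the unit choice G = 1 are assumptions added for illustration (softening keeps close encounters finite), not something the slides specify.

```python
import numpy as np

def direct_accelerations(pos: np.ndarray, mass: np.ndarray,
                         G: float = 1.0, eps: float = 1e-3) -> np.ndarray:
    """O(N^2) direct summation of gravitational accelerations.

    pos: (N, 3) positions; mass: (N,) masses.
    eps: assumed Plummer softening length.
    """
    # Separation vectors r_j - r_i for every pair, shape (N, N, 3)
    dr = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    # Softened squared distances, shape (N, N)
    r2 = np.sum(dr * dr, axis=-1) + eps * eps
    # Diagonal (self) terms have dr = 0, so they contribute nothing
    weights = mass[np.newaxis, :] * r2 ** -1.5
    # a_i = G * sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)
    return G * np.einsum("ij,ijk->ik", weights, dr)
```

Because it materializes N x N x 3 temporaries, this version is only practical for a few thousand particles; a production kernel would loop over pairs instead.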

  4. Levels of Parallelization
     1) SIMD: up to 4-wide
        - 4 x 32-bit flops/cycle
        - 2 x 64-bit flops/cycle
     2) SMP/MPU: up to 4 threads
        - 1-4 cores
        - 1-4 CPUs
     3) Cluster: up to N nodes
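The levels compose: each worker handles a block of target particles, and within a block the arithmetic is vectorized. A minimal sketch, assuming NumPy's vectorized loops as a stand-in for SIMD and a `ThreadPoolExecutor` as a stand-in for SMP threads (`accel_block` and `parallel_accelerations` are hypothetical names); on a cluster the same block decomposition would be distributed across nodes, e.g. with MPI.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def accel_block(pos_i, pos_all, mass, G=1.0, eps=1e-3):
    """Accelerations on the target block pos_i from all sources (vectorized)."""
    dr = pos_all[np.newaxis, :, :] - pos_i[:, np.newaxis, :]
    r2 = np.sum(dr * dr, axis=-1) + eps * eps  # assumed Plummer softening
    w = mass[np.newaxis, :] * r2 ** -1.5
    return G * np.einsum("ij,ijk->ik", w, dr)

def parallel_accelerations(pos, mass, workers=4, G=1.0, eps=1e-3):
    """Split the N targets into contiguous blocks and evaluate them concurrently.

    Every block needs read-only access to all positions, so no locking is needed.
    """
    blocks = np.array_split(np.arange(len(pos)), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda idx: accel_block(pos[idx], pos, mass, G, eps), blocks)
        # Blocks come back in order, so concatenation restores the particle order
        return np.concatenate(list(parts), axis=0)
```

Whether threads actually speed this up depends on NumPy releasing the GIL inside its loops; the point is the decomposition, which gives the same result as evaluating a single block.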

  5. Memory Requirements
     - Per particle: position (x, y, z) + velocity (vx, vy, vz)
     - 6 x 4 = 24 bytes (32-bit FP); 6 x 8 = 48 bytes (64-bit FP)
     - ~2,700 particles per 64 KB L1 cache (32-bit); ~1,300 particles per 64 KB (64-bit)
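The per-particle sizes and cache capacities can be checked directly from NumPy dtype sizes. This sketch assumes, as the slide does, that only position and velocity are stored (a per-particle mass would add a seventh field); the helper names are illustrative.

```python
import numpy as np

FIELDS = 6            # x, y, z, vx, vy, vz
L1_BYTES = 64 * 1024  # 64 KB L1 cache (slide 6)

def bytes_per_particle(dtype) -> int:
    """Bytes needed for one particle's position and velocity."""
    return FIELDS * np.dtype(dtype).itemsize

def particles_that_fit(dtype, budget_bytes: int) -> int:
    """Whole particles that fit in a memory budget of budget_bytes."""
    return budget_bytes // bytes_per_particle(dtype)

for dt in (np.float32, np.float64):
    print(f"{np.dtype(dt).name}: {bytes_per_particle(dt)} B/particle, "
          f"{particles_that_fit(dt, 1024)} per KB, "
          f"{particles_that_fit(dt, L1_BYTES)} per 64 KB L1")
```

This gives 24 and 48 bytes per particle, about 42 and 21 particles per KB, and 2,730 and 1,365 particles per 64 KB L1 cache.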

  6. Levels of Memory
     1) L1 cache: 64 KB
        - runs at CPU clock speed
        - minimal latency (a few cycles)
     2) L2 cache: 1 MB
        - runs at CPU clock speed
        - low latency
     3) RAM: several GB
        - lower bandwidth (up to 12-24 GB/s)
        - high latency
     4) Network (the weakest link)
        - 1 Gbit/s
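Because L1 is small and RAM latency is high, a direct-summation kernel is usually cache-blocked: a tile of sources small enough to stay in cache is reused against every target before the next tile is loaded. A sketch of that loop order, assuming half of a 64 KB L1 is budgeted for sources stored as (x, y, z, m); `source_tile_size` and `tiled_accelerations` are hypothetical names, and the softening `eps` is an added assumption.

```python
import numpy as np

L1_BYTES = 64 * 1024  # L1 size from the slide

def source_tile_size(dtype=np.float64, budget: int = L1_BYTES // 2) -> int:
    """Sources per tile so that x, y, z, m of every source fit in `budget` bytes."""
    return max(1, budget // (4 * np.dtype(dtype).itemsize))

def tiled_accelerations(pos, mass, G=1.0, eps=1e-3, tile=None):
    """Direct O(N^2) sum, processed one cache-sized tile of sources at a time."""
    tile = tile or source_tile_size(pos.dtype)
    acc = np.zeros_like(pos)
    for j0 in range(0, len(pos), tile):
        src, m = pos[j0:j0 + tile], mass[j0:j0 + tile]
        # Separation vectors from every target i to each source j in this tile
        dr = src[np.newaxis, :, :] - pos[:, np.newaxis, :]
        r2 = np.sum(dr * dr, axis=-1) + eps * eps
        # Accumulate this tile's contribution before moving to the next tile
        acc += G * np.einsum("ij,ijk->ik", m[np.newaxis, :] * r2 ** -1.5, dr)
    return acc
```

In NumPy the N x tile temporaries are themselves larger than L1, so this shows the loop structure a compiled SIMD kernel would use rather than a measured speedup; the result matches the untiled sum for any tile size.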

  7. 10^9 Particles Require…
     - Memory: 24 GB (32-bit)
     - Operations per iteration (O(N log N) method): log2(10^9) x 10^9 x const ~ 3 x 10^12 flop (3 Tflop)
     - Time: ~12 s per iteration @ 256 GFlops
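The estimate can be reproduced in a few lines. The slide leaves `const` unspecified; about 100 operations per particle per tree level is what its 3 x 10^12 figure implies, so that value is an assumption here, as are the helper names.

```python
import math

def memory_bytes(n: int, bytes_per_particle: int = 24) -> int:
    """Storage for n particles at 6 x 32-bit floats each (slide 5)."""
    return n * bytes_per_particle

def tree_ops(n: int, const: float = 100.0) -> float:
    """O(N log N) operation count per iteration: log2(N) x N x const."""
    return math.log2(n) * n * const

N = 10**9
PEAK_FLOPS = 256e9  # top-line server from slide 2

ops = tree_ops(N)
print(f"memory: {memory_bytes(N) / 1e9:.0f} GB")
print(f"ops per iteration: {ops:.2e}")
print(f"time per iteration: {ops / PEAK_FLOPS:.1f} s")
```

With these assumptions it prints 24 GB, 2.99e+12 operations, and 11.7 s per iteration, matching the slide's rounded figures.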
