Accelerating an N-Body Simulation

Accelerating an N-Body Simulation. Anuj Kalia, Maxeler Technologies. The CPU loads particle data into DRAM for every iteration (every N*N cycles). A new set of 4 values is read from DRAM in every cycle.

Presentation Transcript
Accelerating an N-Body Simulation

Anuj Kalia

Maxeler Technologies

slide3
The CPU loads particle data into DRAM for every iteration (every N*N cycles).

A new set of 4 values is read from DRAM in every cycle.

slide4
16 force computations are done based on 16 scalar inputs and the 4 values read earlier.

The pipeline and accumulator are described in another slide.

slide5
Every pipeline outputs 12 partial sums after ‘N’ cycles.

slide6
The CPU adds the 12 partial sums together for every particle, updates the velocities and positions, and writes the new particle data back to DRAM.
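
The CPU-side step described above might be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `cpu_update`, the tuple layouts, the time step `dt`, and the choice of a simple Euler step that updates velocity before position are all assumptions. The slides only say that the CPU sums the 12 partial sums for each particle and then updates velocities and positions.

```python
def cpu_update(particles, velocities, partial_sums, dt):
    """Reduce the partial sums and advance every particle by one time step.

    particles    -- list of (x, y, z, mass) tuples (layout is an assumption)
    velocities   -- list of (vx, vy, vz) tuples
    partial_sums -- per particle, a list of 12 (ax, ay, az) partial sums
    dt           -- time step (hypothetical parameter, not from the slides)
    """
    new_particles, new_velocities = [], []
    for (x, y, z, m), (vx, vy, vz), sums in zip(particles, velocities, partial_sums):
        # Add the partial sums together to get the total acceleration.
        ax = sum(s[0] for s in sums)
        ay = sum(s[1] for s in sums)
        az = sum(s[2] for s in sums)
        # Update the velocity first, then the position (assumed Euler step).
        vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt
        new_particles.append((x + vx * dt, y + vy * dt, z + vz * dt, m))
        new_velocities.append((vx, vy, vz))
    # These lists are what would be written back to DRAM for the next iteration.
    return new_particles, new_velocities
```

The function returns new lists rather than mutating its inputs, which mirrors the idea of writing a fresh particle set to DRAM before the next iteration starts.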

slide8
Pipeline and Accumulator:

1 Input per cycle: P_j data from DRAM.

Acceleration: accumulated as 12 partial sums.
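
A software model of one pipeline and its accumulator could look like the sketch below. Several things here are assumptions that do not come from the slides: each P_j is treated as an (x, y, z, mass) tuple (a guess at the 4 values read per cycle), the force law is softened Newtonian gravity with G = 1, and input number j is added to partial sum j mod 12. Round-robin accumulation of this kind is a common way to hide floating-point adder latency in hardware, but the slides do not state why 12 sums are used.

```python
import math

NUM_PARTIAL_SUMS = 12  # from the slides: acceleration is kept as 12 partial sums


def pipeline_partial_sums(p_i, stream, eps=1e-3):
    """Model one pipeline: a fixed particle P_i sees one P_j per cycle.

    p_i    -- (x, y, z, mass) of the particle being accelerated (assumed layout)
    stream -- iterable of (x, y, z, mass) tuples, one P_j per cycle
    eps    -- softening length (hypothetical); keeps the j == i term finite
    Returns a list of 12 [ax, ay, az] partial sums.
    """
    xi, yi, zi, _ = p_i
    partial = [[0.0, 0.0, 0.0] for _ in range(NUM_PARTIAL_SUMS)]
    for cycle, (xj, yj, zj, mj) in enumerate(stream):
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        r2 = dx * dx + dy * dy + dz * dz + eps * eps
        inv_r3 = 1.0 / (r2 * math.sqrt(r2))
        # Assumed round-robin: cycle k accumulates into slot k mod 12.
        slot = partial[cycle % NUM_PARTIAL_SUMS]
        slot[0] += mj * dx * inv_r3
        slot[1] += mj * dy * inv_r3
        slot[2] += mj * dz * inv_r3
    return partial
```

After all N inputs have streamed through, the 12 slots together hold the full acceleration on P_i; adding them up is left to the CPU.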

Resource Usage

Resource usage for the 16-fold parallel kernel at 150 MHz:

LUTs: 156032 / 297600 (52.43%)

FFs: 166543 / 595200 (27.98%)

BRAMs: 433 / 1064 (40.70%)

DSPs: 288 / 2016 (14.29%)

Performance: Comparison

[Figure: time in seconds (y-axis) against number of particles (x-axis).]

Performance: Speedup

[Figure: speedup (y-axis) against number of particles (x-axis).]
