
Accelerating an N-Body Simulation

By **ayame**


Presentation Transcript

- The CPU loads particle data into DRAM for every iteration (every N*N cycles).
- A new set of 4 values is read from DRAM in every cycle.
- 16 force computations are done based on 16 scalar inputs and the 4 values read earlier.
- The pipeline and accumulator are described in another slide.
- Every pipeline outputs 12 partial sums after N cycles.
- The CPU adds the 12 partial sums together (for every particle), updates velocities, updates positions, and rewrites the data into DRAM.

```c
for (int j = 0; j < N/PAR; j++)
{
    /* set scalar inputs */
    max_set_scalar_input(device, "RowSumKernel.N", N, FPGA_A);
    max_set_scalar_input_f(device, "RowSumKernel.EPS", EPS, FPGA_A);

    for (int p = 0; p < PAR; p++)
    {
        max_set_scalar_input_f(device, pi_x[p], px[j*PAR + p], FPGA_A);
        max_set_scalar_input_f(device, pi_y[p], py[j*PAR + p], FPGA_A);
        max_set_scalar_input_f(device, pi_z[p], pz[j*PAR + p], FPGA_A);
    }

    /* run the kernel */
    max_run(
        device,
        max_output("ax", outputX, 12*PAR*sizeof(float)),
        max_output("ay", outputY, 12*PAR*sizeof(float)),
        max_output("az", outputZ, 12*PAR*sizeof(float)),
        max_runfor("RowSumKernel", N),
        max_end()
    );

    /* sum up the partial sums */
    for (int i = 0; i < 12*PAR; i++)
    {
        ax[j*PAR + (i/12)] += outputX[i];
        ay[j*PAR + (i/12)] += outputY[i];
        az[j*PAR + (i/12)] += outputZ[i];
    }
}
/* update velocity */
/* update position */
/* load memory */
```

(Slide annotations for the host C code above:) The outer loop runs N/PAR times; each kernel invocation runs for N cycles, consuming one input per cycle — the P_j data from DRAM. Acceleration is accumulated as 12 partial sums.

Resource Usage

Resource usage for the 16-fold parallel kernel @ 150 MHz:

- LUTs: 156032 / 297600 (52.43%)
- FFs: 166543 / 595200 (27.98%)
- BRAMs: 433 / 1064 (40.70%)
- DSPs: 288 / 2016 (14.29%)
