Automated parallelization of non uniform convolutions on chip multiprocessors
This presentation is the property of its rightful owner.
Sponsored Links
1 / 4

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors PowerPoint PPT Presentation


  • 73 Views
  • Uploaded on
  • Presentation posted in: General

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors. Yuanrui Zhang, Mahmut Kandemir Computer Science and Engineering, The Pennsylvania State University. Nikos Pitsianis, Xiaobai Sun Computer Science, Duke University. Non-Uniform FFT Local Convolution. Local Support.

Download Presentation

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Automated parallelization of non uniform convolutions on chip multiprocessors

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors

Yuanrui Zhang, Mahmut Kandemir

Computer Science and Engineering, The Pennsylvania State University.

Nikos Pitsianis, Xiaobai Sun

Computer Science, Duke University.


Non uniform fft local convolution

Non-Uniform FFT Local Convolution

Local Support

sources

targets

  • While FFTs are well implemented by FFTW, we concentrate on accelerating the parallel convolution step on multicores.

  • Geometric tiling can enhance data locality and reuse for the non-uniform local convolution, a matrix-vector product with an irregular and sparse matrix.

  • Multicores have different on-chip memory hierarchies and characteristics.


Architecture aware hierarchical geometric tiling and parallel scheduling

Architecture Aware Hierarchical Geometric Tiling and Parallel Scheduling

  • Hierarchical tiling according to memory hierarchy and sizes

  • Neighborhood tile traversing order

  • Block distribution

  • Try to take care of data sharing and locality at all levels of cache

1

2

19

18

6

3

16

17

7

5

4

On-chip memory abstraction

8

14

15

13

12

9

10

11


Preliminary evaluation

Preliminary Evaluation


  • Login