Automated parallelization of non uniform convolutions on chip multiprocessors
This presentation is the property of its rightful owner.
Sponsored Links
1 / 4

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors PowerPoint PPT Presentation


  • 79 Views
  • Uploaded on
  • Presentation posted in: General

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors. Yuanrui Zhang, Mahmut Kandemir Computer Science and Engineering, The Pennsylvania State University. Nikos Pitsianis, Xiaobai Sun Computer Science, Duke University. Non-Uniform FFT Local Convolution. Local Support.

Download Presentation

Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors

Yuanrui Zhang, Mahmut Kandemir

Computer Science and Engineering, The Pennsylvania State University.

Nikos Pitsianis, Xiaobai Sun

Computer Science, Duke University.


Non-Uniform FFT Local Convolution

Local Support

sources

targets

  • While FFTs are well implemented by FFTW, we concentrate on accelerating the parallel convolution step on multicores.

  • Geometric tiling can enhance data locality and reuse for the non-uniform local convolution, a matrix-vector product with an irregular and sparse matrix.

  • Multicores have different on-chip memory hierarchies and characteristics.


Architecture Aware Hierarchical Geometric Tiling and Parallel Scheduling

  • Hierarchical tiling according to memory hierarchy and sizes

  • Neighborhood tile traversing order

  • Block distribution

  • Try to take care of data sharing and locality at all levels of cache

1

2

19

18

6

3

16

17

7

5

4

On-chip memory abstraction

8

14

15

13

12

9

10

11


Preliminary Evaluation


  • Login