Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining. Xintian Yang, Srinivasan Parthasarathy and P. Sadayappan, Department of Computer Science, The Ohio State University. Outline: Motivation and Background; Methods; Experiments; Conclusions and Future Work.
computation of SpMV on such graphs
Texture cache size was not publicly available
Estimated to be 250 KB (= 64,000 columns at 4 bytes per entry of x)
Note: the entire vector x cannot fit in the texture cache
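The 64,000-column figure can be checked with a short calculation. The sketch below is only an illustration: it assumes 1 KB = 1024 bytes and 4-byte (single-precision) entries of x, which is the combination that reproduces the number on the slide. The 1,000,000-column graph and the helper names are hypothetical, not taken from the slides.

```python
# Assumptions (not stated on the slide): 1 KB = 1024 bytes, 4-byte entries of x.
CACHE_BYTES = 250 * 1024   # estimated texture cache size
BYTES_PER_ENTRY = 4        # single-precision float


def columns_that_fit(cache_bytes=CACHE_BYTES, bytes_per_entry=BYTES_PER_ENTRY):
    """Number of entries of x (one per matrix column) that fit in the cache."""
    return cache_bytes // bytes_per_entry


def cache_sized_slices(num_columns, slice_width=None):
    """How many cache-sized slices of x are needed to cover all columns."""
    if slice_width is None:
        slice_width = columns_that_fit()
    return -(-num_columns // slice_width)  # ceiling division


if __name__ == "__main__":
    n = 1_000_000  # hypothetical graph with one million vertices
    print(columns_that_fit())       # entries of x that fit at once
    print(cache_sized_slices(n))    # slices of x needed to cover the graph
```

For any graph with more than 64,000 vertices, only part of x can be cached at a time, which is the point of the note above.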
Unstructured matrices: non-power-law
CSR: imbalanced workload among threads, non-coalesced memory accesses.
CSR-vector: many short rows, so most threads assigned to a row are wasted
Baskaran et al.
COO: thread divergence, low thread level parallelism
ELL: long rows can’t be bounded
HYB: the ELL part covers only a small amount of the computation and the COO part is slow; increasing the ratio of the ELL part introduces memory overhead.