InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-aware Inner Product Processing

Daehyeon Baek, Soojin Hwang, Taekyung Heo, Daehoon Kim, Jaehyuk Huh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

28 Scopus citations

Abstract

Sparse matrix multiplication is one of the key computational kernels in large-scale data analytics. However, a naive implementation suffers from the overhead of irregular memory accesses caused by the sparse representation. To mitigate this overhead, recent accelerator designs have advocated outer-product processing, which minimizes input accesses but generates intermediate products that must be merged into the final output matrix. Using real-world sparse matrices, this study first identifies the memory bloating problem of outer-product designs, which arises from the unpredictable number of intermediate products. Such an unpredictable increase in memory requirements during computation can limit the applicability of accelerators. To address the memory bloating problem, this study revisits the alternative inner-product approach and proposes a new accelerator design called InnerSP. This study shows that the distributions of nonzero elements in real-world sparse matrices exhibit a certain level of locality, and a smart caching scheme designed for inner-product processing exploits this locality effectively with a modest on-chip cache. However, the row-wise inner product relies on on-chip aggregation of intermediate products. Because sparsity varies from row to row, the on-chip aggregation storage can overflow or underflow (be underutilized). To maximize parallelism while avoiding costly overflows, InnerSP uses pre-scanning to split and merge rows. Simulation results show that the performance of InnerSP is comparable to or better than that of prior outer-product approaches, without any memory bloating problem.
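The abstract contrasts outer-product processing, which materializes every intermediate product before merging, with row-wise processing that aggregates one output row at a time in bounded on-chip storage, and mentions pre-scanning for row splitting and merging. The Python sketch below is a software illustration only, assuming a Gustavson-style row-wise formulation (C[i, :] is the sum over k of A[i, k] * B[k, :]); it is not InnerSP's hardware design. The names `plan`, `spgemm_rowwise`, `total_partial_products`, the dict-of-rows matrix format, and the `capacity` parameter (entries in a hypothetical on-chip accumulator) are all illustrative assumptions. The pre-scan bounds each row's partial products by the sum of nnz(B[k, :]) over its nonzeros, splits rows that would exceed `capacity`, and packs light rows into shared batches.

```python
from typing import Dict, List, Tuple

# Hypothetical format: a sparse matrix is a list of rows, and each row is a
# dict mapping column index -> nonzero value.
Rows = List[Dict[int, float]]
Chunk = Tuple[int, List[int]]  # (row i of A, subset of the column indices k of A[i])


def total_partial_products(A: Rows, B: Rows) -> int:
    """Partial products an outer-product design would write out before merging."""
    return sum(len(B[k]) for row in A for k in row)


def plan(A: Rows, B: Rows, capacity: int) -> List[List[Chunk]]:
    """Pre-scan: split heavy rows into chunks, then merge light chunks into batches.

    The load of a chunk is sum(nnz(B[k])) over its k, an upper bound on the
    accumulator entries it needs. A single B row larger than `capacity` still
    forms an oversized chunk; this sketch does not handle that case.
    """
    chunks: List[Tuple[int, Chunk]] = []
    for i, row in enumerate(A):
        ks: List[int] = []
        load = 0
        for k in sorted(row):
            w = len(B[k])
            if ks and load + w > capacity:  # split: close this chunk of row i
                chunks.append((load, (i, ks)))
                ks, load = [], 0
            ks.append(k)
            load += w
        if ks:
            chunks.append((load, (i, ks)))

    batches: List[List[Chunk]] = []
    current: List[Chunk] = []
    used = 0
    for load, chunk in chunks:
        if current and used + load > capacity:  # batch is full: start a new one
            batches.append(current)
            current, used = [], 0
        current.append(chunk)  # merge: several light chunks share one batch
        used += load
    if current:
        batches.append(current)
    return batches


def spgemm_rowwise(A: Rows, B: Rows, capacity: int) -> Rows:
    """Row-wise SpGEMM computed batch by batch with a bounded accumulator."""
    C: Rows = [{} for _ in A]
    for batch in plan(A, B, capacity):
        # Stand-in for the on-chip aggregation storage, keyed by (row, column).
        acc: Dict[Tuple[int, int], float] = {}
        for i, ks in batch:
            for k in ks:
                a = A[i][k]
                for j, b in B[k].items():
                    acc[(i, j)] = acc.get((i, j), 0.0) + a * b
        # Flush; a row split across batches accumulates into the same output row.
        for (i, j), v in acc.items():
            C[i][j] = C[i].get(j, 0.0) + v
    return C


A = [{0: 1.0, 1: 2.0}, {1: 3.0}, {}]
B = [{0: 4.0, 2: 5.0}, {0: 6.0}]
print(plan(A, B, capacity=2))
print(spgemm_rowwise(A, B, capacity=2))
print(total_partial_products(A, B))
```

With `capacity=2`, row 0 of A is split across two batches and its partial sums are combined when flushed, while the second chunk of row 0 and all of row 1 share one batch. The accumulator never holds more than two entries, whereas `total_partial_products` counts all four partial products that an outer-product design of this kind would materialize before merging.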

Original language: English
Title of host publication: Proceedings - 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
Editors: Jaejin Lee, Albert Cohen
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 116-128
Number of pages: 13
ISBN (Electronic): 9781665442787
State: Published - 2021
Event: 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021 - Virtual, Online, United States
Duration: 26 Sep 2021 to 29 Sep 2021

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume: 2021-September
ISSN (Print): 1089-795X

Conference

Conference: 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
Country/Territory: United States
City: Virtual, Online
Period: 26/09/21 to 29/09/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE

Keywords

  • Hardware accelerator
  • Inner product
  • Sparse matrix multiplication
