Abstract
The rising popularity of intelligent mobile devices and the computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a novel model compression scheme that allows inference to be carried out using bit-level sparsity, which can be efficiently implemented using in-memory computing macros. In this paper, we introduce a method called BitS-Net to leverage the benefits of bit-sparsity (where the number of zeros exceeds the number of ones in the binary representation of weight/activation values) when applied to compute-in-memory (CIM) with resistive RAM (RRAM) to develop energy-efficient DNN accelerators operating in the inference mode. We demonstrate that BitS-Net improves the energy efficiency by up to 5x for ResNet models on the ImageNet dataset.
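The abstract defines bit-sparsity as having more zero bits than one bits in the binary representation of weight/activation values. The following is a minimal sketch (not the authors' code; the function name and sample values are illustrative) of how that metric can be measured for a list of quantized integer values:

```python
# Illustrative sketch: measuring bit-level sparsity, i.e. the fraction of
# zero bits across the fixed-width binary representations of quantized values.

def bit_sparsity(values, bits=8):
    """Return the fraction of zero bits over the `bits`-bit
    unsigned representations of `values`."""
    total_bits = len(values) * bits
    # Mask to `bits` bits, then count the set ('1') bits in each value.
    ones = sum(bin(v & ((1 << bits) - 1)).count("1") for v in values)
    return 1.0 - ones / total_bits

# Small-magnitude 8-bit quantized weights have few set bits, so zeros
# outnumber ones and the bit-sparsity is high (hypothetical sample values).
weights = [3, 0, 12, 1, 7, 0, 2, 5]
print(round(bit_sparsity(weights), 3))  # → 0.828
```

A high bit-sparsity means most bit-serial multiply-accumulate operations in a CIM macro contribute nothing, which is the property the paper exploits for energy savings.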
| Original language | English |
|---|---|
| Pages (from-to) | 1952-1961 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Circuits and Systems I: Regular Papers |
| Volume | 69 |
| Issue number | 5 |
| DOIs | |
| State | Published - 1 May 2022 |
Bibliographical note
Publisher Copyright: © 2004-2012 IEEE.
Keywords
- DNN accelerator
- Deep neural network
- in-memory computing
- quantization