Abstract
The rising popularity of intelligent mobile devices and the computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a novel model compression scheme that allows inference to be carried out using bit-level sparsity, which can be efficiently implemented with in-memory computing macros. In this paper, we introduce a method called BitS-Net that leverages bit-sparsity (where zeros outnumber ones in the binary representation of weight/activation values) in compute-in-memory (CIM) with resistive RAM (RRAM) to build energy-efficient DNN accelerators operating in inference mode. We demonstrate that BitS-Net improves energy efficiency by up to 5x for ResNet models on the ImageNet dataset.
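To make the notion of bit-sparsity from the abstract concrete, the sketch below counts the fraction of zero bits across an array of quantized values; a value set is bit-sparse when this fraction is high. This is only an illustrative Python snippet under the assumption of unsigned fixed-point quantization, not the BitS-Net compression scheme itself; the function and parameter names (`bit_sparsity`, `num_bits`) are hypothetical.

```python
import numpy as np

def bit_sparsity(values: np.ndarray, num_bits: int = 8) -> float:
    """Fraction of zero bits over the binary representations of
    unsigned quantized values (higher means more bit-sparse)."""
    values = values.astype(np.uint64)
    zero_bits = 0
    for b in range(num_bits):
        # Count how many values have a 0 at bit position b.
        zero_bits += np.count_nonzero(((values >> b) & 1) == 0)
    return zero_bits / (values.size * num_bits)

# Example: weights quantized to 8-bit unsigned integers.
rng = np.random.default_rng(0)
w_q = rng.integers(0, 256, size=1000)
print(f"bit-sparsity: {bit_sparsity(w_q):.2f}")  # ~0.50 for uniform values
```

In a CIM-based accelerator, each zero bit skips work in the bit-serial/bit-wise multiply-accumulate, which is why a higher zero-bit fraction translates into energy savings.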
| Original language | English |
|---|---|
| Pages (from-to) | 1952-1961 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Circuits and Systems I: Regular Papers |
| Volume | 69 |
| Issue number | 5 |
| DOIs | |
| State | Published - 1 May 2022 |
Keywords
- DNN accelerator
- Deep neural network
- in-memory computing
- quantization