Abstract
We show that a trainable convolution layer with a stride greater than 1 and a kernel size greater than or equal to the stride is identical to a trainable block transform. For instance, when the kernel size equals the stride, such as a 2 × 2 convolution kernel with stride 2, the sliding windows do not overlap, so the layer performs a block transform on the partitioned 2 × 2 blocks. A block transform reduces the computational complexity owing to its stride ≥ 2. To restore the original size, we apply a transposed convolution (stride = kernel ≥ 2), the adjoint operator of the forward block transform. Based on this relationship, we propose a trainable multi-scale block transform for autoencoders. The proposed method has an encoder consisting of two sequential stride-2 convolutions with 2 × 2 kernels and a decoder consisting of the encoder's two adjoint operators (transposed convolutions). Clipping is used for the nonlinear activations. Inspired by the zero-frequency element in dictionary learning, the proposed method uses DC values for residual learning. The proposed method yields high-resolution representations, whereas a stride-1 convolutional autoencoder with 3 × 3 kernels generates blurry images.
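A minimal sketch of this construction, assuming a PyTorch implementation (the letter does not specify one): the encoder stacks two stride-2, 2 × 2 convolutions acting as block transforms, clipping serves as the activation, and the decoder applies the adjoint of each layer by reusing the encoder weights in a transposed convolution. The channel widths, clipping range, and bias-free layers are illustrative assumptions, and the DC-based residual path is omitted.

```python
# Minimal sketch (not the authors' code) of a two-level block-transform
# autoencoder: stride-2, 2x2 convolutions in the encoder and their adjoint
# transposed convolutions in the decoder. Channel widths, the clipping
# range, and bias-free layers are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockTransformAE(nn.Module):
    def __init__(self, in_ch=1, mid_ch=4, out_ch=16, clip=1.0):
        super().__init__()
        # Encoder: kernel == stride == 2, so the sliding windows do not
        # overlap and each layer is a trainable block transform on 2x2 blocks.
        self.enc1 = nn.Conv2d(in_ch, mid_ch, kernel_size=2, stride=2, bias=False)
        self.enc2 = nn.Conv2d(mid_ch, out_ch, kernel_size=2, stride=2, bias=False)
        self.clip = clip

    def forward(self, x):
        # Clipping as the nonlinear activation (range is an assumption).
        z = torch.clamp(self.enc1(x), -self.clip, self.clip)
        z = torch.clamp(self.enc2(z), -self.clip, self.clip)
        # Decoder: the adjoint of each forward block transform is a
        # transposed convolution with the same weights, which restores
        # the original spatial size.
        y = F.conv_transpose2d(z, self.enc2.weight, stride=2)
        return F.conv_transpose2d(y, self.enc1.weight, stride=2)

x = torch.randn(1, 1, 32, 32)
model = BlockTransformAE()
assert model(x).shape == x.shape  # 32x32 -> 16x16 -> 8x8 -> back to 32x32
```

Tying the decoder to the encoder weights makes each transposed convolution the exact adjoint of its forward block transform; a variant with independently trained decoder weights would also be consistent with the abstract's description.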
| Original language | English |
|---|---|
| Article number | 9436009 |
| Pages (from-to) | 1016-1019 |
| Number of pages | 4 |
| Journal | IEEE Signal Processing Letters |
| Volume | 28 |
| DOIs | |
| State | Published - 2021 |
Bibliographical note
Publisher Copyright: © 1994-2012 IEEE.
Keywords
- Block transform
- autoencoder
- convolutional neural network
- image representation