Abstract
Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center. To maximize data-reduction efficiency, existing post-deduplication delta-compression techniques perform delta compression along with traditional data deduplication and lossless compression. Unfortunately, we observe that existing techniques achieve significantly lower data-reduction ratios than the optimal due to their limited accuracy in identifying similar data blocks. In this paper, we propose DeepSketch, a new reference search technique for post-deduplication delta compression that leverages the learning-to-hash method to achieve higher accuracy in reference search for delta compression, thereby improving data-reduction efficiency. DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block that can preserve similarity with other blocks. Our evaluation using eleven real-world workloads shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique.
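The reference search described in the abstract can be illustrated with a minimal sketch. It assumes sketches are fixed-width bit strings compared by Hamming distance, which the abstract does not state. The function `toy_sketch` is a hypothetical stand-in (random-hyperplane signs over a byte histogram) and is not the neural network that DeepSketch uses. The names `toy_sketch`, `find_reference`, `SKETCH_BITS`, and `max_distance` are likewise illustrative assumptions.

```python
import random

SKETCH_BITS = 64  # assumed sketch width, not taken from the paper


def toy_sketch(block: bytes, bits: int = SKETCH_BITS, seed: int = 0) -> int:
    """Hypothetical stand-in for a learned hash.

    Projects the block's byte histogram onto `bits` random hyperplanes and
    keeps one sign bit per hyperplane, so blocks with similar content tend
    to receive sketches that differ in few bits.
    """
    hist = [0] * 256
    for b in block:
        hist[b] += 1
    rng = random.Random(seed)  # same seed => same hyperplanes for every block
    sketch = 0
    for _ in range(bits):
        dot = sum(rng.gauss(0.0, 1.0) * h for h in hist)
        sketch = (sketch << 1) | (1 if dot > 0 else 0)
    return sketch


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def find_reference(query: int, stored: list, max_distance: int = 8):
    """Return the index of the closest stored sketch, or None.

    A linear scan is used for clarity; `max_distance` is an assumed cutoff
    beyond which a block is treated as having no useful reference.
    """
    best = min(range(len(stored)),
               key=lambda i: hamming(query, stored[i]),
               default=None)
    if best is None or hamming(query, stored[best]) > max_distance:
        return None
    return best
```

A block whose sketch has a close match would be delta-compressed against the matching reference block; a block with no match within the cutoff would fall back to ordinary lossless compression.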
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 20th USENIX Conference on File and Storage Technologies, FAST 2022 |
| Publisher | USENIX Association |
| Pages | 247-263 |
| Number of pages | 17 |
| ISBN (Electronic) | 9781939133267 |
| State | Published - 2022 |
| Event | 20th USENIX Conference on File and Storage Technologies, FAST 2022 - Santa Clara, United States; Duration: 22 Feb 2022 → 24 Feb 2022 |
Publication series
| Name | Proceedings of the 20th USENIX Conference on File and Storage Technologies, FAST 2022 |
|---|---|
Conference
| Conference | 20th USENIX Conference on File and Storage Technologies, FAST 2022 |
|---|---|
| Country/Territory | United States |
| City | Santa Clara |
| Period | 22/02/22 → 24/02/22 |
Bibliographical note
Publisher Copyright: © AST 2022. All rights reserved.