DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression

Jisung Park, Jeonggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Scopus citations

Abstract

Data reduction in storage systems is becoming increasingly important as an effective way to minimize the management cost of a data center. To maximize data-reduction efficiency, existing post-deduplication delta-compression techniques perform delta compression along with traditional data deduplication and lossless compression. Unfortunately, we observe that existing techniques achieve significantly lower data-reduction ratios than the optimal ratio because of their limited accuracy in identifying similar data blocks. In this paper, we propose DeepSketch, a new reference search technique for post-deduplication delta compression that leverages the learning-to-hash method to achieve higher accuracy in reference search for delta compression, thereby improving data-reduction efficiency. DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block that preserves similarity with other blocks. Our evaluation using eleven real-world workloads shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique.
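The abstract describes the pipeline only at a high level. The Python sketch below illustrates one plausible flow for post-deduplication delta compression, under stated assumptions: exact-match deduplication with SHA-256 fingerprints, a sketch-based reference search by Hamming distance, delta compression against the chosen reference, and lossless compression as the fallback. It is not DeepSketch. The DNN-based learning-to-hash sketch is replaced by SimHash, a classic hand-designed similarity-preserving hash, and a real delta encoder is approximated by zlib compression that uses the reference block as a preset dictionary (`zdict`). All names (`simhash_sketch`, `PostDedupStore`, `MAX_HAMMING`, `demo`) and parameter values are hypothetical.

```python
import hashlib
import random
import zlib
from typing import Dict, List, Tuple

SKETCH_BITS = 64   # hypothetical sketch width in bits
SHINGLE = 4        # hypothetical shingle length in bytes
MAX_HAMMING = 16   # hypothetical "similar enough" threshold


def simhash_sketch(block: bytes) -> int:
    """SimHash over byte shingles: a similarity-preserving sketch.

    Stand-in for DeepSketch's DNN-learned hash; NOT the paper's method.
    """
    counts = [0] * SKETCH_BITS
    for i in range(len(block) - SHINGLE + 1):
        digest = hashlib.blake2b(block[i:i + SHINGLE], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(SKETCH_BITS):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # Each sketch bit is the majority vote of that bit across all shingle hashes.
    return sum(1 << bit for bit, c in enumerate(counts) if c > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class PostDedupStore:
    """Toy pipeline: dedup -> sketch-based reference search -> delta or lossless."""

    def __init__(self) -> None:
        self.fingerprints: Dict[bytes, str] = {}
        self.references: List[Tuple[int, bytes]] = []  # (sketch, base block)
        self.logical_bytes = 0
        self.stored_bytes = 0

    def write(self, block: bytes) -> Tuple[str, int]:
        self.logical_bytes += len(block)
        fp = hashlib.sha256(block).digest()
        # 1) Deduplication: an exact-match fingerprint means nothing new is stored.
        if fp in self.fingerprints:
            return "dedup", 0
        sketch = simhash_sketch(block)
        lossless = zlib.compress(block, 9)
        # 2) Reference search: nearest stored sketch by Hamming distance.
        best = min(self.references, key=lambda r: hamming(r[0], sketch), default=None)
        if best is not None and hamming(best[0], sketch) <= MAX_HAMMING:
            # 3) Delta compression, approximated by zlib with a preset dictionary.
            comp = zlib.compressobj(level=9, zdict=best[1])
            delta = comp.compress(block) + comp.flush()
            if len(delta) < len(lossless):
                self.fingerprints[fp] = "delta"
                self.stored_bytes += len(delta)
                return "delta", len(delta)
        # 4) Fallback: lossless compression; the block becomes a reference candidate.
        self.fingerprints[fp] = "lossless"
        self.references.append((sketch, block))
        self.stored_bytes += len(lossless)
        return "lossless", len(lossless)

    def reduction_ratio(self) -> float:
        # Assumed definition: logical bytes written / physical bytes stored.
        return self.logical_bytes / max(self.stored_bytes, 1)


def demo() -> Tuple[List[str], float]:
    """Writes a block, its duplicate, a 4-byte edit of it, and an unrelated block."""
    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = base[:100] + b"EDIT" + base[104:]
    unrelated = bytes(rng.getrandbits(8) for _ in range(4096))
    store = PostDedupStore()
    kinds = [store.write(b)[0] for b in (base, base, edited, unrelated)]
    return kinds, store.reduction_ratio()
```

In this toy setup, the duplicate is removed by deduplication, the slightly edited block is stored as a small delta against the original, and the unrelated block falls back to lossless compression. The reference-search step is where a learned sketch would plug in: replacing `simhash_sketch` with a different similarity-preserving hash function leaves the rest of the flow unchanged.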

Original language: English
Title of host publication: Proceedings of the 20th USENIX Conference on File and Storage Technologies, FAST 2022
Publisher: USENIX Association
Pages: 247-263
Number of pages: 17
ISBN (Electronic): 9781939133267
State: Published - 2022
Event: 20th USENIX Conference on File and Storage Technologies, FAST 2022 - Santa Clara, United States
Duration: 22 Feb 2022 to 24 Feb 2022

Bibliographical note

Publisher Copyright: © AST 2022. All rights reserved.
