Abstract
The advent of the AI era has made it increasingly important to have an efficient backup system that protects training data from loss. A backup of the training data also makes it possible to update or retrain the learned model as more data are collected. However, copying all daily collected training data in full to backup storage incurs a huge overhead, especially because the data typically contain highly redundant information that contributes nothing to model learning. Deduplication is a common technique in modern backup systems for reducing data redundancy, but existing deduplication methods are ineffective for training data. Hence, this paper proposes a novel deduplication strategy for the training data used to train a deep neural network classifier. Experimental results showed that the proposed deduplication strategy reduced backup storage space by 93% with only a 1.3% loss of classification accuracy.
Original language | English (US) |
---|---|
Title of host publication | 2020 IEEE 39th International Performance Computing and Communications Conference, IPCCC 2020 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728198293 |
State | Published - Nov 6 2020 |
Event | 39th IEEE International Performance Computing and Communications Conference, IPCCC 2020 - Austin, United States; Duration: Nov 6 2020 → Nov 8 2020 |
Publication series
Name | 2020 IEEE 39th International Performance Computing and Communications Conference, IPCCC 2020 |
---|---|
Conference
Conference | 39th IEEE International Performance Computing and Communications Conference, IPCCC 2020 |
---|---|
Country/Territory | United States |
City | Austin |
Period | 11/6/20 → 11/8/20 |
Bibliographical note
Funding Information: This work was supported in part by the Center for Research in Intelligent Storage (CRIS), which is supported by National Science Foundation grant no. IIP-1439622 and member companies. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
Publisher Copyright: © 2020 IEEE.
Keywords
- Backup systems
- Deduplication
- Deep learning
- Training data