Characterizing, modeling, and benchmarking RocksDB Key-value workloads at Facebook

Zhichao Cao, Siying Dong, Sagar Vemuri, David H.C. Du

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

196 Scopus citations

Abstract

Persistent key-value stores are widely used as building blocks in today's IT infrastructure for managing and storing large amounts of data. However, studies characterizing real-world workloads for key-value stores are limited due to the lack of tracing/analyzing tools and the difficulty of collecting traces in operational environments. In this paper, we first present a detailed characterization of workloads from three typical RocksDB production use cases at Facebook: UDB (a MySQL storage layer for social graph data), ZippyDB (a distributed key-value store), and UP2X (a distributed key-value store for AI/ML services). These characterizations reveal several interesting findings: first, that the distributions of key and value sizes are highly related to the use cases/applications; second, that the accesses to key-value pairs have good locality and follow certain special patterns; and third, that the collected performance metrics show a strong diurnal pattern in UDB, but not in the other two. We further discover that although the widely used key-value benchmark YCSB provides various workload configurations and key-value pair access distribution models, the YCSB-triggered workloads for underlying storage systems are still not close enough to the workloads we collected because key-space localities are ignored. To address this issue, we propose key-range-based modeling and develop a benchmark that can better emulate the workloads of real-world key-value stores. This benchmark can synthetically generate more precise key-value queries that represent the reads and writes of key-value stores to the underlying storage system.

Original language: English (US)
Title of host publication: Proceedings of the 18th USENIX Conference on File and Storage Technologies, FAST 2020
Publisher: USENIX Association
Pages: 209-223
Number of pages: 15
ISBN (Electronic): 9781939133120
State: Published - 2020
Event: 18th USENIX Conference on File and Storage Technologies, FAST 2020 - Santa Clara, United States
Duration: Feb 25 2020 - Feb 27 2020

Publication series

Name: Proceedings of the 18th USENIX Conference on File and Storage Technologies, FAST 2020

Conference

Conference: 18th USENIX Conference on File and Storage Technologies, FAST 2020
Country/Territory: United States
City: Santa Clara
Period: 2/25/20 - 2/27/20

Bibliographical note

Funding Information:
We would like to thank our shepherd, George Amvrosiadis, and the anonymous reviewers for their valuable feedback. We would like to thank Jason Flinn, Shrikanth Shankar, Marla Azriel, Michael Stumm, Fosco Marotto, Nathan Bronson, Mark Callaghan, Mahesh Balakrishnan, Yoshinori Matsunobu, Domas Mituzas, Anirban Rahut, Mikhail Antonov, Joanna Bujnowska, Atul Goyal, Tony Savor, Dave Nagle, and many others at Facebook for their comments, suggestions, and support in this research project. We also thank all the RocksDB team members at Facebook. This work was partially supported by the following NSF awards: 1439622, 1525617, 1536447, and 1812537, granted to authors Cao and Du in their academic roles at the University of Minnesota, Twin Cities.

Publisher Copyright:
Copyright © Proc. of the 18th USENIX Conference on File and Storage Tech., FAST 2020. All rights reserved.
