BloomStore: Bloom-filter based memory-efficient key-value store for indexing of data deduplication on flash

Guanlin Lu; Young Jin Nam; David H.C. Du

doi:10.1109/MSST.2012.6232390

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

61 Scopus citations

Abstract

Due to its better scalability, the key-value (KV) store has superseded traditional relational databases in many applications, such as data deduplication, online multiplayer gaming, and Internet services like Amazon and Facebook. A KV store efficiently supports two operations, key lookup and KV pair insertion, through an index structure that maps keys to their associated values. KV stores are also commonly used to implement the chunk index in data deduplication, where a chunk ID (the SHA-1 value computed from the chunk's content) is the key and its associated chunk metadata (e.g., physical storage location, stream ID) is the value. In a deduplication system, the number of chunks is typically too large for the KV store to be held entirely in RAM. The KV store therefore keeps KV pairs on secondary storage and maintains a large (hash-table based) index structure in RAM to locate them, so the available RAM limits the maximum number of KV pairs that can be stored. Moving the index data structure from RAM to flash can potentially overcome this space limitation. In this paper, we propose BloomStore, an efficient KV store on flash with a Bloom filter based index structure. The unique features of BloomStore are that (1) no index structure needs to be stored in RAM, so a small amount of RAM can support a large number of KV pairs, and (2) both the index structure and the KV pairs are stored compactly on flash memory to improve performance. Based on our experiments with deduplication workloads, BloomStore achieves significantly better key lookup performance than state-of-the-art KV store designs and roughly the same insertion performance, while using several times less RAM.
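The abstract describes a chunk index keyed by SHA-1 chunk IDs and a Bloom filter based index, but not BloomStore's actual data layout. As a rough illustration only, the Python sketch below pairs a small RAM write buffer with immutable pages, each summarized by a Bloom filter, so that a lookup reads only the pages whose filter reports a possible match. Everything here is an assumption rather than the authors' design: the names `ToyBloomIndexedStore`, `ChunkMeta`, `buffer_capacity`, and `bits_per_page` are hypothetical, the double-hashing scheme is a generic choice, and pages and filters are plain in-memory objects standing in for flash (the abstract states that BloomStore keeps its index on flash, not in RAM).

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


class BloomFilter:
    """Fixed-size bit array with k hash positions per key (no false negatives)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


@dataclass
class ChunkMeta:
    location: int   # hypothetical physical location of the stored chunk
    stream_id: int  # hypothetical ID of the stream the chunk came from


class ToyBloomIndexedStore:
    """Hypothetical append-only KV store: a small RAM write buffer plus
    immutable 'pages' (simulated flash), each summarized by a Bloom filter."""

    def __init__(self, buffer_capacity: int = 4, bits_per_page: int = 256, num_hashes: int = 4) -> None:
        self.buffer: Dict[bytes, ChunkMeta] = {}
        self.buffer_capacity = buffer_capacity
        self.bits_per_page = bits_per_page
        self.num_hashes = num_hashes
        self.pages: List[Tuple[BloomFilter, Dict[bytes, ChunkMeta]]] = []
        self.page_reads = 0  # counts simulated page reads

    def insert(self, key: bytes, meta: ChunkMeta) -> None:
        self.buffer[key] = meta
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self) -> None:
        # Write the buffered pairs out as one page together with its Bloom filter.
        bf = BloomFilter(self.bits_per_page, self.num_hashes)
        for key in self.buffer:
            bf.add(key)
        self.pages.append((bf, dict(self.buffer)))
        self.buffer.clear()

    def lookup(self, key: bytes) -> Optional[ChunkMeta]:
        if key in self.buffer:
            return self.buffer[key]
        # Newest page first; only pages whose filter matches are "read".
        for bf, page in reversed(self.pages):
            if bf.might_contain(key):
                self.page_reads += 1
                if key in page:
                    return page[key]
        return None


def chunk_id(data: bytes) -> bytes:
    # Chunk ID = SHA-1 of the chunk's content, as described in the abstract.
    return hashlib.sha1(data).digest()


store = ToyBloomIndexedStore()
chunks = [b"alpha", b"beta", b"alpha", b"gamma", b"delta", b"beta", b"epsilon"]
duplicates = 0
for offset, data in enumerate(chunks):
    key = chunk_id(data)
    if store.lookup(key) is not None:
        duplicates += 1  # chunk already indexed: keep a reference, not the data
    else:
        store.insert(key, ChunkMeta(location=offset, stream_id=0))
print(duplicates)  # prints 2
```

A Bloom filter can answer "definitely absent" without touching the page it summarizes, which is why a few bits per key can replace a much larger hash-table entry; the price is occasional false positives, which in this sketch cost an extra simulated page read (`page_reads`) but never a wrong answer.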

Original language	English (US)
Title of host publication	2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012
DOIs	https://doi.org/10.1109/MSST.2012.6232390
State	Published - 2012
Event	2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012 - Pacific Grove, CA, United States Duration: Apr 16 2012 → Apr 20 2012

Publication series

Name	IEEE Symposium on Mass Storage Systems and Technologies
ISSN (Print)	2160-1968

Other

Other	2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012
Country/Territory	United States
City	Pacific Grove, CA
Period	4/16/12 → 4/20/12

Cite this

BloomStore: Bloom-filter based memory-efficient key-value store for indexing of data deduplication on flash. / Lu, Guanlin; Nam, Young Jin; Du, David H.C.
2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012. 2012. 6232390 (IEEE Symposium on Mass Storage Systems and Technologies).

Lu, G, Nam, YJ & Du, DHC 2012, BloomStore: Bloom-filter based memory-efficient key-value store for indexing of data deduplication on flash. in 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012, 6232390, IEEE Symposium on Mass Storage Systems and Technologies, 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012, Pacific Grove, CA, United States, 4/16/12. https://doi.org/10.1109/MSST.2012.6232390

@inproceedings{b7533357e70f4cf88f6343f34f8afaa1,

title = "BloomStore: Bloom-filter based memory-efficient key-value store for indexing of data deduplication on flash",

abstract = "Due to its better scalability, Key-Value (KV) store has superseded traditional relational databases for many applications, such as data deduplication, on-line multi-player gaming, and Internet services like Amazon and Facebook. The KV store efficiently supports two operations (key lookup and KV pair insertion) through an index structure that maps keys to their associated values. The KV store is also commonly used to implement the chunk index in data deduplication, where a chunk ID (SHA1 value computed based on the chunk's content) is a key and its associative chunk metadata (e.g., physical storage location, stream ID) is the value. For a deduplication system, typically the number of chunks is too large to store the KV store solely in RAM. Thus, the KV store maintains a large (hash-table based) index structure in RAM to index all KV pairs stored on secondary storage. Hence, its available RAM space limits the maximum number of KV pairs that can be stored. Moving the index data structure from RAM to flash can possibly overcome the space limitation. In this paper, we propose efficient KV store on flash with a Bloom Filter based index structure called BloomStore. The unique features of the BloomStore include (1) no index structure is required to be stored in RAM so that a small RAM space can support a large number of KV pairs and (2) both index structure and KV pairs are stored compactly on flash memory to improve its performance. Compared with the state-of-the-art KV store designs, the BloomStore achieves a significantly better key lookup performance and roughly the same insertion performance with multiple times less RAM usage based on our experiments with deduplication workloads.",

author = "Lu, Guanlin and Nam, {Young Jin} and Du, {David H.C.}",

year = "2012",

doi = "10.1109/MSST.2012.6232390",

language = "English (US)",

isbn = "9781467317450",

series = "IEEE Symposium on Mass Storage Systems and Technologies",

booktitle = "2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012",

note = "2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012 ; Conference date: 16-04-2012 Through 20-04-2012",

}

TY - GEN

T1 - BloomStore

T2 - 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012

AU - Lu, Guanlin

AU - Nam, Young Jin

AU - Du, David H.C.

PY - 2012

Y1 - 2012

N2 - Due to its better scalability, Key-Value (KV) store has superseded traditional relational databases for many applications, such as data deduplication, on-line multi-player gaming, and Internet services like Amazon and Facebook. The KV store efficiently supports two operations (key lookup and KV pair insertion) through an index structure that maps keys to their associated values. The KV store is also commonly used to implement the chunk index in data deduplication, where a chunk ID (SHA1 value computed based on the chunk's content) is a key and its associative chunk metadata (e.g., physical storage location, stream ID) is the value. For a deduplication system, typically the number of chunks is too large to store the KV store solely in RAM. Thus, the KV store maintains a large (hash-table based) index structure in RAM to index all KV pairs stored on secondary storage. Hence, its available RAM space limits the maximum number of KV pairs that can be stored. Moving the index data structure from RAM to flash can possibly overcome the space limitation. In this paper, we propose efficient KV store on flash with a Bloom Filter based index structure called BloomStore. The unique features of the BloomStore include (1) no index structure is required to be stored in RAM so that a small RAM space can support a large number of KV pairs and (2) both index structure and KV pairs are stored compactly on flash memory to improve its performance. Compared with the state-of-the-art KV store designs, the BloomStore achieves a significantly better key lookup performance and roughly the same insertion performance with multiple times less RAM usage based on our experiments with deduplication workloads.

AB - Due to its better scalability, Key-Value (KV) store has superseded traditional relational databases for many applications, such as data deduplication, on-line multi-player gaming, and Internet services like Amazon and Facebook. The KV store efficiently supports two operations (key lookup and KV pair insertion) through an index structure that maps keys to their associated values. The KV store is also commonly used to implement the chunk index in data deduplication, where a chunk ID (SHA1 value computed based on the chunk's content) is a key and its associative chunk metadata (e.g., physical storage location, stream ID) is the value. For a deduplication system, typically the number of chunks is too large to store the KV store solely in RAM. Thus, the KV store maintains a large (hash-table based) index structure in RAM to index all KV pairs stored on secondary storage. Hence, its available RAM space limits the maximum number of KV pairs that can be stored. Moving the index data structure from RAM to flash can possibly overcome the space limitation. In this paper, we propose efficient KV store on flash with a Bloom Filter based index structure called BloomStore. The unique features of the BloomStore include (1) no index structure is required to be stored in RAM so that a small RAM space can support a large number of KV pairs and (2) both index structure and KV pairs are stored compactly on flash memory to improve its performance. Compared with the state-of-the-art KV store designs, the BloomStore achieves a significantly better key lookup performance and roughly the same insertion performance with multiple times less RAM usage based on our experiments with deduplication workloads.

UR - http://www.scopus.com/inward/record.url?scp=84866181191&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866181191&partnerID=8YFLogxK

U2 - 10.1109/MSST.2012.6232390

DO - 10.1109/MSST.2012.6232390

M3 - Conference contribution

AN - SCOPUS:84866181191

SN - 9781467317450

T3 - IEEE Symposium on Mass Storage Systems and Technologies

BT - 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies, MSST 2012

Y2 - 16 April 2012 through 20 April 2012

ER -
