BloomFlash: Bloom filter on flash-based storage

Biplob Debnath, Sudipta Sengupta, Jin Li, David J. Lilja, David H.C. Du

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

40 Scopus citations

Abstract

The bloom filter is a probabilistic data structure that provides a compact representation of a set of elements. To keep false positive probabilities low, the size of the bloom filter must be dimensioned a priori to be linear in the maximum number of keys inserted, with the linearity constant typically ranging from one to a few bytes. A bloom filter is most commonly used as an in-memory data structure, hence its size is limited by the availability of RAM space on the machine. As datasets have grown over time to Internet scale, so have the RAM space requirements of bloom filters. If sufficient RAM space is not available, we advocate that flash memory may serve as a suitable medium for storing bloom filters, since it is about one-tenth the cost of RAM per GB while still providing access times orders of magnitude faster than hard disk. We present BLOOMFLASH, a bloom filter designed for flash memory based storage, that provides a new dimension of tradeoff with bloom filter access times to reduce RAM space usage (and hence system cost). The simple design of a single flat bloom filter on flash suffers from many performance bottlenecks, including in-place bit updates that are inefficient on flash, and multiple reads and random writes spread out across many flash pages for a single lookup or insert operation. To mitigate these performance bottlenecks, BLOOMFLASH leverages two key design innovations: (i) buffering bit updates in RAM and applying them in bulk to flash, which helps to reduce random writes to flash, and (ii) a hierarchical bloom filter design consisting of component bloom filters, stored one per flash page, which helps to localize reads and writes on flash. We use two real-world data traces taken from representative bloom filter applications to drive and evaluate our design. BLOOMFLASH achieves bloom filter access times in the range of a few tens of μsec, thus allowing up to the order of tens of thousands of operations per sec.
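The two design ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `PagedBloomFilter`, the 4 KiB page size, the number of hash functions, the flush threshold, and the use of BLAKE2b are all assumptions made for illustration, and "flash" is simulated with in-memory byte arrays. One hash selects a component filter (one per page), so all bits for a key fall on a single page, and bit updates are buffered in RAM and applied page by page in bulk.

```python
import hashlib

PAGE_BITS = 4096 * 8  # assumed flash page size of 4 KiB, in bits
NUM_HASHES = 4        # assumed number of bit positions set per key


def _hash(key: bytes, seed: int) -> int:
    """Return a 64-bit hash of key; the seed selects an independent hash."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


class PagedBloomFilter:
    """Hypothetical sketch: one component bloom filter per simulated flash page."""

    def __init__(self, num_pages: int, flush_threshold: int = 64):
        # Byte arrays stand in for flash pages in this sketch.
        self.pages = [bytearray(PAGE_BITS // 8) for _ in range(num_pages)]
        self.pending = {}        # page index -> bit offsets buffered in RAM
        self.pending_count = 0   # total buffered bit updates
        self.flush_threshold = flush_threshold
        self.page_writes = 0     # counts simulated page writes

    def _locate(self, key: bytes):
        # The first hash picks the page; the rest pick bits inside that page.
        page = _hash(key, 0) % len(self.pages)
        bits = [_hash(key, i + 1) % PAGE_BITS for i in range(NUM_HASHES)]
        return page, bits

    def insert(self, key: bytes) -> None:
        page, bits = self._locate(key)
        buffered = self.pending.setdefault(page, set())
        before = len(buffered)
        buffered.update(bits)
        self.pending_count += len(buffered) - before
        if self.pending_count >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Apply all buffered updates with one write per touched page.
        for page, bits in self.pending.items():
            data = self.pages[page]
            for b in bits:
                data[b // 8] |= 1 << (b % 8)
            self.page_writes += 1
        self.pending.clear()
        self.pending_count = 0

    def contains(self, key: bytes) -> bool:
        # A lookup reads one page and also consults the RAM buffer for it.
        page, bits = self._locate(key)
        buffered = self.pending.get(page, ())
        data = self.pages[page]
        return all(
            b in buffered or (data[b // 8] >> (b % 8)) & 1 for b in bits
        )
```

In this sketch a lookup touches exactly one page, and a flush costs at most one write per page regardless of how many keys were buffered, which is the kind of locality and write reduction the abstract describes.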

Original language: English (US)
Title of host publication: Proceedings - 31st International Conference on Distributed Computing Systems, ICDCS 2011
Pages: 635-644
Number of pages: 10
State: Published - 2011
Event: 31st International Conference on Distributed Computing Systems, ICDCS 2011 - Minneapolis, MN, United States
Duration: Jun 20 2011 → Jul 24 2011

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Other

Other: 31st International Conference on Distributed Computing Systems, ICDCS 2011
Country/Territory: United States
City: Minneapolis, MN
Period: 6/20/11 → 7/24/11
