How Bloom filters work

Andres · September 23, 2019, 4:48pm

Hello,
CAn someone explain me How Bloom filters work?

dave · September 24, 2019, 12:12am

Hi Andreas,

Bloom filters are an interesting ‘probabilistic’ data structure that can be used to see if an item has ‘never been added’ to a set previously. The curious wording here is intentional. It is probabilistic in that it is possible to have false positives but not possible to have false negatives. Bloom filters provide a much more compact and faster way of checking to see if an item exists than storing all items in a set and calling SISMEMBER.

Bloom filters work by running an item through a quick hashing function and sampling bits from that hash and setting them from a 0 to 1 at particular interval in a bitfield. To check for existence in a Bloom filter, the same bits are sampled. Many items may have bits that overlap, but since a hashing function produce unique identifiers, if a single bit from the hash is still a 0, then we know it has NOT been previously added. If it hasn’t been added, then it’s not in the database and you don’t have to go fetch it, possibly saving a lot of time and increasing performance.

For more information, checkout

Dave

ashtul · September 24, 2019, 5:10am

Hello Andres,

Dave gave an excellent technical explanation for the way a Bloom Filter works.

Your incentive to use one is space and speed.

With 1% false positive error rate you would require space of about 8 bits per item (that is one byte!). That would always be much smaller than keeping the complete item and the infrastructure that holds it.
Items not in the Filter will return false on average after checking only two bits. Using a hash table will likely require more memory skips.
Adding an item doesn’t require memory allocation since the full capacity is allocated upon initialization.
Ariel

Fred · July 27, 2020, 1:41pm

To complement the previous explanation, Bloom filters have been used within Redis for years through client-side libraries which took advantage of GETBIT and SETBIT to work with a bit field in a key.

Since ssali Redis 4.0 the RedisBloom module is active, this has helped any implementation overload of the Bloom filter.

within the RedisBloom documentation they give an example of a case for a Bloom filter where they tell you to look for a name of an already used user, on a small scale it is not a problem but when the service grows this can bring more work to the database data.

example:

BF.ADD user names funnyfred
(integer) 1
BF.ADD username fredisfunny
(integer) 1
BF.ADD username fred
(integer) 1
BF.ADD user names funfred
(integer) 1

Now a test is run vs the Bloom filter:

BF.EXISTS username fred
(integer) 1
BF.EXISTS username fred_is_funny
(integer) 0

tgrall · July 27, 2020, 2:15pm

Let me add some other interesting resources about Bloom Filter/Redis Bloom:

How to: Use Bloom Filters in Redis a whiteboard session by @guyroyse
The Redis Day London Bloom filter session available on this page.

aniket · September 17, 2020, 4:51am

Can someone explain me the working Count-Min Sketch in Bloom

ashtul · September 17, 2020, 7:26am

Hello,
A CMS (Count Min Sketch) can give an approximation of the number an element appears in a flow. For example, the count of a specific IP from which requests arrive.
The result given by a CMS is relevant only if it is higher than the noise in the sketch. For example, if a CMS has 100 buckets and 1000 elements in it, any result up to 10 is just noise since on average, each bucket will get 10 hits. If a bucket has 50 results, we can assume a heavy-hitter or an elephant is counted into it. Therefore, in this case, if the return count is 10, you should ignore it, if it is 50, you can use it and assume the rel count is slightly lower.
Note: The sketch uses multiple arrays/sketches in order to reduce the probability of collision between elephants and mice.
Cheers

aniket · September 19, 2020, 5:21am

@ashtul thanks for the help, it been nice to see so much anticipation in this redis community forum .

Topic		Replies	Views
HOW USE REDISBLOOM AND IMPLEMENTING RedisBloom	1	803	November 28, 2019
Redis Bloom Use Cases RedisBloom	2	834	June 9, 2020
Use Cases RedisBloom	2	667	February 12, 2020
What to do in case of "Bloom filter" server failure? RedisBloom	1	354	September 13, 2022
Redis Java client Redisson RedisBloom	3	1143	October 5, 2020

How Bloom filters work

Related Topics