How Bloom filters work

Hello,
CAn someone explain me How Bloom filters work?

Hi Andreas,

Bloom filters are an interesting ‘probabilistic’ data structure that can be used to see if an item has ‘never been added’ to a set previously. The curious wording here is intentional. It is probabilistic in that it is possible to have false positives but not possible to have false negatives. Bloom filters provide a much more compact and faster way of checking to see if an item exists than storing all items in a set and calling SISMEMBER.

Bloom filters work by running an item through a quick hashing function and sampling bits from that hash and setting them from a 0 to 1 at particular interval in a bitfield. To check for existence in a Bloom filter, the same bits are sampled. Many items may have bits that overlap, but since a hashing function produce unique identifiers, if a single bit from the hash is still a 0, then we know it has NOT been previously added. If it hasn’t been added, then it’s not in the database and you don’t have to go fetch it, possibly saving a lot of time and increasing performance.

For more information, checkout

Dave

Hello Andres,

Dave gave an excellent technical explanation for the way a Bloom Filter works.

Your incentive to use one is space and speed.

  • With 1% false positive error rate you would require space of about 8 bits per item (that is one byte!). That would always be much smaller than keeping the complete item and the infrastructure that holds it.
  • Items not in the Filter will return false on average after checking only two bits. Using a hash table will likely require more memory skips.
  • Adding an item doesn’t require memory allocation since the full capacity is allocated upon initialization.
    Ariel