When to use bloom filters & cuckoo filters

Hi everyone,

Could you please share a case study or use case that led you to choose one of these filters?

I would like to share an interesting fact about Cuckoo filters.

What’s in a name: "Cuckoo"
Like Bloom filters, the Cuckoo filter is a probabilistic data structure for testing set membership. The "Cuckoo" in the name comes from the filter's use of the Cuckoo hash table as its underlying storage structure. The Cuckoo hash table is named after the cuckoo bird because its design borrows the bird's brood-parasitic behavior. Cuckoo birds are known to lay eggs in the nests of other birds, and once an egg hatches, the young bird typically ejects the host's eggs from the nest. A Cuckoo hash table behaves similarly when an item has to be inserted into an occupied "bucket": the new item can evict the current occupant, which is then reinserted into its own alternate bucket. The blog post linked below explains this behavior in its section on the Cuckoo filter, after a brief overview of Bloom filters.
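To make the eviction idea concrete, here is a minimal sketch of a Cuckoo hash table in Python. It is not taken from the blog post or from any library: the class name `CuckooHashTable`, the one-item-per-bucket layout, the SHA-256 based hashing and the `max_kicks` limit are all assumptions made for illustration.

```python
import hashlib


def _hash(key: str, seed: int, size: int) -> int:
    # Illustrative hash: derive two different functions by varying the seed.
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class CuckooHashTable:
    """Toy cuckoo hash table: one key per bucket, two candidate buckets per key."""

    def __init__(self, size: int = 8, max_kicks: int = 32):
        self.size = size
        self.max_kicks = max_kicks
        self.slots = [None] * size

    def _candidates(self, key: str):
        return _hash(key, 1, self.size), _hash(key, 2, self.size)

    def contains(self, key: str) -> bool:
        return any(self.slots[i] == key for i in self._candidates(key))

    def insert(self, key: str) -> bool:
        if self.contains(key):
            return True
        i1, i2 = self._candidates(key)
        for i in (i1, i2):
            if self.slots[i] is None:
                self.slots[i] = key
                return True
        # Both "nests" are taken: push the occupant out and rehome it.
        i = i1
        for _ in range(self.max_kicks):
            key, self.slots[i] = self.slots[i], key  # new key in, old key out
            a, b = self._candidates(key)
            i = b if i == a else a  # the evicted key's other bucket
            if self.slots[i] is None:
                self.slots[i] = key
                return True
        # Gave up: the key still held in `key` has no home. A real table
        # would grow or rehash here instead of dropping it.
        return False
```

A Cuckoo filter follows the same pattern but stores short fingerprints instead of full keys, which is where the probabilistic behavior comes from.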

Check this blog post for more details: https://blog.fastforwardlabs.com/2016/11/23/probabilistic-data-structure-showdown-cuckoo.html

Hello Manjeet,

A classic use of a Bloom filter is for a news website to keep track of which IPs have consumed their one free article for the day. A fresh Bloom filter is initialized every 24 hours, and each incoming IP is tested against it. If the IP is not in the filter, it is granted a free article and added to the filter. Otherwise, the request is rejected and the article is trimmed or gets extra ads.
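As a rough sketch of that flow, assuming a hand-rolled Bloom filter rather than any particular library: the names `BloomFilter`, `seen_today` and `grant_free_article`, as well as the bit-array size and number of hash functions, are made up for illustration. Note that a Bloom filter can return false positives, so a small fraction of first-time IPs would be treated as if they had already used their free article; it never returns false negatives.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a bytearray used as a bit array."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Illustrative hashing: one SHA-256 per seed, reduced to a bit index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def clear(self) -> None:
        # Called once every 24 hours to start a new day.
        self.bits = bytearray(len(self.bits))


seen_today = BloomFilter()


def grant_free_article(ip: str) -> bool:
    """Return True if this IP gets its free article, False if it was (probably) seen."""
    if ip in seen_today:
        return False
    seen_today.add(ip)
    return True
```

The first call to `grant_free_article` for an IP returns `True`, repeat calls on the same day return `False`, and calling `seen_today.clear()` resets everyone for the next day.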

An interesting use case I have encountered for a Cuckoo filter is keeping a small filter for each cellphone IMEI. The filter stores all the apps that have been installed on the phone. This solution is extremely lean because it uses an expanding Cuckoo filter. The initial filter takes almost no space and stays small for most users, who don't install many apps. Filters of heavy users, with many installations, expand to accommodate a larger number of items.
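Here is one way the per-IMEI expanding filter could look, as a sketch only. Everything in it is an assumption for illustration: the 1-byte fingerprints, the doubling growth policy, the rollback on a failed insert, and the names `CuckooFilter`, `ExpandingCuckooFilter`, `record_install` and `maybe_installed`. A production filter would use longer fingerprints and its own growth rules.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    """Fixed-size toy cuckoo filter storing 1-byte fingerprints."""

    def __init__(self, num_buckets: int = 4, bucket_size: int = 2, max_kicks: int = 50):
        # num_buckets must be a power of two so the XOR step below is reversible.
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _alt(self, i: int, fp: int) -> int:
        # The other bucket is derived from the current bucket and the fingerprint only.
        return (i ^ _h(bytes([fp]))) % self.num_buckets

    def _locate(self, item: str):
        fp = _h(b"fp:" + item.encode()) % 255 + 1  # fingerprint in 1..255
        i1 = _h(item.encode()) % self.num_buckets
        return fp, i1, self._alt(i1, fp)

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i, undo = random.choice((i1, i2)), []
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            undo.append((i, slot, self.buckets[i][slot]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp  # evict
            i = self._alt(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        for i, slot, old in reversed(undo):  # roll back so nothing is lost
            self.buckets[i][slot] = old
        return False


class ExpandingCuckooFilter:
    """Chain of cuckoo filters; a bigger one is added when the newest is full."""

    def __init__(self, initial_buckets: int = 4):
        self.filters = [CuckooFilter(initial_buckets)]

    def add(self, item: str) -> None:
        if not self.filters[-1].insert(item):
            bigger = CuckooFilter(self.filters[-1].num_buckets * 2)
            bigger.insert(item)  # an empty filter always has room
            self.filters.append(bigger)

    def __contains__(self, item: str) -> bool:
        return any(f.contains(item) for f in self.filters)


apps_by_imei = {}  # hypothetical in-memory store: IMEI -> filter


def record_install(imei: str, app_id: str) -> None:
    apps_by_imei.setdefault(imei, ExpandingCuckooFilter()).add(app_id)


def maybe_installed(imei: str, app_id: str) -> bool:
    f = apps_by_imei.get(imei)
    return f is not None and app_id in f
```

A phone with a handful of apps stays within the first tiny sub-filter in the typical case, while a phone with hundreds of installs ends up with a chain of progressively larger ones. Lookups have to check every sub-filter, so the false positive rate grows with the number of expansions.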

Hope this helps,