Facebook Instagram Twitter Vimeo Youtube
Sign in
  • Home
  • About
  • Team
  • Buy now!
Sign in
Welcome!Log into your account
Forgot your password?
Privacy Policy
Password recovery
Recover your password
Search
Logo
Sign in
Welcome! Log into your account
Forgot your password? Get help
Privacy Policy
Password recovery
Recover your password
A password will be e-mailed to you.
Thursday, September 4, 2025
Sign in / Join
  • Contact Us
  • Our Team
Facebook
Instagram
Twitter
Vimeo
Youtube
Logo
  • Home
  • News
    • News

      Anthropic Confirms Claude AI Was Weaponized in Major Cyberattacks by Husain Parvez

      3 September 2025
      News

      Over 30,000 Malicious IPs Target Microsoft Remote Desktop in Global Surge by Husain Parvez

      31 August 2025
      News

      Cyber Threat-Sharing Law Nears Expiration: Experts Warn of Risks by Husain Parvez

      31 August 2025
      News

      North Korean Hacking Tools Leak Online, Including Advanced Linux Rootkit by Paige Henley

      28 August 2025
      News

      iiNet Cyberattack Exposes Data of 280,000 Customers by Husain Parvez

      28 August 2025
  • Data Modelling & AI
    • AllBig dataBusiness AnalyticsData ScienceData Structure & AlgorithmDatabasesVector DatabaseDeep LearningEthical HackingGenerative AIMachine Learning
      Big data

      LangExtract + Milvus: A Practical Guide to Building a Hybrid Document Processing and Search System

      30 August 2025
      Big data

      Stop Your AI Assistant from Writing Outdated Code with Milvus SDK Code Helper

      26 August 2025
      Big data

      A Practical Guide for Choosing the Right Vector Database for Your AI Applications

      26 August 2025
      Big data

      Why I’m Against Claude Code’s Grep-Only Retrieval? It Just Burns Too Many Tokens

      26 August 2025
    • Big data
    • Business Analytics
    • Databases
    • Data Structure & Algorithm
    • Data Science
    • Deep Learning
    • Ethical Hacking
    • Generative AI
    • Machine Learning
    • Security & Testing
  • Mobile
    • AllAndroidIOS
      Android

      The Samsung Health app now puts a licensed doctor right in your pocket

      3 September 2025
      Android

      Google’s NotebookLM is giving Audio Overviews new personalities

      3 September 2025
      Android

      MediaTek’s next flagship chip may give future Android phones faster cores and a beefed-up NPU

      3 September 2025
      Android

      Google Maps navigation on Pixel and Wear OS watches just got a lot easier

      3 September 2025
    • Android
    • IOS
  • Languages
    • AllAjaxAngularDynamic ProgrammingGolangJavaJavascriptPhpPythonReactVue
      Languages

      Working with Titles and Heading – Python docx Module

      25 June 2025
      Languages

      Creating a Receipt Calculator using Python

      25 June 2025
      Languages

      One Liner for Python if-elif-else Statements

      25 June 2025
      Languages

      Add Years to datetime Object in Python

      25 June 2025
    • Java
    • Python
  • Guest Blogs
  • Discussion
  • Our Team
HomeData Modelling & AIBig dataWhat are Bloom Filters and Where are they Used?
Big dataGuest Blogs

What are Bloom Filters and Where are they Used?

Algomaster
By Algomaster
15 June 2025
0
0
Share
Facebook
Twitter
Pinterest
WhatsApp

    What are Bloom Filters and Where are they Used?

    Explained with Real-World Examples

    Ashish Pratap Singh's avatar

    Ashish Pratap Singh
    Nov 24, 2024

    Have you ever wondered how Netflix knows which shows you’ve already watched? Or how Amazon avoids showing you products you’ve already purchased?

    Using a traditional data structure like a hash table for these checks could consume significant amount of memory, especially with millions of users and items.

    Instead, many systems rely on a more efficient data structure—a Bloom Filter.

    In this article, we will learn what a bloom filter is, how it works, how to implement it in code, it’s real-world applications and limitations.


    If you’re finding this newsletter valuable and want to deepen your learning, consider becoming a paid subscriber.

    As a paid subscriber, you’ll receive an exclusive deep-dive article every week, access to a structured System Design Resource (100+ topics and interview questions), and other premium perks.

    Unlock Full Access


    🤔 What is a Bloom Filter?

    A Bloom Filter is a probabilistic data structure that allows you to quickly check whether an element might be in a set.

    It’s useful in scenarios where you need fast lookups and don’t want to use a large amount of memory, but you’re okay with occasional false positives.

    🧩 Key Components of a Bloom Filter:

    1. Bit Array: The Bloom Filter consists of a bit array of a fixed size, initialized to all zeros. This array represents whether certain elements are in the set.

    2. Hash Functions: To add or check an element, a Bloom Filter uses multiple hash functions. Each hash function maps an element to an index in the bit array.

    ⚙️ How Does a Bloom Filter Work?

    A Bloom filter works by using multiple hash functions to map each element in the set to a bit array.

    1. Initialization:

    • A Bloom filter starts with an empty bit array of size m (all bits are initially set to 0).

    • It also requires k independent hash functions, each of which maps an element to one of the m positions in the bit array.

    2. Inserting an Element:

    • To insert an element into the Bloom filter, you pass it through each of the k hash functions to get k positions in the bit array.

    • The bits at these positions are set to 1.

    3. Checking for Membership:

    • To check if an element is in the set, you again pass it through the k hash functions to get k positions.

    • If all the bits at these positions are set to 1, the element is considered to be in the set (though there’s a chance it might be a false positive).

    • If any bit at these positions is 0, the element is definitely not in the set.

    Share

    🔎 Example: Using a Bloom Filter for URL Checking

    Imagine you’re building a web crawler that needs to keep track of which URLs it has already visited.

    Instead of storing every URL (which would require a lot of memory), you decide to use a Bloom Filter.

    Step 1: Set Up the Bloom Filter

    • Initialize a Bit Array: Let’s assume our Bloom Filter uses a bit array of size 10, initially all set to 0.

    • Choose Hash Functions: We’ll use two hash functions in this example. These hash functions take an input (like a URL) and output an index in the bit array.

    Step 2: Adding a URL to the Bloom Filter

    Suppose we want to add the URL example.com to our Bloom Filter.

    1. Hash Function 1 generates an index of 3 for example.com.

    2. Hash Function 2 generates an index of 7 for example.com.

    We set the bits at indices 3 and 7 in the bit array to 1.

    Step 3: Adding Another URL

    Now, let’s add another URL, algomaster.io.

    1. Hash Function 1 generates an index of 1 for algomaster.io.

    2. Hash Function 2 generates an index of 4 for algomaster.io.

    We set the bits at indices 1 and 4 in the bit array to 1.

    Step 4: Checking for a URL in the Bloom Filter

    Suppose we want to check if example.com is already in the Bloom Filter.

    1. Hash Function 1 generates index 3 for example.com.

    2. Hash Function 2 generates index 7 for example.com.

    Since both bits at indices 3 and 7 are set to 1, we can say that example.com is probably in the set (there’s a small chance of a false positive).

    Step 5: Checking for a Non-Existent URL

    Now, let’s check if nonexistent.com is in the Bloom Filter.

    1. Hash Function 1 generates index 2 for nonexistent.com.

    2. Hash Function 2 generates index 5 for nonexistent.com.

    Since the bits at indices 2 and 5 are both 0, we can confidently say that nonexistent.com is not in the set.

    💻 Code Implementation (Java)

    Explanation

    1. BitSet: Java’s BitSet is used for the bit array to efficiently store and manipulate bits.

    2. Hash Functions: The code uses two simple hash functions. You can add more complex ones for better distribution.

    3. add(String item): This method takes an item, applies each hash function, and sets the corresponding bit in the bit array.

    4. mightContain(String item): This method checks if an item might be in the set by testing if all corresponding bits are 1.

      • If any bit is 0, the item is definitely not in the set.

      • If all bits are 1, the item is probably in the set (with a small chance of a false positive).

    🌎 Real-World Applications of Bloom Filters

    Bloom Filters are widely used in real-world applications where space efficiency and speed are essential, and occasional false positives are acceptable.

    Here are some common scenarios where Bloom Filters are employed:

    1. Web Caching

    Problem: Web servers often cache frequently accessed pages or resources to improve response times. However, checking the cache for every resource could become costly and slow as the cache grows.

    Solution: A Bloom Filter can be used to quickly check if a URL might be in the cache. When a request arrives, the Bloom Filter is checked first. If the Bloom Filter indicates the URL is “probably in the cache,” a cache lookup is performed.

    If it indicates the URL is “definitely not in the cache,” the server skips the cache lookup and fetches the resource from the primary storage, saving time and resources.

    2. Spam Filtering in Email Systems

    Problem: Email systems need to filter out spam emails without constantly checking large spam databases.

    Solution: A Bloom Filter can store hashes of known spam email addresses. When a new email arrives, the Bloom Filter checks if the sender’s address might be in the spam list.

    This allows the email system to quickly determine whether an email is likely to be spam or legitimate.

    3. Databases

    Problem: Databases, especially distributed ones, often need to check if a key exists before accessing or modifying data. Performing these checks for every key directly in the database can be slow.

    Solution: Many databases, such as Cassandra, HBase, and Redis, use Bloom Filters to avoid unnecessary disk lookups for non-existent keys. The Bloom Filter quickly checks if a key might be present. If the Bloom Filter indicates “not present,” it can skip the database lookup.

    4. Content Recommendation Systems

    Problem: Recommendation systems, such as those used by streaming services, need to avoid recommending content that users have already consumed.

    Solution: A Bloom Filter can track the content each user has previously watched or interacted with. When generating new recommendations, the Bloom Filter quickly checks if an item might already have been consumed.

    5. Social Network Friend Recommendations

    Problem: Social networks like Facebook or LinkedIn recommend friends or connections to users, but they need to avoid recommending people who are already friends.

    Solution: A Bloom Filter is used to store the list of each user’s existing connections. Before suggesting new friends, the Bloom Filter can be checked to ensure the user isn’t already connected with them.

    🛑 Limitations of Bloom Filters

    1. False Positives

    Bloom Filters can produce false positives, meaning they may incorrectly indicate that an element is present in the set when it is not.

    Example: Consider a scenario where a non-existent key is checked against a Bloom Filter. If all the hash functions map to bits that are already set to 1, the filter falsely signals the presence of the key.

    Such false positives can lead to unnecessary processing or incorrect assumptions about data.

    For instance, in a database system, this might trigger unnecessary cache lookups or wasted attempts to fetch data that doesn’t actually exist.

    The likelihood of false positives can be reduced by choosing an optimal size for the bit array and an appropriate number of hash functions, but they can never be completely eliminated.

    2. No Support for Deletions

    Standard Bloom Filters do not support element deletions. Once a bit is set to 1 by adding an element, it cannot be unset because other elements may also rely on that bit.

    This limitation makes Bloom Filters unsuitable for dynamic sets where elements are frequently added and removed.

    Variants like the Counting Bloom Filter can allow deletions by using counters instead of bits, but this requires more memory.

    3. Limited to Set Membership Queries

    Bloom Filters are specifically designed to answer set membership queries. They do not provide information about the actual elements in the set, nor do they support complex queries or operations beyond basic membership checks.

    Example: If you need to know the details of an element (e.g., full information about a user ID), you would need another data structure in addition to the Bloom Filter.

    4. Not Suitable for Exact Set Membership

    Bloom Filters are probabilistic, meaning they cannot provide a definite “yes” answer (only a “probably yes” or “definitely no”).

    For applications requiring exact membership information, Bloom Filters are not suitable. Other data structures like hash tables or balanced trees should be used instead.

    5. Vulnerable to Hash Collisions

    Hash collisions are more likely as the number of elements in the Bloom Filter grows. Multiple elements can end up setting or relying on the same bits, increasing false positives.

    As hash collisions accumulate, the filter’s effectiveness decreases. With a high load factor, the filter may perform poorly and become unreliable.

    The use of additional hash functions can help reduce collisions, but increasing the number of hash functions also increases the complexity and the memory requirements.

    ✅ Conclusion

    To summarize, bloom filters are a powerful tool for space-efficient set membership testing, with a wide range of applications. While they may not be suitable for all applications due to the possibility of false positives, they shine in scenarios where space is at a premium and a small error rate is acceptable.


    Thank you for reading!

    If you found it valuable, hit a like ❤️ and consider subscribing for more such content every week.

    If you have any questions or suggestions, leave a comment.

    This post is public so feel free to share it.

    Share


    P.S. If you’re finding this newsletter helpful and want to get even more value, consider becoming a paid subscriber.

    As a paid subscriber, you’ll receive an exclusive deep dive every week, access to a comprehensive system design learning resource , and other premium perks.

    Get full access to AlgoMaster

    There are group discounts, gift options, and referral bonuses available.


    Checkout my Youtube channel for more in-depth content.

    Follow me on LinkedIn, X and Medium to stay updated.

    Checkout my GitHub repositories for free interview preparation resources.

    I hope you have a lovely day!

    See you soon,
    Ashish

    Share
    Facebook
    Twitter
    Pinterest
    WhatsApp
      Previous article
      CAP Theorem Explained
      Next article
      What is Idempotency in Distributed Systems?
      Algomaster
      Algomasterhttps://blog.algomaster.io
      RELATED ARTICLES
      Guest Blogs

      7 Best 123Movies Alternatives in 2025: Free & Safe Sites by Ivan Stevanovic

      3 September 2025
      Guest Blogs

      Interview with Tyson Garrett – CTO of TrustOnCloud – Making Cloud Threat Modeling Executable by Shauli Zacks

      2 September 2025
      Big data

      LangExtract + Milvus: A Practical Guide to Building a Hybrid Document Processing and Search System

      30 August 2025

      LEAVE A REPLY Cancel reply

      Log in to leave a comment

      Most Popular

      The Samsung Health app now puts a licensed doctor right in your pocket

      3 September 2025

      Google’s NotebookLM is giving Audio Overviews new personalities

      3 September 2025

      MediaTek’s next flagship chip may give future Android phones faster cores and a beefed-up NPU

      3 September 2025

      Google Maps navigation on Pixel and Wear OS watches just got a lot easier

      3 September 2025
      Load more
      Algomaster
      Algomaster
      202 POSTS0 COMMENTS
      https://blog.algomaster.io
      Calisto Chipfumbu
      Calisto Chipfumbu
      6637 POSTS0 COMMENTS
      http://cchipfumbu@gmail.com
      Dominic
      Dominic
      32260 POSTS0 COMMENTS
      http://wardslaus.com
      Milvus
      Milvus
      81 POSTS0 COMMENTS
      https://milvus.io/
      Nango Kala
      Nango Kala
      6625 POSTS0 COMMENTS
      neverop
      neverop
      0 POSTS0 COMMENTS
      https://geeksforgeeks.org
      Nicole Veronica
      Nicole Veronica
      11795 POSTS0 COMMENTS
      Nokonwaba Nkukhwana
      Nokonwaba Nkukhwana
      11855 POSTS0 COMMENTS
      Safety Detectives
      Safety Detectives
      2594 POSTS0 COMMENTS
      https://www.safetydetectives.com/
      Shaida Kate Naidoo
      Shaida Kate Naidoo
      6746 POSTS0 COMMENTS
      Ted Musemwa
      Ted Musemwa
      7023 POSTS0 COMMENTS
      Thapelo Manthata
      Thapelo Manthata
      6694 POSTS0 COMMENTS
      Umr Jansen
      Umr Jansen
      6714 POSTS0 COMMENTS

      EDITOR PICKS

      The Samsung Health app now puts a licensed doctor right in your pocket

      3 September 2025

      Google’s NotebookLM is giving Audio Overviews new personalities

      3 September 2025

      MediaTek’s next flagship chip may give future Android phones faster cores and a beefed-up NPU

      3 September 2025

      POPULAR POSTS

      The Samsung Health app now puts a licensed doctor right in your pocket

      3 September 2025

      Google’s NotebookLM is giving Audio Overviews new personalities

      3 September 2025

      MediaTek’s next flagship chip may give future Android phones faster cores and a beefed-up NPU

      3 September 2025

      POPULAR CATEGORY

      • Languages45985
      • Data Modelling & AI17566
      • Java15156
      • Android14048
      • Mobile12983
      • Javascript12713
      • Guest Blogs12669
      • Data Structure & Algorithm10077
      Logo

      ABOUT US

      We provide you with the latest breaking news and videos straight from the technology industry.

      Contact us: hello@geeksforgeeks.org

      FOLLOW US

      Blogger
      Facebook
      Flickr
      Instagram
      VKontakte

      © NeverOpen 2022

      • Home
      • News
      • Data Modelling & AI
      • Mobile
      • Languages
      • Guest Blogs
      • Discussion
      • Our Team