Here are some notes after listening to a coursera lecture of Algorithm 1 by Professor Roughgarden at Stanford.
The goal of a Bloom filter is to provide a space efficient hash for some data entries. The insertion and validation of an entry are very efficient. But a trade-off is that deletion is not possible and it is possible to have a small but finite rate of false positive during validation.
The main idea of a Bloom filter is very simple. Its core is composed of a number (let say $latex k$) of hash functions and an array of $latex n$ binary numbers $latex A$. The hash functions will map any data entry to a number from $latex 1$ to $latex n$. Given an entry $latex d$, a bloom filter simply set the $latex h_i(d)$-bit of $latex A$ as one during insertion ($latex i=1,\cdots,k$). During validation, a bloom filter will decide an input entry $latex e$ has been inserted before if $latex h_i(e)=1$ for $latex i=1,\cdots,k$.
One can easily see that it is impossible to have false negative but it can have false positive. Given an input $latex e$ during validation, without loss of generality, let say the $latex h_i(e)=i$ for $latex i=1,\cdots,k$. Let say if we have already $latex |S|$ entries inserted to the bloom filter, then $latex Pr(A(1)=1)=1-Pr(A(1)=0)=1-((n-1)/n)^{k|S|}$$latex \approx 1-e^{k|S|/n}$ for large $latex n$. Let us denote $latex b\triangleq n/|S|$ as the number of bit used per entry. Then, $latex Pr(A(1)=1)=1-e^{k/b}$ and the false positive rate is simply $latex Pr(A(1)=1\quad \&\quad A(2)=1 \quad\&\cdots \&\quad A(k)=1)=(1-e^{k/b})^k$.
Update on May 20, 2018
I came across reviewing a paper on Bloom filter and so went check out this note written years back. I always have difficulty remember the concept of Bloom filter. I guess mostly because I am from the Electrical Engineering (signal processing background) and Bloom filter is very much different from the “filters” I learned from in school. But looking from the original meaning of “filter”, i.e., trying to sift through something. After all Bloom filter is indeed a “filter”. I guess even more so than the filters in signal processing.
And as for “Bloom”, I originally thought that it has something to do with “broom”. No kidding. I am not a native speaker and I mixed up the two words for years and didn’t realize that. It turns out “Bloom” just came from the last name of the original creator.