Locality Sensitivity Hashing (LSH)

I am studying an online class on Web Intelligence and Big Data. Dr. Shroff described a very nice example of LSH. Below are some quick notes.

The idea of LSH is essentially the same as dimension reduction. We have some data vector $latex x$ that can be in very high dimension. By processing hash data $latex f(x)$ instead, we can match $latex x$ with another data vector $latex y$ much more efficiently. Of course, to make this possible, we need to have $latex f(x) \approx f(y)$ if and only if $latex x \approx y$.

The example Dr. Shroff gave was for the application of fingerprint matching. Let say we have two finger print images $latex x$ and $latex y$. We will partition the images into grid cells and will try to find out if a grid cell contains some minutiae (major features). Let say we will arbitrary pick $latex k$ same locations from both prints $latex x$ and $latex y$. Denote $latex p$ as the probability that a minutia exists. We will define $latex f(\cdot)$ as

$latex f(x) = 1$ if minutiae exists in all $latex k$ locations of print $latex x$;
$latex f(x) = 0$ otherwise.

If print $latex y$ is from the same subject, the chance that a minutia exists in $latex y$ if it exists in $latex x$ should be pretty high. Let us denote this probability as $latex q$.

Then, we have $latex P[f(x) = f(y) = 1] = (pq)^k$. If we let $latex p = 0.2$ and $latex k = 3$, we have $latex P[f(x) = f(y) = 1]=0.006$, which is rather small.However, if we repeat this measurement, say, $latex b=1024$ times, the probability that at least having one $latex f'(\cdot)$ such that $latex f'(x)=f'(y)=1$ is pretty high. Actually, it is equal to $latex 1-(1-(pq)^k)^b=0.997$ in this case.

In contrast, if we have another print $latex z$ that is not from the same subject, the probability having one match on $latex f(x)=f(z)=1$ is only

$latex 1-(1-(p\times p)^k)^b=0.063$. Therefore, the false positive is rather small. As seen in the plot below, the effectiveness of the LSH relies on the rather rapid change of the expression $latex 1-(1-(pq)^k)^b$ as $latex q$ changes.

Leave a Reply

Your email address will not be published. Required fields are marked *