Perceptual hashing is the name for a class of algorithms that attempts to produce a fingerprint of image data (or other multimedia). Unlike a checksum or cryptographic hash, though, a perceptual hash is a fuzzy hash – the perceptual hashes of two similar images should be similar.
There are many cases where you might want to do this – digital forensics, copyright enforcement, space reduction in databases, and so on.
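"Similar" is usually made concrete by comparing hashes with Hamming distance – the number of bit positions where two hashes differ. A minimal sketch (the threshold for "close enough" is application-dependent and is an assumption here, not part of any of the algorithms below):

```python
def hamming(h1, h2):
    """Count the bit positions where two integer hashes differ."""
    return bin(h1 ^ h2).count("1")

# Identical hashes have distance 0; near-duplicate images should
# produce hashes with a small distance.
print(hamming(0b10110100, 0b10110100))  # 0
print(hamming(0b10110100, 0b10110110))  # 1
```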
There's no paper for this topic (the ones I know of rely on fairly advanced image-processing-specific domain knowledge); instead, I asked you to read a couple of web pages to give you an overview.
They all work on something like the same basis: We’re trying to extract and summarize meaningful “features” from the image. Fine details are higher-frequency (more rapid changes pixel-to-pixel), whereas broad features are lower frequency. So the simple methods described in the reading all take advantage of that.
Average and Difference hashing
The first pair of algorithms to consider work as follows.
First, decolorize the image (change to greyscale, extract the Y component, whatever). Now you only have 8 bits per pixel instead of the 24 needed for RGB. Most information humans care about is in this channel, as we saw when we did YCrCb decomposition during our dive into JPEG.
Then, downsample / scale the image to a smaller size (dependent upon the size of the perceptual hash you want). Let's say you want a 64-bit hash: then for aHash, use 8x8; for dHash, 9x8 (we'll see why in a moment).
Now, for aHash: compute the average value of the 64 bytes. Then create a 64-bit hash as follows. For each byte of the image, set the corresponding bit in the hash to 0 if its value is less than the average, or 1 otherwise.
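Here's a sketch of aHash in Python, assuming the decolorize/downsample steps have already produced an 8x8 numpy array of greyscale values (in practice you'd get that from a library like Pillow; the function name is mine):

```python
import numpy as np

def ahash(pixels):
    """aHash sketch: 8x8 greyscale array -> 64-bit integer hash.

    Each bit is 1 if the corresponding pixel is >= the block average,
    0 if it is below the average.
    """
    avg = pixels.mean()
    bits = (pixels.flatten() >= avg).astype(int)
    # Pack the 64 bits (row-major order) into one integer.
    return int("".join(map(str, bits)), 2)
```

For example, an 8x8 block that is all zeros except one bright pixel at the top-left hashes to a single 1 bit in the most significant position.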
For dHash: For each of the eight rows in the image, consider each byte. If the byte at position x is less than the byte at position (x+1), set the next bit in the hash equal to 1, otherwise 0. (You need nine values per row to have eight differences per row, hence the 9x8 size).
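And a matching sketch of dHash, assuming a 9-wide by 8-tall greyscale numpy array (again, the decolorize/resize steps and the function name are assumptions, not from the reading):

```python
import numpy as np

def dhash(pixels):
    """dHash sketch: 8-row x 9-column greyscale array -> 64-bit hash.

    For each row, compare each byte to its right-hand neighbor:
    bit = 1 if pixels[row, x] < pixels[row, x+1], else 0.
    Nine values per row yield eight differences per row.
    """
    bits = (pixels[:, :-1] < pixels[:, 1:]).astype(int).flatten()
    return int("".join(map(str, bits)), 2)
```

A row that increases left-to-right contributes eight 1 bits; a decreasing row contributes eight 0 bits.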
“… the resulting hash won’t change if the image is scaled or the aspect ratio changes. Increasing or decreasing the brightness or contrast, or even altering the colors won’t dramatically change the hash value. Even complex adjustments like gamma corrections and color profiles won’t impact the result.”
pHash
Haven’t we seen a way (there are others) to extract the frequencies from an image before? You can do this with a DCT, too! The algorithm proposed here works on that basis.
Reduce color and size, as before. But this time use a larger “small” image (the author proposes 32x32). Remember, the DCT will let us zero in on the low frequencies – you want to DCT a matrix larger than 8x8 if you ultimately want a 64-bit hash value based on the 64 lowest-frequency values post-DCT. The author chose 32x32 as a compromise (the DCT does take more time on larger matrices).
So you have a 32x32 matrix – compute the DCT.
Then reduce the DCT. The author keeps the upper-left 8x8 values, but this is actually a bad plan for two reasons, only one of which he caught. The first is that the upper-left value is just the solid-color information and should be ignored. The second is that, as you may recall, you probably want to take the upper-left triangle, not square, of values, if what you want is to maximize the low-frequency information you’re collecting.
Then the algorithm proceeds like aHash – compute the average of these 64 values, and set the bit in the hash corresponding to each value to 0 if it’s less than the average, or 1 otherwise.
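A sketch of the pHash scheme as the author describes it (keeping the full upper-left 8x8 block, including the DC term – i.e., before applying the fixes discussed above). The 2-D DCT here is built from an explicit orthonormal DCT-II basis matrix so the example needs only numpy; a real implementation would use an optimized DCT:

```python
import numpy as np

def dct2(x):
    """Orthonormal 2-D DCT-II via an explicit basis matrix."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # the k=0 (DC) row gets a smaller weight
    return basis @ x @ basis.T

def phash(pixels):
    """pHash sketch: 32x32 greyscale array -> 64-bit hash."""
    low = dct2(pixels.astype(float))[:8, :8]  # upper-left 8x8 block
    avg = low.mean()
    bits = (low.flatten() >= avg).astype(int)
    return int("".join(map(str, bits)), 2)
```

Note how the critique above shows up even in this toy: for a solid-color image, every bit except the DC term's comes out 0, so the DC term carries no useful discriminating information.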
So, dHash is way better than aHash, just hands down. I speculate this is because it captures features more clearly (local relative changes in brightness) that would be missed by aHash.
pHash is better still, but slower. (But the author’s initial implementation was flawed, and pHash does not need to be as slow as it first appeared…)
PhotoDNA is a more advanced version of this system, but the details have never been published.