Perceptual hashing is a general term for a class of algorithms that includes aHash, pHash, and dHash. As the name suggests, a perceptual hash is not computed in the strict way a cryptographic hash is, where any change to the input yields an unrelated value. Instead, visually similar images produce similar hash values, because whether two images are "similar" is a matter of degree rather than a yes/no decision.
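To illustrate what a "strict" hash does, the sketch below uses Python's standard `hashlib` to show that SHA-256 output changes wholesale when a single input byte changes. The byte strings are made-up stand-ins for image data, not real images, and the helper names are hypothetical. A perceptual hash is designed so that a comparably small change flips only a few bits.

```python
import hashlib

# Two hypothetical byte strings standing in for image data; only the first byte differs.
original = bytes(range(64))
tweaked = bytes([1]) + original[1:]

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def bit_difference(a: int, b: int) -> int:
    """Count the bit positions in which two integers differ."""
    return bin(a ^ b).count("1")

# A strict hash gives no usable notion of "almost equal": after a one-byte change,
# roughly half of the 256 output bits are expected to differ.
changed_bits = bit_difference(sha256_bits(original), sha256_bits(tweaked))
print(changed_bits)
```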
algorithm | full name | speed | accuracy |
---|---|---|---|
aHash | average hash | fast | low |
pHash | perceptual hash | slow | high |
dHash | difference hash | fast | high |
Steps to compare images with the dHash algorithm
- Reduce size. The fastest way to remove high frequencies and detail is to shrink the image. Shrink it to 9x8 (9 pixels wide, 8 pixels tall), which gives 72 pixels in total.
- Reduce color. Convert the image to grayscale, so that each of the 72 pixels is a single brightness value instead of separate red, green, and blue values.
- Compute the difference. The dHash algorithm works on the difference between adjacent pixels, which identifies the relative gradient direction. The 9 pixels in each row yield 8 differences between adjacent pixels, and 8 rows of 8 differences give 64 bits.
- Assign bits. Each bit is set according to whether the left pixel is brighter than the right pixel. The order in which the bits are written does not matter, as long as it is the same for every image.
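The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code in dhash.py: it assumes the image has already been shrunk to 9 pixels wide by 8 pixels tall (the resize itself is left to an image library), and the names `to_gray` and `dhash_from_grid` are hypothetical. The bit order used here (row by row, most significant bit first) is one arbitrary but consistent choice.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]

def to_gray(pixel: Pixel) -> int:
    """Convert an (R, G, B) pixel to one luma value using the ITU-R BT.601 weights."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000

def dhash_from_grid(gray: Sequence[Sequence[int]]) -> int:
    """Build a 64-bit dHash from an 8-row by 9-column grid of grayscale values."""
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected 8 rows of 9 grayscale values")
    value = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            # One bit per adjacent pair: 1 when the left pixel is brighter.
            value = (value << 1) | (1 if left > right else 0)
    return value

# Hypothetical input: every row fades from bright (left) to dark (right).
fading = [[(255 - 30 * x,) * 3 for x in range(9)] for _ in range(8)]
gray_grid = [[to_gray(p) for p in row] for row in fading]
print(f"{dhash_from_grid(gray_grid):016x}")  # ffffffffffffffff
```

Because every left pixel in the sample grid is brighter than its right neighbour, all 64 bits are set. A grid that brightens from left to right would give a hash of 0.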
If the Hamming distance between the hashes of two images is less than 5, the two images are treated as the same image.
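As a rough sketch of that comparison, assuming each hash is held as a 64-bit Python integer: the Hamming distance is the number of set bits in the XOR of the two values. The function names below are hypothetical and are not the API of dhash.py; the threshold of 5 is the one given in the text.

```python
def count_differing_bits(hash_a: int, hash_b: int) -> int:
    """Return the Hamming distance: the number of bit positions that differ."""
    return bin(hash_a ^ hash_b).count("1")

def looks_like_same_image(hash_a: int, hash_b: int, threshold: int = 5) -> bool:
    """Treat two hashes as a match when their distance is below the threshold."""
    return count_differing_bits(hash_a, hash_b) < threshold

a = 0xFFFF0000FFFF0000
b = a ^ 0b111  # same hash with its three lowest bits flipped
print(count_differing_bits(a, b))   # 3
print(looks_like_same_image(a, b))  # True
```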
I use the dHash algorithm to find reposts in the subreddits I moderate.
Calculate the dHash value of an image:

    hash = dhash.calculate_hash(image)

Calculate the Hamming distance between two images:

    hamming_distance = dhash.hamming_distance(image1, image2)

Calculate the Hamming distance between two dHash values:

    hamming_distance = dhash.hamming_distance(dHash1, dHash2)

dhash.py is released under the MIT license.