vhash is a C++ reimplementation of videohash for detecting near-duplicate videos. It takes a video or image file as input and generates a 64-bit hash value.
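Perceptual hashes of this kind are usually compared by counting how many bits differ (the Hamming distance). The following is a minimal sketch of that comparison, not vhash's actual code: the function names and the threshold parameter are hypothetical, and this README does not state which threshold, if any, vhash uses.

```cpp
#include <cassert>
#include <cstdint>

// Count the bits that differ between two 64-bit hashes (Hamming distance).
inline int hamming_distance(std::uint64_t a, std::uint64_t b) {
    std::uint64_t x = a ^ b;  // bits are set where the two hashes disagree
    int count = 0;
    while (x != 0) {
        x &= x - 1;           // clear the lowest set bit
        ++count;
    }
    return count;
}

// Hypothetical helper: treat two hashes as near-duplicates when at most
// `threshold` bits differ. The threshold is an assumption, not a vhash default.
inline bool is_near_duplicate(std::uint64_t a, std::uint64_t b, int threshold) {
    assert(threshold >= 0 && threshold <= 64);
    return hamming_distance(a, b) <= threshold;
}
```

For example, `hamming_distance(0xFF, 0x0F)` is 4, so those two hashes would count as near-duplicates for any threshold of 4 or more.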
- A C++ compiler that supports C++14
- CMake >= 3.11
- opencv for image decoding & resizing
- ffmpeg for video decoding & frame extraction
- fftw for discrete cosine transform (DCT)
- sqlite3 for file hash value caching
- spdlog for logging
CentOS
sudo yum install opencv-devel ffmpeg-devel fftw-devel sqlite-devel spdlog-devel
Ubuntu
sudo apt install libopencv-dev libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev
sudo apt install libfftw3-dev libsqlite3-dev libspdlog-dev
macOS
brew install opencv@4 ffmpeg@5 fftw sqlite spdlog
brew link ffmpeg@5
- wavelib for wavelet decomposition
- sqlite_orm for file hash value caching
- cpptqdm for a tqdm-like progress bar
- CLI11 for command line parsing
git clone https://github.com/helloall1900/vhash.git
cd vhash
make
bin/vhash hash tests/testdata/lena.png
- googletest for unit testing
- google benchmark for benchmarking
CentOS
sudo yum install gtest-devel google-benchmark-devel
Ubuntu
sudo apt install libgtest-dev libbenchmark-dev
macOS
brew install googletest google-benchmark
- Generate the hash value of a single file or of every file in a directory.
- Store each file's hash value in a database cache to speed up later hash generation.
- Find duplicate video or image files in a directory.
Generating hash for video or image files
Usage: vhash hash [OPTIONS] path
Positionals:
path TEXT:PATH(existing) REQUIRED file or directory path
Options:
-h,--help Print this help message and exit
-e,--ext TEXT ... file extension filter (i.e. -e mp4,mkv)
-c,--cache TEXT cache file or url
-o,--output TEXT output file
-C,--use-cache use cache
-r,--recursive recursively find files
-P,--no-progress not print progress bar
bin/vhash hash -C -o hash.txt some_dir_path
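To illustrate what an extension filter such as `-e mp4,mkv` implies, here is a minimal sketch that splits a comma-separated list and checks a path against it. The helper names are hypothetical, and the case-insensitive matching is an assumption; vhash may parse and apply the option differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Return a lowercase copy of `s`.
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Split a comma-separated list such as "mp4,mkv" into lowercase extensions.
inline std::vector<std::string> parse_extensions(const std::string& list) {
    std::vector<std::string> exts;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;  // skip empty fields like "mp4,,mkv"
        exts.push_back(to_lower(item));
    }
    return exts;
}

// True when the text after the last '.' in `path` is one of `exts`.
// An empty filter accepts every file (an assumption for this sketch).
inline bool matches_extension(const std::string& path,
                              const std::vector<std::string>& exts) {
    if (exts.empty()) return true;
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos) return false;
    const std::string ext = to_lower(path.substr(dot + 1));
    return std::find(exts.begin(), exts.end(), ext) != exts.end();
}
```

With the filter parsed from `"mp4,MKV"`, a path like `clip.MP4` matches while `lena.png` does not.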
Operating on hash cache
Usage: vhash cache [OPTIONS] [path]
Positionals:
path TEXT full file path
Options:
-h,--help Print this help message and exit
-c,--cache TEXT cache file or url
-f,--find find cache item
-d,--del delete cache item
-C,--clear clear all hash cache
-p,--pure pure expired hash cache
-P,--pure-period INT [604800] pure period in seconds
bin/vhash cache -f some_file_path
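The default `--pure-period` of 604800 seconds is 7 days. As a rough sketch of what an expiry check could look like, assuming each cache entry records a Unix timestamp of when it was stored (the README does not say how vhash tracks or defines expiry, and the names below are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Default period from the help text: 604800 seconds = 7 days.
constexpr std::int64_t kDefaultPurePeriodSeconds = 7 * 24 * 60 * 60;

// Hypothetical check: an entry is expired once it is strictly older than
// `period`. `cached_at` and `now` are Unix timestamps in seconds.
inline bool is_expired(std::int64_t cached_at, std::int64_t now,
                       std::int64_t period = kDefaultPurePeriodSeconds) {
    assert(period >= 0);
    return now - cached_at > period;
}
```

Under this assumption, an entry stored exactly 604800 seconds ago is still kept, and one stored 604801 seconds ago is removed.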
Finding duplicate video or image files
Usage: vhash dup [OPTIONS] [path]
Positionals:
path TEXT:PATH(existing) file or directory path
Options:
-h,--help Print this help message and exit
-e,--ext TEXT ... file extension filter (i.e. -e mp4,mkv)
-c,--cache TEXT cache file or url
-o,--output TEXT output file
-C,--use-cache use cache
-r,--recursive recursively find files
-P,--no-progress not print progress bar
bin/vhash dup -C -o dup.txt some_dir_path
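Conceptually, duplicate detection over a set of hashed files can be sketched as a pairwise comparison of 64-bit hashes. The struct, function names, and the `max_bits` tolerance below are hypothetical, and the quadratic loop is chosen for clarity; it is not a description of how `vhash dup` is implemented.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One hashed file: its path plus its 64-bit hash (hypothetical struct).
struct HashedFile {
    std::string path;
    std::uint64_t hash;
};

// Number of differing bits between two hashes.
inline std::size_t bit_difference(std::uint64_t a, std::uint64_t b) {
    return std::bitset<64>(a ^ b).count();
}

// Return every pair of paths whose hashes differ in at most `max_bits` bits.
// Compares all pairs, so the cost grows quadratically with the file count.
inline std::vector<std::pair<std::string, std::string>>
find_duplicate_pairs(const std::vector<HashedFile>& files, std::size_t max_bits) {
    assert(max_bits <= 64);
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t i = 0; i < files.size(); ++i) {
        for (std::size_t j = i + 1; j < files.size(); ++j) {
            if (bit_difference(files[i].hash, files[j].hash) <= max_bits) {
                pairs.emplace_back(files[i].path, files[j].path);
            }
        }
    }
    return pairs;
}
```

Setting `max_bits` to 0 reports only files with identical hashes; a larger value tolerates small differences such as re-encoding artifacts, at the risk of more false matches.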
- videohash for video hash.
- imagehash for image hash.
- fastimagehash for C++ implementation of image hash.
Copyright (c) 2023 Leo. See LICENSE for details.