Add video post duplication detection support with videohash #303

Open
akamhy opened this issue Jun 3, 2022 · 3 comments

Comments

@akamhy

akamhy commented Jun 3, 2022

Nice bot, I came across your bot's comment on some subreddit and I noticed that it lacks video support.

I am @akamhy, the creator of videohash, a near-duplicate video detection library for Python. Would you be interested in using videohash to add duplicate detection for video posts?

@barrycarey
Owner

Hello,

I hadn't seen your library before, but it looks like it would work really well. In the past I put together a solution that generated hashes of a set of frames. However, it didn't scale well.

How does video hash do the comparison for lookup? The database of hashes would likely be over 100 million videos. I'm sure I could plug it into my solution for images but would be interested in another approach.

@akamhy
Author

akamhy commented Jun 3, 2022

> How does video hash do the comparison for lookup? The database of hashes would likely be over 100 million videos. I'm sure I could plug it into my solution for images but would be interested in another approach.

Similar to ImageHash, videohash produces a 64-bit hash, and two videos are compared by the Hamming distance between their hashes. So the time required to query a videohash and an imagehash should be similar, and the lookup should be identical to what you are doing with ImageHash.
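As a rough illustration of that comparison, here is a minimal sketch in plain Python. It assumes the hashes are already available as 64-bit integers and does not call videohash itself. The function names and the threshold `MAX_DISTANCE` are hypothetical and would need tuning on real data.

```python
HASH_BITS = 64
HASH_MASK = (1 << HASH_BITS) - 1

# Hypothetical threshold: how many differing bits still count as "the same video".
MAX_DISTANCE = 5


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two 64-bit hashes differ."""
    return bin((hash_a ^ hash_b) & HASH_MASK).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, max_distance: int = MAX_DISTANCE) -> bool:
    """Treat two hashes as near duplicates if they differ in few enough bits."""
    return hamming_distance(hash_a, hash_b) <= max_distance
```

A single comparison is one XOR and a bit count, so the per-pair cost is the same for image and video hashes. The cost that grows with the database is the number of pairs compared.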

Two things you should check before using it in production are the hashing time (is it too slow for your usage?) and the collision rate (are there too many collisions?). I am also ready to make changes to the library to make it more suitable for this particular use case.

Maybe you could try it out on some sample videos and suggest changes to the library if any are required.

@barrycarey
Owner

I'm only using imagehash to create the hashes. I'm using a different solution for the comparison, since computing the Hamming distance directly didn't scale. However, it looks like I can do exactly the same thing with video hash.

I should be able to test it out in the next couple of weeks. I'm pretty limited on time right now.

I appreciate the heads up, I had no idea this existed.
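For readers wondering how Hamming lookups can avoid scanning every stored hash, here is a minimal sketch of one well-known technique, banded indexing based on the pigeonhole principle. This is an assumption for illustration only. The thread does not describe the bot's actual comparison solution, and the `BandIndex` class and band sizes below are hypothetical. If two 64-bit hashes differ in at most 3 bits, at least one of their four 16-bit bands must match exactly, so only hashes sharing a band need a full comparison.

```python
from collections import defaultdict

HASH_BITS = 64
NUM_BANDS = 4                       # hypothetical choice: 4 bands of 16 bits each
BAND_BITS = HASH_BITS // NUM_BANDS
BAND_MASK = (1 << BAND_BITS) - 1


def bands(h: int):
    """Split a 64-bit hash into (band_index, band_value) pairs."""
    return [(i, (h >> (i * BAND_BITS)) & BAND_MASK) for i in range(NUM_BANDS)]


class BandIndex:
    """In-memory index returning stored hashes within a small Hamming distance."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, h: int) -> None:
        # Store the hash under each of its bands.
        for key in bands(h):
            self._buckets[key].add(h)

    def query(self, h: int, max_distance: int = NUM_BANDS - 1):
        # The pigeonhole guarantee only holds for fewer differing bits than bands.
        if max_distance >= NUM_BANDS:
            raise ValueError("max_distance must be smaller than NUM_BANDS")
        candidates = set()
        for key in bands(h):
            candidates |= self._buckets.get(key, set())
        # Verify each candidate with an exact Hamming distance check.
        return sorted(c for c in candidates if bin(c ^ h).count("1") <= max_distance)
```

The same idea carries over to a database by storing each band in its own indexed column. Larger distance thresholds need more, and therefore narrower, bands.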
