The final hash pass shouldn't re-read the bit of data already hashed in the header phase #12

Open
ssokolow opened this issue Aug 20, 2014 · 0 comments

@ssokolow
Owner

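Given that the algorithm only ever subdivides groups and the generated hashes are never used outside fastdupes, there's no harm in seek()ing past the first HEAD_SIZE bytes when doing the final full-content comparison. Every file that reaches the final pass already shares a matching header with the rest of its group, so hashing only the remaining bytes is enough to tell them apart. It might also provide a small speed-up in some situations, since those first HEAD_SIZE bytes would no longer be read and hashed twice.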
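A minimal sketch of a tail-only hash, in Python since fastdupes is a Python tool. `hash_tail` and the `HEAD_SIZE` value shown are hypothetical stand-ins, not fastdupes' actual names or defaults, and SHA-1 is an arbitrary choice of digest:

```python
import hashlib

# Hypothetical value; fastdupes defines its own HEAD_SIZE constant.
HEAD_SIZE = 16 * 1024


def hash_tail(handle, head_size=HEAD_SIZE, chunk_size=64 * 1024):
    """Return a SHA-1 digest of everything after the first head_size bytes.

    handle must be a seekable file object opened in binary mode.
    """
    hasher = hashlib.sha1()
    # Skip the prefix that the header pass already hashed and compared.
    # Seeking past EOF is allowed; the following read() just returns b''.
    handle.seek(head_size)
    for chunk in iter(lambda: handle.read(chunk_size), b''):
        hasher.update(chunk)
    return hasher.digest()
```

Two files whose headers differ but whose tails match will get the same digest from this function, so it is only safe because the final pass compares files that already share a header group.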

More importantly, for files of HEAD_SIZE bytes or smaller, the header pass has already read the entire file. That means we should be able to skip the final pass for them completely if the internal data structures preserve the file size read by the first pass, which eliminates the overhead of at least one syscall per file.
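The small-file shortcut could look something like the following sketch. It assumes, hypothetically, that each group arriving at the final pass is a list of `(path, size)` pairs with sizes recorded during the first pass, and it takes the tail-hashing function as a parameter. `split_group` and this data layout are illustrative only, not fastdupes' real structures:

```python
# Hypothetical value; fastdupes defines its own HEAD_SIZE constant.
HEAD_SIZE = 16 * 1024


def split_group(group, hash_tail):
    """Subdivide a group of files whose header hashes already matched.

    group: list of (path, size) tuples, sizes recorded by the first pass.
    hash_tail: callable(path) -> digest of the bytes after HEAD_SIZE.
    Returns a list of sub-groups (lists of paths) with 2+ members each.
    """
    # Files no larger than HEAD_SIZE were read in full by the header pass,
    # so a matching header hash already means identical contents (up to
    # hash collisions). They never need to be opened again.
    small = [path for path, size in group if size <= HEAD_SIZE]

    # Only larger files need their remaining bytes hashed.
    buckets = {}
    for path, size in group:
        if size > HEAD_SIZE:
            buckets.setdefault(hash_tail(path), []).append(path)

    result = [small] if small else []
    result.extend(buckets.values())
    # A file left alone in its sub-group has no duplicates.
    return [sub for sub in result if len(sub) > 1]
```

Because the sizes come from the first pass, the small files are settled without any further `open()`, `stat()`, or `read()` calls. Keeping them in their own bucket also covers the edge case of a file of exactly HEAD_SIZE bytes whose header matches the first HEAD_SIZE bytes of a longer file.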
