The final hash pass shouldn't re-read the bit of data already hashed in the header phase #12

ssokolow · 2014-08-20T23:20:48Z

Because the algorithm only ever subdivides groups and the generated hashes are never used outside fastdupes, there's no harm in seek()ing past the first HEAD_SIZE bytes during the final full-content comparison, and it might provide a small speed-up in some situations.
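A rough sketch of the seek-based idea, not the actual fastdupes code. The `hash_tail` helper, the `HEAD_SIZE` and `CHUNK_SIZE` values, and the choice of SHA-1 are all assumptions made for illustration:

```python
import hashlib

HEAD_SIZE = 16 * 1024   # illustrative value only (assumption)
CHUNK_SIZE = 64 * 1024  # illustrative read size (assumption)


def hash_tail(path, head_size=HEAD_SIZE, chunk_size=CHUNK_SIZE):
    """Hash everything after the first head_size bytes of path.

    Hypothetical helper: the header pass has already compared the
    first head_size bytes, so the final pass only reads the rest.
    """
    digest = hashlib.sha1()
    with open(path, 'rb') as fobj:
        # Skip the region the header pass already hashed.
        fobj.seek(head_size)
        while True:
            chunk = fobj.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest differs from a whole-file hash, which is fine only because it is compared against other digests computed the same way and never exposed outside the tool.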

More importantly, for files of HEAD_SIZE bytes or smaller, we should be able to skip the final pass completely, provided the internal data structures preserve the file size read by the first pass. That eliminates the overhead of at least one syscall.
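A minimal sketch of the skip for small files, assuming each group carries `(path, size)` pairs from the first pass. The `split_by_head_coverage` name and the `HEAD_SIZE` value are hypothetical, not taken from fastdupes:

```python
HEAD_SIZE = 16 * 1024  # illustrative value only (assumption)


def split_by_head_coverage(group, head_size=HEAD_SIZE):
    """Split (path, size) pairs into (fully_covered, needs_final_pass).

    Files no larger than head_size were read in full by the header
    pass, so their header hash already covers the entire content.
    """
    fully_covered = [path for path, size in group if size <= head_size]
    needs_final_pass = [path for path, size in group if size > head_size]
    return fully_covered, needs_final_pass
```

Paths in the first list never need to be reopened, which is where the saved syscall comes from. Only the second list goes on to the final hash pass.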


ssokolow added enhancement labels Aug 20, 2014

ssokolow pushed a commit that referenced this issue Aug 20, 2014

Move more TODOs into issues #12, #13, and #14.

c9d59b1
