increase hashing buffer size #1357

Closed · wants to merge 1 commit

Conversation

@i512 i512 commented Oct 1, 2024

Hi guys, thanks for your project.
Full hashing was taking a long time on my HDD, and Task Manager reported a fairly low read speed of ~25 MB/s, so I suspected the small buffer might be the cause.

Tested with a 16 MB buffer: read speed increased to ~150 MB/s, which is what I would expect.
20 GB of duplicates is scanned in 3 minutes vs. 13 on the current release (without cache).

Interestingly, if I run the same test on an NVMe drive there is almost no difference: 17 s vs. 18 s, at a ~2.4 GB/s read speed.
I guess Windows reads small blocks from the drive without doing much readahead, so increasing the buffer is a simple fix.
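
A minimal sketch of what the change amounts to (not the actual czkawka code; the constant name and the use of blake3 here are placeholders):

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

// 16 MB instead of a small chunk; bigger sequential reads let the HDD stream.
const HASH_BUFFER_SIZE: usize = 16 * 1024 * 1024;

fn hash_file(path: &Path) -> std::io::Result<blake3::Hash> {
    let mut file = File::open(path)?;
    // Heap-allocated: an array this big on the stack would overflow it.
    let mut buffer = vec![0u8; HASH_BUFFER_SIZE];
    let mut hasher = blake3::Hasher::new();
    loop {
        let n = file.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        hasher.update(&buffer[..n]);
    }
    Ok(hasher.finalize())
}
```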

@i512 i512 marked this pull request as draft October 1, 2024 18:41
@i512 i512 force-pushed the enlarge-hashing-buffer branch from ace9a9e to fd135b4 on October 1, 2024 19:15
@i512 i512 force-pushed the enlarge-hashing-buffer branch 2 times, most recently from 2a83fe3 to 1a920c9 on October 3, 2024 21:07
@i512 i512 marked this pull request as ready for review October 3, 2024 21:40
@qarmin (Owner) commented Oct 4, 2024

This is a little strange. Using the stack should be faster than allocating a buffer on the heap each time (but the HDD is probably the bigger bottleneck here).
A previous version used a 128 KB array here, but it caused a lot of stack overflows, so I decreased the value to fix that.

This may improve performance for bigger files, but I suspect that allocating 16 MB of memory even for small 5 KB files will decrease performance in that situation.

To verify this, a benchmark for the hash_calculation function should be created.
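
A rough sketch of such a benchmark with the criterion crate (the hash_file helper, the file sizes, and the buffer sizes are placeholders, not the real hash_calculation; note also that the OS page cache will hide real HDD behavior unless it is flushed between runs):

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use std::io::{Read, Write};

// Placeholder mirroring the hashing loop, parameterized by buffer size.
fn hash_file(path: &std::path::Path, buf_size: usize) -> std::io::Result<blake3::Hash> {
    let mut file = std::fs::File::open(path)?;
    let mut buffer = vec![0u8; buf_size];
    let mut hasher = blake3::Hasher::new();
    loop {
        let n = file.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        hasher.update(&buffer[..n]);
    }
    Ok(hasher.finalize())
}

fn bench_hash_calculation(c: &mut Criterion) {
    let mut group = c.benchmark_group("hash_calculation");
    for &file_size in &[5 * 1024, 1024 * 1024, 64 * 1024 * 1024] {
        // One temporary file per size, filled with zeros.
        let mut file = tempfile::NamedTempFile::new().unwrap();
        file.write_all(&vec![0u8; file_size]).unwrap();
        for &buf_size in &[16 * 1024, 256 * 1024, 16 * 1024 * 1024] {
            group.bench_with_input(
                BenchmarkId::new(format!("file_{file_size}B"), buf_size),
                &buf_size,
                |b, &buf_size| b.iter(|| hash_file(file.path(), buf_size).unwrap()),
            );
        }
    }
    group.finish();
}

criterion_group!(benches, bench_hash_calculation);
criterion_main!(benches);
```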

I think a mixed solution based on file size would work better, but without a clear benchmark with different parameters, we cannot be sure.
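
For example, a minimal sketch of that mixed idea, with entirely untuned thresholds that the benchmark above would need to validate:

```rust
// Pick the hashing buffer size from the file size (thresholds are guesses).
fn pick_buffer_size(file_size: u64) -> usize {
    if file_size <= 64 * 1024 {
        16 * 1024 // tiny files: keep the allocation cheap, one read anyway
    } else if file_size <= 8 * 1024 * 1024 {
        256 * 1024 // medium files
    } else {
        16 * 1024 * 1024 // big files: large sequential reads help HDDs
    }
}
```

hash_calculation would then allocate vec![0u8; pick_buffer_size(file_size)] once per file instead of using a fixed-size buffer.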

@i512 (Author) commented Oct 5, 2024

Sorry, I'm new to Rust; I didn't realize the buffer would be allocated each time. That's not good.

> I think a mixed solution based on file size would work better, but without a clear benchmark with different parameters, we cannot be sure.

This can work, but it will require tuning, yes.

I think the usual approach is to reuse the buffers between jobs, but I can't figure out how to do this yet. I tried creating the buffers in map_init, but it did not work as I expected. So I'm probably going to close this for now.
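
For reference, roughly what I was attempting, as a sketch only (rayon may call the init closure more than once per thread, but each buffer is still reused across many files):

```rust
use rayon::prelude::*;
use std::io::Read;
use std::path::{Path, PathBuf};

// Same hashing loop as before, but borrowing a caller-owned scratch buffer.
fn hash_with_buffer(path: &Path, buffer: &mut [u8]) -> std::io::Result<blake3::Hash> {
    let mut file = std::fs::File::open(path)?;
    let mut hasher = blake3::Hasher::new();
    loop {
        let n = file.read(buffer)?;
        if n == 0 {
            break;
        }
        hasher.update(&buffer[..n]);
    }
    Ok(hasher.finalize())
}

// Hash files in parallel, reusing one 16 MB scratch buffer per rayon worker
// instead of allocating a fresh Vec for every file.
fn hash_all(paths: &[PathBuf]) -> Vec<std::io::Result<blake3::Hash>> {
    paths
        .par_iter()
        .map_init(
            || vec![0u8; 16 * 1024 * 1024], // init closure: per-worker buffer
            |buffer, path| hash_with_buffer(path, buffer),
        )
        .collect()
}
```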

Reducing the number of threads to 1 (in the current release with the 16 KB buffer) actually improves the read speed with big files in my case. Oh, I see that this was already mentioned in #835, nice.

@i512 i512 closed this Oct 5, 2024