Sorting check sums created #15

Open
orbittwz opened this issue Jan 30, 2023 · 5 comments
Comments

@orbittwz

When creating a checksum file, the files aren't listed in order...
This has been discussed before, but hasn't been fixed yet.

@redactedscribe

The trick is to first make sure that your file manager is sorted A-Z, and secondly, after selecting the files you want to hash, right-click the first selected file and then choose Create Checksum File. This will always produce a checksum with A-Z ordering.

It does make sense, however, that HashCheck should sort the items before hashing them, so that the checksum file is always A-Z ordered regardless of the user's order of actions. In fact, I think it may already do this; it's just that right-clicking any selected item other than the first one causes this bug (@idrassi).

@idrassi
Owner

idrassi commented Aug 10, 2023

HashCheck hashes files based on the order they are received from Windows Explorer. But when operating on an SSD, HashCheck utilizes multithreading to enhance the hashing speed by processing multiple files concurrently. This, however, means that the output order can vary depending on factors like file size and thread completion times.
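To illustrate why the order changes, here is a minimal simulation, not HashCheck's actual code. It assumes hashing time is proportional to file size and that each free worker picks up the next file in the order received; `completion_order` and its parameters are hypothetical names used only for this sketch.

```python
import heapq


def completion_order(files, workers):
    """Return file names in the order a pool of `workers` would finish them.

    `files` is a list of (name, size) pairs in the order received.
    Assumption: hashing time is proportional to file size.
    """
    running = []   # min-heap of (finish_time, submit_index, name)
    finished = []
    clock = 0
    for index, (name, size) in enumerate(files):
        if len(running) == workers:
            # Every worker is busy: wait for the earliest one to finish.
            clock, _, done = heapq.heappop(running)
            finished.append(done)
        heapq.heappush(running, (clock + size, index, name))
    # Drain the workers that are still running, earliest finish first.
    while running:
        _, _, done = heapq.heappop(running)
        finished.append(done)
    return finished


files = [("a.iso", 100), ("b.txt", 10), ("c.txt", 10)]
print(completion_order(files, workers=1))  # same order as received
print(completion_order(files, workers=2))  # small files overtake the large one
```

With one worker the output order matches the input order; with two, the large first file finishes last, so the entries come out in a different order than they went in.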

As for the suggested workaround of sorting in Explorer, it works only on a traditional hard disk; it has no effect when multithreading is used.

Based on this, the potential solutions are:

  • Disable Multithreading: This would slow down the computation, which might not be preferable for users, especially with large datasets.
  • Post-Processing Sorting: After all checksums are calculated, we could sort the entries in the checksum file. This ensures a consistent order with minimal performance overhead since the sorting process is much quicker than hashing.

Given the trade-offs, the second option seems to be more efficient and user-friendly. This will ensure that the checksum file entries are sorted without compromising the hashing speed.
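The post-processing option could be sketched roughly as follows. This is an illustration in Python rather than HashCheck's implementation: the function names are hypothetical, SHA-256 and the `digest *name` line layout are assumptions, and the sort is a plain case-insensitive A-Z sort, which does not replicate Explorer's numeric-aware name ordering.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed


def hash_file(path):
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sort_entries(entries):
    """Sort (name, digest) pairs A-Z by name, ignoring case."""
    return sorted(entries, key=lambda entry: (entry[0].casefold(), entry[0]))


def build_checksum_lines(paths, workers=4):
    """Hash files concurrently, then sort the entries before formatting them."""
    entries = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(hash_file, path): path for path in paths}
        # Results arrive in completion order, which is not predictable.
        for future in as_completed(futures):
            entries.append((futures[future], future.result()))
    # The sort runs once, after all hashing is done, so it costs very little.
    return [f"{digest} *{name}" for name, digest in sort_entries(entries)]
```

Because the sort happens only after every digest is available, the hashing itself stays fully parallel and the written order no longer depends on which thread finishes first.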

I will further evaluate the feasibility of this solution and I will let you know.

@orbittwz
Author

Hey!
Thanks for answering so thoroughly, but the question is: does the program actually do it?
That being said, I already went back to OpenHashTab, since the sorting problem has also been solved there...
Maybe you can get some inspiration or contact the author for help with your program?
https://github.com/namazso/OpenHashTab

@redactedscribe

redactedscribe commented Aug 10, 2023

> Given the trade-offs, the second option seems to be more efficient and user-friendly. This will ensure that the checksum file entries are sorted without compromising the hashing speed.

This would be ideal. A sort before writing shouldn't be noticeably slower.

> HashCheck hashes files based on the order they are received from Windows Explorer. But when operating on an SSD, HashCheck utilizes multithreading to enhance the hashing speed by processing multiple files concurrently.

Also, there doesn't seem to be any indication when HashCheck is hashing in parallel on an SSD, and it may be useful to communicate this, either in the README or in the UI. I personally haven't noticed HashCheck changing its hashing behaviour between HDD and SSD, but now that I know this, I'll be using my SSDs.

Additionally, an indication of how many files have been hashed out of the total to be processed would be good, e.g. "(5/16)" or "(1/1)" for a single file in the title bar.
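Since files may be hashed in parallel, a counter like this would need to be safe to update from several threads. A minimal sketch of the idea, using a hypothetical `ProgressCounter` class that is not part of HashCheck:

```python
import threading


class ProgressCounter:
    """Thread-safe "(done/total)" label for a batch of files."""

    def __init__(self, total):
        if total < 1:
            raise ValueError("total must be at least 1")
        self.total = total
        self._done = 0
        self._lock = threading.Lock()

    def advance(self):
        """Record one finished file and return the updated label."""
        with self._lock:
            # Never count past the total, even if called too often.
            if self._done < self.total:
                self._done += 1
            return f"({self._done}/{self.total})"


counter = ProgressCounter(16)
for _ in range(5):
    label = counter.advance()
print(label)
```

Each worker would call `advance()` when it finishes a file, and the returned label could be appended to the window title.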

@ZPNRG

ZPNRG commented Oct 9, 2024

@idrassi Thank you for your comment above explaining the SSD/multithreading issue at work. I've used HashCheck since 2009 (the original Kai Liu version), switched to Gurnec's fork in 2016, and then switched to your fork in 2021. The very inconsistent order of the checksum files being created has been quite irritating, but I wasn't sure what caused it or why it happened only some of the time. Knowing that the type of drive the files are on when the checksum file is created (SSD vs. spinning HD) is at play fully explains the results I see all the time.

I definitely vote for the "Post-Processing" option so that the order of files in the checksum file created is the same as the order of the files on disk that the checksum file is created from.
