Support for xxh3-64 #16

Hi,
I wanted to know if support for xxh3-64 could be added. Thanks!

Comments
Yes, this is planned, although I have a preference for XXH128 because of its better resistance to collisions.
Hi! Teracopy now uses xxh3-64 as its default hash algorithm for verification, and I've seen people recommend it in other places too. Is xxh3-128 faster? In my personal case, I don't need the hash check for security purposes: I just need to find out whether two files are identical, or make sure a file stored on my hard drive or SSD isn't corrupted.
Thank you for bringing Teracopy's use of xxh3-64 to my attention. Concerning speed, xxh3-128 is slower than xxh3-64: it produces a hash twice as long, which costs some extra computation. In exchange, it significantly reduces the risk of collision.

The collision risk is directly related to the size of the hash. A 64-bit hash has 2^64 possible values; a 128-bit hash has 2^128. The difference between those two numbers is colossal, and the more possible outputs a hash function has, the less likely it is that two different inputs produce the same output (a collision). While in theory a 64-bit hash could be used to identify unique data, in practice the collision risk is too high for many use cases. This is one reason hashing algorithms like MD5 produce a 128-bit hash: the sheer number of possible values makes an accidental collision highly unlikely (MD5's known weakness is to deliberately crafted collisions, not chance ones).

If you're only using a hash to check for file corruption, a 64-bit hash may be sufficient. However, if you're using the hash to uniquely identify a file or other data (which is often the case in data integrity checks and similar uses), a 128-bit hash is generally preferable due to the significantly reduced risk of collision.

That said, in the spirit of flexibility and catering to diverse user needs, I plan to integrate both xxh3-64 and xxh3-128, so users can select the hashing algorithm most suitable for their specific circumstances.
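For anyone who wants to put numbers on that collision risk, the standard birthday-bound approximation p ≈ n² / 2^(b+1) estimates the probability of at least one collision among n items hashed to b bits. Here is a minimal illustrative sketch; it assumes the third-party `xxhash` Python package (not part of HashCheck) only for the digest-length demonstration at the end:

```python
import xxhash  # third-party package (pip install xxhash); assumed for illustration

def birthday_collision_probability(n: int, bits: int) -> float:
    """Approximate P(at least one collision) among n random `bits`-bit hashes,
    via the birthday bound p ~ n^2 / 2^(bits + 1), valid while p is small."""
    return n * n / 2 ** (bits + 1)

n = 10 ** 9  # a billion files
print(f"64-bit:  p ~ {birthday_collision_probability(n, 64):.1e}")   # ~2.7e-02
print(f"128-bit: p ~ {birthday_collision_probability(n, 128):.1e}")  # ~1.5e-21

# The same input under both XXH3 variants:
data = b"example file contents"
print(xxhash.xxh3_64(data).hexdigest())   # 16 hex chars = 64 bits
print(xxhash.xxh3_128(data).hexdigest())  # 32 hex chars = 128 bits
```

In other words, at a billion files a 64-bit hash already carries a few-percent chance of some accidental collision, while a 128-bit hash keeps it astronomically small.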
Hi @idrassi, thank you for integrating both algorithms. For aficionados like me, hashing is a tricky subject: I struggle with certain decisions, most notably which algorithm best suits my needs. My use case for hashing is as follows:
Basically, I'm not concerned about security, that is, about somebody having replaced one of my files for obscure purposes. All my files are stored locally. So far I've been using MD5; I didn't think about it much when I chose it, but it seemed more modern and safer than, for example, CRC-32, and faster than others. It looks like 64 bits already give a huge number of combinations, even though that's not remotely comparable with 128 bits.

Speed is an issue for me. Checksum runs tend to take longer than seems reasonable, as I don't have portable SSDs, just hard drives. I presume the time taken depends on both CPU speed and the read speed of the storage device, so my hard drives are the bottleneck (see the sketch below). Anyway, I keep having second thoughts about which algorithm makes the most sense for me, so having all the possible options available is always great. Thanks again!
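That presumption matches how these tools usually behave: on a spinning disk, a fast hash is rarely the limiting factor, because the drive's sequential read speed is far below what XXH3 can digest. A minimal sketch of chunked file hashing, again assuming the third-party `xxhash` package; the chunk size and file name are illustrative:

```python
import xxhash  # third-party package, assumed as above

def xxh3_64_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through XXH3-64 in 1 MiB chunks to keep memory bounded."""
    h = xxhash.xxh3_64()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# On an HDD (~100-200 MB/s sequential reads) this loop is I/O-bound:
# XXH3 itself can digest several GB/s per core, so a faster algorithm
# would not shorten the run; only faster storage would.
print(xxh3_64_of_file("example.bin"))  # "example.bin" is a hypothetical file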
Hi, just a quick note to let you know that, after reading your comments and other information out there, I eventually switched to BLAKE3 instead of xxh3-64. In any case, having xxh3-64 available at some point would be great, so users can choose the algorithm they need.
@vivadavid, there's a well-substantiated tutorial by @jolynch on when it's appropriate to use xxHash and when BLAKE3. As a user, I've been thinking about this alongside developers since both algorithms emerged, that is, for years, and I came to the same conclusion. Ideally, I should
But I see problems with that idealistic approach (as of December 2023):
Now, how do I explain to comrades that hashing is a useful trick to carry in one's baggage of worldly experience, like taking care of shoes after a walk on a snowy street sprinkled with de-icing chemicals? (Happy New Year, by the way.) With the shoes I know what to say: wipe them down with a rag and set them out to dry, and they'll last longer. Giving similarly easy-to-follow instructions about hashing and its benefits is still a challenge. At times it seems hashing was never meant to be a desktop tool, just technology under the hood of other apps (backup, messengers, torrents, etc.), and yet I believe it's a path worth exploring.
Thanks for your message, @sergeevabc, and for providing the tutorial. So far I've been using BLAKE3, even though my purpose is just "to verify the integrity of non-sensitive data", because it's reliable and fast, but I'm always open to change. It's a very complex issue.
Today I decided to do some testing in Teracopy to compare xxh3-64 and BLAKE3. My results indicate that xxh3-64 is noticeably faster, so I'm switching back. I can't wait to use xxh3-64 in HashCheck whenever it's feasible! Until then, I'll stick with BLAKE3 as second best for my purposes.
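That result is consistent with published numbers: XXH3 is typically faster than BLAKE3 in single-threaded use, though BLAKE3 narrows the gap when multi-threaded. For anyone who wants to reproduce a rough comparison outside Teracopy, here is an illustrative sketch assuming the third-party `xxhash` and `blake3` Python packages; it measures pure CPU throughput on an in-memory buffer, so disk speed doesn't interfere, and absolute figures will vary by hardware:

```python
import time
import xxhash  # third-party package, assumed
import blake3  # third-party package (pip install blake3), assumed

def throughput_gb_s(name: str, hash_once, data: bytes, repeats: int = 20) -> None:
    """Rough single-threaded hashing throughput on an in-memory buffer."""
    start = time.perf_counter()
    for _ in range(repeats):
        hash_once(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(data) * repeats / elapsed / 1e9:.2f} GB/s")

buf = bytes(256 * 1024 * 1024)  # 256 MiB of zeros; real file data may differ
throughput_gb_s("xxh3-64", xxhash.xxh3_64_digest, buf)
throughput_gb_s("blake3 ", lambda d: blake3.blake3(d).digest(), buf)
```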