Support multiple hash algorithms (not SHA-1) #153

Open
timretout opened this issue Feb 5, 2023 · 4 comments

@timretout

hash.c seems to produce SHA-1 digests.

SHA-1 is broken (see https://shattered.io/), and NIST, for example, deprecated its use in 2011.

Rather than just upgrade to, say, SHA-256, it would be nice if the output format could record the type of algorithm alongside the checksum, to allow for future migrations.
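One way to picture this request is a checksum string that carries its own algorithm name. The sketch below is only an illustration: the `<algorithm>:<hex digest>` layout and the helper names `tagged_digest`, `parse_tagged_digest`, and `verify` are assumptions made for this example, not build recorder's actual output format.

```python
import hashlib


def tagged_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return '<algorithm>:<hex digest>' so the record says how it was computed."""
    h = hashlib.new(algorithm)  # raises ValueError for unsupported algorithm names
    h.update(data)
    return f"{algorithm}:{h.hexdigest()}"


def parse_tagged_digest(value: str) -> tuple[str, str]:
    """Split a tagged digest back into (algorithm, hex digest)."""
    algorithm, sep, hexdigest = value.partition(":")
    if not sep or not algorithm or not hexdigest:
        raise ValueError(f"not a tagged digest: {value!r}")
    return algorithm, hexdigest


def verify(data: bytes, value: str) -> bool:
    """Recompute the digest with the recorded algorithm and compare."""
    algorithm, _ = parse_tagged_digest(value)
    return tagged_digest(data, algorithm) == value
```

Because the algorithm travels with the checksum, old records stay verifiable after the default changes, which is the migration property asked for above.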

@fvalasiad
Collaborator

This is nice to explore. I have thought before about adding an option that lets one choose the hashing algorithm, and it might happen (why not)! But let me ask you something.

I understand that SHA-1 is broken and that collisions can actually be produced. But that is mostly a problem for cryptographic uses of hashing algorithms! For us it is only potentially "bad" because two source files may randomly turn out to have the same hash, and what is the chance of that happening?

What I am trying to say is that we aren't expecting malicious users that try to find files that "break" build recorder. There is no point in doing that, nothing to win.

Again, I am not denying your request. It's on the TODO list, and you can even contribute towards making it a reality. I just want to hear your thoughts on this.

@timretout
Author

Yep, I get it. :)

Many large companies are interested in a "build recorder"-type approach for the software supply-chain security problem, where it would be nice to trace how binaries were originally compiled. These companies have adversaries with considerable resources, e.g. banks vs. organised crime.

These companies care about supply chain attacks - i.e. an employee of a supplier might try to supply a malicious binary to them, and to counter that they want to see how the binaries were built.

There are a few problems with SHA-1 in the above scenario:

  • with a moderate amount of computing power, it is now possible to create two binaries with the same SHA-1 digest. So the attacker could create a "safe" binary with a recorded build chain, but replace it with a malicious binary at the end.
  • these large companies just do not accept software that uses SHA-1 for anything, so they would not be allowed to use your software. 😕
  • in environments running in FIPS mode, OpenSSL has the SHA-1 algorithm disabled.
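
Whether a given algorithm is usable depends on the local crypto build, so a tool may want to probe before committing to one. The following is a minimal sketch using Python's `hashlib`, which raises `ValueError` when an algorithm cannot be constructed; the helper name `first_usable_algorithm` and the preference order are assumptions for illustration only.

```python
import hashlib


def first_usable_algorithm(preferred):
    """Return the first algorithm name in `preferred` that hashlib can construct."""
    for name in preferred:
        try:
            hashlib.new(name)  # ValueError if unknown or blocked by the crypto backend
        except ValueError:
            continue
        return name
    raise RuntimeError("none of the requested hash algorithms is available")


# Prefer a stronger algorithm and fall back only if it is missing.
print(first_usable_algorithm(["sha256", "sha1"]))
```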

Git is an interesting special case: because migrating git to new algorithms is tricky, it uses a collision detection library (sha1dc) to identify known attacks on SHA-1. But new applications should use stronger algorithms.

@zvr
Collaborator

zvr commented Feb 11, 2023

We use the exact same computation used in SWHID (and git hash-object), as this is the most useful to refer to file contents in general. There are no plans to change this.
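
For readers unfamiliar with that computation: `git hash-object` hashes a short header (`blob <size>` followed by a NUL byte) plus the file content with SHA-1, and a SWHID for file content prefixes that digest with `swh:1:cnt:`. A minimal sketch follows; the function names are made up for this example and are not taken from hash.c.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """SHA-1 over 'blob <size>\\0' + content, as git does for blob objects."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def swhid_for_content(content: bytes) -> str:
    """Software Heritage identifier for file content, built on the git blob hash."""
    return "swh:1:cnt:" + git_blob_sha1(content)


# The empty file has a well-known git blob id.
print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Note that the header makes this digest different from a plain SHA-1 of the file bytes, so it cannot be compared against `sha1sum` output.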

We could have options to also record other file data (including other content hashes).
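
Recording additional content hashes does not require reading a file more than once. As a rough sketch of the idea (the function name `content_hashes`, the default algorithm list, and the chunk size are all assumptions, and these are plain digests of the raw bytes rather than git-style blob hashes):

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def content_hashes(
    stream: BinaryIO,
    algorithms: Iterable[str] = ("sha1", "sha256"),
    chunk_size: int = 65536,
) -> Dict[str, str]:
    """Compute several digests of the same stream in a single pass."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for h in hashers.values():
            h.update(chunk)  # feed every hasher the same bytes
    return {name: h.hexdigest() for name, h in hashers.items()}


# Usage with an in-memory stream; a real caller would pass open(path, "rb").
print(content_hashes(io.BytesIO(b"example file content")))
```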

I will leave this issue open for future exploration.

zvr added the future label Feb 11, 2023
@timretout
Author

> We use the exact same computation used in SWHID (and git hash-object), as this is the most useful to refer to file contents in general.

If you are looking to align with the output of git hash-object, then note that these days git uses the sha1dc library to detect and mitigate known collision attacks:

https://github.com/git/git/blob/a0789512c5a4ae7da935cd2e419f253cb3cb4ce7/sha1dc_git.h

Adopting this would give you better compatibility with git.

Tim
