
– Adding the BLAKE3 hashing algorithm to fn:hash #1228

Merged (13 commits) on Sep 3, 2024


dnovatchev
Contributor

This is a resubmission of the original #1226.
There are no new changes; this fixes a purely Git-technical issue.

The PR is now submitted from a dedicated feature branch and does not depend on any other branch.

@ChristianGruen changed the title from "Adding the BLAKE3 hashing algorithm to fn:hash" to "– Adding the BLAKE3 hashing algorithm to fn:hash" on May 20, 2024
@ndw
Contributor

ndw commented May 21, 2024

There was a lot of discussion in comments on the previous PR so it's worth looking there if you haven't already.

@ndw
Contributor

ndw commented May 21, 2024

There were two threads in today's meeting that I'd like to comment on: 1. cryptographic security and 2. why are CRC32, MD5, SHA1, and SHA256 good choices for standardization?

  1. I don't think the QT stack is likely to be used to implement a cryptographic library. I don't think having a (currently and likely temporarily) secure algorithm in the mix is, in and of itself, a benefit. I'm not opposed, but I think a more likely use of a hashing function in QT is to generate a "good enough" signature for doing comparisons and to interoperate with existing signatures.
  2. CRC32 is used by ZIP; MD5, SHA1, and SHA256 are commonly used as validation measures on the web: "Here's a download, and here's the MD5|SHA1|SHA256 checksum that validates that the bits you get are the ones I posted." In practice, MD5 is a bad choice nowadays, but not all of the software out there is modern. I bet I can find a CD or two lying around my office with files that have published MD5 signatures. Being able to interact with those using QT is valuable. In short: CRC32, MD5, SHA1, and SHA256 are widely enough deployed to justify inclusion.
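A minimal Python sketch of the checksum-validation use case described above (download some bits, compare them against a published digest). It assumes the downloaded bytes are already in memory and uses only the standard library: `zlib.crc32` for the ZIP-style CRC-32 and `hashlib` for MD5, SHA1 and SHA256. The names `digest_hex` and `verify_download` are illustrative only; they are not part of fn:hash or any specification.

```python
import hashlib
import zlib

# Map the algorithm names used in the discussion to hashlib constructor names.
HASHLIB_NAMES = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256"}


def digest_hex(data: bytes, algorithm: str) -> str:
    """Return the lowercase hex digest of `data` for CRC32, MD5, SHA1 or SHA256."""
    key = algorithm.upper()
    if key == "CRC32":
        # zlib.crc32 is the same CRC-32 used by ZIP; render it as 8 hex digits.
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    if key not in HASHLIB_NAMES:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(HASHLIB_NAMES[key], data).hexdigest()


def verify_download(data: bytes, algorithm: str, published: str) -> bool:
    """True if `data` matches a published checksum (case and whitespace ignored)."""
    return digest_hex(data, algorithm) == published.strip().lower()
```

For example, `verify_download(data, "SHA256", published)` returns `True` only when the recomputed digest matches the published one.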

Finally, I'm going to repeat a point that I tried to make during the meeting. The existence of a freely available implementation of an algorithm (especially in several languages) is a really good thing. It improves the likelihood of interoperability and helps assure that it's not just one organization's toy project. But that doesn't make adding it to your project "free". Every dependency has a cost. It has to be vetted, tests have to be written against it, and documentation has to be updated. And it's another thing to manage in every release and another vector for bugs and errors.

None of these observations are really about BLAKE3 in particular. BLAKE3 looks pretty good. But does it need to be in the standard? More than SHA512? More than HMAC? More than something else? Implementations are free to support as many hashing algorithms as they like.

The argument that QT won't be reliably interoperable if BLAKE3 isn't included cuts both ways. It's true, if it's required for conformance it's more likely to be in every implementation. But if it isn't required for conformance, and consequently some implementations choose not to support it, doesn't that suggest it wasn't deemed necessary by a broad enough spectrum of users?

@ChristianGruen
Contributor

> None of these observations are really about BLAKE3 in particular. BLAKE3 looks pretty good. But does it need to be in the standard?

I agree. We've just ported the Rust reference implementation to Java. It is lightweight, fast and has no dependencies, and we'll be glad to provide it in our implementation, but this alone should be no reason to add it to the standard. Maybe a year from now everyone will use another hashing algorithm by default; we cannot tell.

@dnovatchev
Contributor Author

I am not willing to argue on this.

I have done my best for the inclusion of the most efficient and most secure hashing algorithm - that's it.

Failing to give the users the best would only weigh on your conscience - not on mine.

@ChristianGruen
Contributor

> I am not willing to argue on this.

Same for me; let the majority decide.

> Failing to give the users the best would only weigh on your conscience - not on mine.

Just in case, I’ll try to cope with it ;)

@ndw
Contributor

ndw commented May 24, 2024

The assertion that BLAKE3 is "the most efficient" hashing algorithm seems to disregard @benibela's pointer to 155 faster algorithms. And "most secure" is, of course, only temporary. There are dozens of hashing algorithms, maybe hundreds, and nothing prevents implementations from supporting all of them.

I think CRC32, MD5, SHA1 and SHA256 are justified on the basis that they're very widely used. I think BLAKE3 is a good, modern choice.

I don't object to including it, but what do we say when someone proposes another, and another, and another after that?

@dnovatchev
Contributor Author

> I think CRC32, MD5, SHA1 and SHA256 are justified on the basis that they're very widely used. I think BLAKE3 is a good, modern choice.
>
> I don't object to including it, but what do we say when someone proposes another, and another, and another after that?

I think a good, reasonable answer would be exactly what you said above: "We included CRC32, MD5, SHA1 and SHA256 on the basis that they're very widely used, and BLAKE3 is a good, modern choice. We are providing 5 algorithms; 5 is a good, still reasonably small number. Let us not go beyond that."

@ChristianGruen added the "Tests Needed" label (Tests need to be written or merged) on May 28, 2024
@wendellpiez

First, please correct me if we have not agreed CRC32, MD5, SHA1 and SHA256 are to be included as useful even if not always ideal for every use case?

And the outstanding questions:

  • Should BLAKE3 be added to this list?
  • Should any other algorithms be added now? (in view especially of costs?)
  • How do we manage the list going forward?

IIUC, to help with this we would like second and third informed opinions:

  1. Is BLAKE3 the obvious choice for 'good and modern' especially in view of known limits of CRC32, MD5, SHA1 and SHA256?
  2. Are there any others comparable in both excellence and low cost (in implementation, use and dependency management) we should be considering?
  3. Alternatively, can we do without such a 'good and modern' option, or is it less critical than one might think, and should be left in the 'optional for implementers' category?

If the answer to that last question leans toward 'yes', then maintaining the list (of algorithms required to be supported) presumably becomes easier. (We could stipulate essentially that it is up to implementers to stay 'good and modern' while the mandated list is there only for interoperability.)

Comments?

@ChristianGruen
Contributor

> Comments?

@wendellpiez Sounds good to me, thank you.

@dnovatchev
Contributor Author

> First, please correct me if we have not agreed CRC32, MD5, SHA1 and SHA256 are to be included as useful even if not always ideal for every use case?

I think it was agreed (or at least not rejected) based on the fact that SHA1 and SHA256 have been widely used, regardless of their known security vulnerabilities.

> And the outstanding questions:
>
> * Should BLAKE3 be added to this list?

BLAKE3 is an example of a modern hashing algorithm with no known security vulnerabilities and with exceptional efficiency in speed, space and parallelization. This algorithm also has implementations available in the most popular programming languages (such as Java and C#) and has already been implemented by most of the implementors in the community group (Norm and Christian).

Clearly - we don't need to decide that exactly this algorithm should be included - we just need to reserve a slot for one algorithm that has similar qualities - with no known security vulnerability and with exceptional efficiency in speed, space and parallelization. We owe our users that at least one up-to-date, secure and efficient algorithm be provided to them, regardless which implementation they happen to be using.

> * Should any other algorithms be added now? (in view especially of costs?)

Reserving a 5th slot in addition to the 4 already specified is not a big increase in cost, given that the implementors needed only a short time to implement BLAKE3.

However, the cost to users if no modern, secure and high-quality hashing algorithm is provided can be huge.

> * How do we manage the list going forward?

We decide that we only provide 5 hashing algorithms that every implementation must make available.

Any implementation may decide to provide additional hashing algorithms.

> IIUC, to help with this we would like second and third informed opinions:
>
> 1. Is BLAKE3 the obvious choice for 'good and modern' especially in view of known limits of CRC32, MD5, SHA1 and SHA256?

It is the obvious reference to the category of a modern, secure and efficient hashing algorithm that we need to have. Any other algorithm belonging to this category would be OK.

> 2. Are there any others comparable in both excellence and low cost (in implementation, use and dependency management) we should be considering?

And if there are such, which would be recommended?

> 3. Alternatively, can we do without such a 'good and modern' option, or is it less critical than one might think, and should be left in the 'optional for implementers' category?

We owe it to the user community to include at least one such good option. If an implementor thinks only about minimizing their own costs, they are probably not contributing anything to this group.

> If the answer to that last question leans toward 'yes', then maintaining the list (of algorithms required to be supported) presumably becomes easier. (We could stipulate essentially that it is up to implementers to stay 'good and modern' while the mandated list is there only for interoperability.)
>
> Comments?

@wendellpiez

wendellpiez commented May 29, 2024

@dnovatchev given your remarks, would you be amenable to shortening the list of questions to:

  • Four or five algorithms? (Assuming the four to be CRC32, MD5, SHA1 and SHA256)
  • If five, should the fifth algorithm be BLAKE3 or some other?

We would not now make any plans for others or indeed for extensibility going forward. (Not that the question could never be reopened, and of course implementers can provide additional options.)

If you agree that these are the only questions -- and so do others -- the committee's job becomes simpler, and so does consulting with expertise, specifically to confirm your sense that BLAKE3 is the best (modern) choice available.

If we can't add BLAKE3 without considering others as well, indeed that is a bigger problem. (I had assumed that due diligence more or less requires us to do so, but your argument refutes that assumption.)

@wendellpiez

wendellpiez commented May 29, 2024

Actually @dnovatchev I don't want to leave out another option, addressing your observations:

  • Saying that implementors 'should' (or even 'must') provide a fifth algorithm but leave it up to them which one to select?

Or is this not something that should be on the table (in your view or others')?

Personally I think this would be just confusing, but it seems to be consistent with your position.

@dnovatchev
Contributor Author

> @dnovatchev given your remarks, would you be amenable to shortening the list of questions to:
>
> * Four or five algorithms? (Assuming the four to be CRC32, MD5, SHA1 and SHA256)

I think this should not be questionable -- the right question is:

"Should we provide to the users at least one modern hashing algorithm without known security and quality flaws, or not?"

> * If five, should the fifth algorithm be BLAKE3 or some other?

I am OK with BLAKE3 and it was me who proposed it, but if none of us are experts in the field, and we don't feel sure (I do feel sure) then let us ask a well-known expert for recommendation.

> We would not now make any plans for others or indeed for extensibility going forward. (Not that the question could never be reopened, and of course implementers can provide additional options.)
>
> If you agree that these are the only questions -- and so do others -- the committee's job becomes simpler, and so does consulting with expertise, specifically to confirm your sense that BLAKE3 is the best (modern) choice available.
>
> If we can't add BLAKE3 without considering others as well, indeed that is a bigger problem. (I had assumed that due diligence more or less requires us to do so, but your argument refutes that assumption.)

I do not understand this last comment. What is "your argument", and what does it refute?

I personally proposed including BLAKE3, and will be OK with such a decision.

In case people are not sure which hashing algorithm from this category to include, then it is best to have an expert's recommendation.

@dnovatchev
Contributor Author

> Actually @dnovatchev I don't want to leave out another option, addressing your observations:
>
> * Saying that implementors 'should' (or even 'must') provide a fifth algorithm but leave it up to them which one to select?
>
> Or is this not something that should be on the table (in your view or others')?

No, we need to provide one specific algorithm - not leave it to the implementors.

The goal is that a user will know that this algorithm is provided on any compliant implementation.

> Personally I think this would be just confusing, but it seems to be consistent with your position.

@wendellpiez

I was only remarking that I was wrong to assume that if the list was open to one new entry (on the basis of 'modernity', security and the rest), it should be open to more.

However you seem to be of the definite opinion that one is enough, assuming it is BLAKE3 or better. (And I am not going to stand on a principle you do not agree with, at least this time. :-)

Thank you, you are answering my questions!

@dnovatchev
Contributor Author

@wendellpiez Please, let us discuss this between the two of us offline, as you keep hypothesizing about what my position on this or that question is, and this hypothesizing fills up space.

@wendellpiez

@dnovatchev feel free to send any clarifications in email, I'll read. It is certainly not my intent to make you 'own' opinions that are not yours.

But I'm not concerned here to clarify every point, especially where we agree. Where we disagree we need clarification so we can come to agreement, but a certain amount of confusion is also acceptable. (Maybe I don't expect Perfect Understanding since every time I've thought I had it, I turned out to be wrong. I'm still happy to read email.)

@wendellpiez

In effect I'm offering that if we can limit the decisions to "which one, if any?" we'll make it easier for ourselves.

Reasons we can't limit the decision would include things like "there's no obvious single choice, not even BLAKE3" and "we need more than a list; we also need to future-proof the problem".

Adding a function to report the available algorithms is an attempt to future-proof the problem. So would an extensibility model - which no one has proposed, and may not be feasible anyway.

And 'least effort' is definitely a concern here for any present and future implementation -- not because the effort would not pay off (let's assume it would) but because total cost becomes a barrier.

My hope here is to have the simple questions that the crypto person can answer. Maybe we allow room for more than five - even then we get

  1. CRC32, MD5, SHA1 and SHA256 only
  2. CRC32, MD5, SHA1, SHA256 and BLAKE3
  3. CRC32, MD5, SHA1, SHA256 and some other
  4. CRC32, MD5, SHA1, SHA256, BLAKE3 and some other(s)

The crypto expert would presumably help some of us choose between 2, 3 and 4 here, while others are also considering 1 and don't know which of 2, 3 or even 4 might be best -- while they do want to hear the expert on that question.

Is this fair?

@dnovatchev
Contributor Author

> My hope here is to have the simple questions that the crypto person can answer. Maybe we allow room for more than five - even then we get
>
> 1. CRC32, MD5, SHA1 and SHA256 only
> 2. CRC32, MD5, SHA1, SHA256 and BLAKE3
> 3. CRC32, MD5, SHA1, SHA256 and some other
> 4. CRC32, MD5, SHA1, SHA256, BLAKE3 and some other(s)

I think that the only questions we have (that are realistically feasible) are 2 and 3 above, and question 3 actually subsumes question 2.

@michaelhkay
Contributor

One interesting idea might be to provide an option "secure" which generates a signature using whatever algorithm the implementation considers to represent the state-of-the-art for a secure signature. Of course the signatures produced by different implementations would be different so this is only useful for cases where signature generation and signature verification are done using the same product.
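One way such a "secure" option might behave, sketched in Python under loudly stated assumptions: the alias name `secure`, the function `hash_hex`, and the use of BLAKE2b as the stand-in "state-of-the-art" choice are all hypothetical (Python's standard `hashlib` does not include BLAKE3). Returning the concrete algorithm name alongside the digest is just one possible design that lets a later verifier know what was actually used.

```python
import hashlib

# Hypothetical: the algorithm this imaginary implementation currently considers
# state-of-the-art. Python's standard hashlib has no BLAKE3, so BLAKE2b stands in.
SECURE_CHOICE = "blake2b"

# Fixed, interoperable algorithms (names as used in the thread).
FIXED = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256"}


def hash_hex(data: bytes, algorithm: str) -> tuple[str, str]:
    """Return (algorithm actually used, hex digest).

    "secure" resolves to whatever the implementation prefers, which may differ
    between products and change between releases, so the concrete name is
    returned alongside the digest.
    """
    key = algorithm.upper()
    if key == "SECURE":
        return SECURE_CHOICE, hashlib.new(SECURE_CHOICE, data).hexdigest()
    if key in FIXED:
        name = FIXED[key]
        return name, hashlib.new(name, data).hexdigest()
    raise ValueError(f"unsupported algorithm: {algorithm}")
```

Because `SECURE_CHOICE` is implementation-defined, two products (or two releases of one product) can return different digests for the same input, which is the portability limitation noted above.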

…n-hash-blake3

Synching with other merges
@wendellpiez

Update June 6: I have put out a couple of inquiries, but so far have heard nothing back. Still listening.

…n-hash-blake3

Merging the latest changes from master
…n-hash-blake3

Merging the latest changes from master
@ndw
Contributor

ndw commented Jul 17, 2024

(As an independent member of the community group and very definitely not in my role as co-chair...)

While I'm sympathetic to the idea that a standard, modern hashing function would be a good thing, we are obviously not experts on the subject. As has been demonstrated, there are dozens of possible algorithms of which BLAKE is only one possible choice. It is likely that the choice we make will be seen as an endorsement of the algorithm we choose and I think that should give us substantial pause. We attempted to get input from the security community who are experts and could not get that input.

At the end of the day, not everything has to be standardized. There is nothing that prevents an implementation from supporting BLAKE (or City32, CityCrc128, FNV2, FarmHash128, HalfSipHash, HighwayHash64, MeowHash, Murmur3A, PMPML_32, SipHash13, Spooky128, TSip, aesni, ahash64, bernstein, crc64_hw, discohash2, edonr256, falkhash, jodyhash32, k-hash64, metrohash128, mirhash, pearsonbhash64, pengyhash, poly_3_mersenne, prvhash64s_128, seahash, sumhash32, superfast, umash32, or xmsx32 to name just 32 that I picked more-or-less at random from the 155 that are by some measure faster than BLAKE from the list presented earlier).

I think we should accept that without the input of the security community, we don't have the necessary expertise to endorse BLAKE. And we don't need to. Any implementation can implement it, or any other hashing function.

I suggest that we abandon this PR and let users' demands for quality implementations guide implementors.

@dnovatchev
Contributor Author

Thank you, Norm, for commenting on this.

> There were two threads in today's meeting that I'd like to comment on: 1. cryptographic security and 2. why are CRC32, MD5, SHA1, and SHA256 good choices for standardization?
>
> 1. I don't think the QT stack is likely to be used to _implement_ a cryptographic library. I don't think having a (currently and likely temporarily) secure algorithm in the mix is, in and of itself, a benefit. I'm not opposed, but I think a more likely use of a hashing function in QT is to generate a "good enough" signature for doing comparisons and to interoperate with existing signatures.

BLAKE3 was proposed not only for its uncompromised security but also for its lightning speed.
So, it was proposed for the combination of efficiency and security.

> 2. CRC32 is used by ZIP; MD5, SHA1, and SHA256 are commonly used as validation measures on the web: "Here's a download, and here's the MD5|SHA1|SHA256 checksum that validates that the bits you get are the ones I posted." In practice, MD5 is a bad choice nowadays, but not all of the software out there is modern. I bet I can find a CD or two lying around my office with files that have published MD5 signatures. Being able to interact with those using QT is valuable. In short: CRC32, MD5, SHA1, and SHA256 are widely enough deployed to justify inclusion.
>
> Finally, I'm going to repeat a point that I tried to make during the meeting. The existence of a freely available implementation of an algorithm (especially in several languages) is a really good thing. It improves the likelihood of interoperability and helps assure that it's not just one organization's toy project. But that doesn't make adding it to your project "free". Every dependency has a cost. It has to be vetted, tests have to be written against it, and documentation has to be updated. And it's another thing to manage in every release and another vector for bugs and errors.
>
> None of these observations are really about BLAKE3 in particular. BLAKE3 looks pretty good. But does it need to be in the standard? More than SHA512? More than HMAC? More than something else? Implementations are free to support as many hashing algorithms as they like.
>
> The argument that QT won't be reliably interoperable if BLAKE3 isn't included cuts both ways. It's true, if it's required for conformance it's more likely to be in every implementation. But if it isn't required for conformance, and consequently some implementations choose not to support it, doesn't that suggest it wasn't deemed necessary by a broad enough spectrum of users?

Let me repeat the proposal clearly:

We need to have in the 5th slot at least one modern hashing method that is both secure and efficient.

The fact is that we didn't receive any response from the specialists in this area about a better method. At the same time, no one told us that there is anything wrong with BLAKE3 or that we shouldn't use it.

What is more constructive to do in this situation is not to abandon the good idea entirely, but to fill the 5th slot, maybe temporarily, with the currently proposed method, BLAKE3, because it fully reflects our goals for this 5th slot and the specialists who have been contacted didn't signal otherwise. We can always replace it in the future with an even better method if we get new, objective data or a specialist's opinion.

So, let us have this 5th slot filled with what we have so far, and in the time between now and the official release of the spec we can always replace it with an even better choice, if we find one.

@dnovatchev
Contributor Author

> I think we should accept that without the input of the security community, we don't have the necessary expertise to endorse BLAKE. And we don't need to. Any implementation can implement it, or any other hashing function.

BLAKE3 was proposed not only for its uncompromised security but also for its lightning speed.
So, it was proposed for the combination of efficiency and security.

> I suggest that we abandon this PR and let users' demands for quality implementations guide implementors.

Let me repeat the proposal clearly:

We need to have in the 5th slot at least one modern hashing method that is both secure and efficient.

The fact is that we didn't receive any response from the specialists in this area about a better method. At the same time, no one told us that there is anything wrong with BLAKE3 or that we shouldn't use it.

What is more constructive to do in this situation is not to abandon the good idea entirely, but to fill the 5th slot, maybe temporarily, with the currently proposed method, BLAKE3, because it fully reflects our goals for this 5th slot and the specialists who have been contacted didn't signal otherwise. We can always replace it in the future with an even better method if we get new, objective data or a specialist's opinion.

So, let us have this 5th slot filled with what we have so far, and in the time between now and the official release of the spec we can always replace it with an even better choice, if we find one.

@ndw
Contributor

ndw commented Jul 18, 2024

I am not persuaded that we need another option. I understand that you believe we do, but I haven't found your arguments persuasive. I encourage implementors to support BLAKE3 and other hashing functions that users may find useful. My initial reaction to the idea was "okay, sure, adding something modern sounds reasonable". (In fact, I implemented it in the p:hash step of my XProc implementation because it was an amusing evening's hacking to do so.)

What changed my mind about adding it to our fn:hash function, I think, was the observation that adding BLAKE3 is an endorsement from this community group that BLAKE3 is the right, best choice if you want something modern. I certainly lack the expertise to make that endorsement. The fact that we reached out to the security community and they were also unwilling to make an endorsement should give us pause, I think.

And I fall back to the quality of implementation argument. Not everything has to be mandated by the standard to be useful. Where specifications allow, implementors can (and will!) add features and capabilities that make their implementations more appealing to users. Our specification gives implementors the freedom to support additional algorithms and if users demand, implementors will.

@wendellpiez

My feeling here is a little split.

I think Dimitre as always makes excellent points, but as I understand it, we are not preventing any implementation from offering any algorithm. We are only defining requirements for conformance.

Where I differ from Norm is in his characterization of my feeble attempts so far as having provided "the input of the security community". I asked a couple of people that were convenient to ask, getting more or less a shrug or a non-answer, or what's worse: "it's complicated". And that in turn leads me to agree with Norm's reservation that we not be seen to be endorsing or recommending anything in particular, or at least anything remotely cutting-edge.

One person pointed me to a survey of algorithms that was outdated (2011), on Wikipedia (so not hard to find), and long.

If the question is to be answered only with further inputs from knowledgeable parties, I suggest we float it in places where we are likely to hear anything at all - maybe Slack channel, mailing lists etc. Mainly since we don't know what we would learn, if anything.

If the question is to be answered based only on what we know so far, I think looking at it from the users' point of view makes the most sense.

What the user wants (IMV) is something reliable and well-documented, along with options. Since they have options and since implementors are already motivated to provide more, I think I'm probably with Norm on balance - the requirement is not strong enough to prioritize above other items. Implementors can offer the feature, then advertise its superiority. The only thing that users lose is a measure of interchangeability without prior agreement.

I'm sorry I wasn't able to strike gold with an answer here - I think our choices now are (a) continue discussing, including in broader venues, or (b) put it to a vote. Happy to hear other options.

@ChristianGruen
Contributor

I liked the proposal from Reece to offer an additional function that returns the names of all supported algorithms. It would also be helpful to discover algorithms that get popular after the finalization of the 4.0 specification.
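A rough Python sketch of such a discovery function, assuming a hypothetical name `supported_hash_algorithms` (no name or signature was proposed in this thread) and a hypothetical set of implementation-specific extras. It always lists the four agreed algorithms and adds extras only when the running `hashlib` actually provides them, checked via `hashlib.algorithms_available`.

```python
import hashlib

# The four algorithms the thread treats as agreed for conformance.
MANDATED = frozenset({"CRC32", "MD5", "SHA1", "SHA256"})

# Hypothetical implementation-specific extras, keyed by the name a user would
# pass and mapped to the hashlib name that backs them.
EXTRAS = {"BLAKE2B": "blake2b", "BLAKE2S": "blake2s", "SHA512": "sha512"}


def supported_hash_algorithms() -> list[str]:
    """Return the sorted names of all algorithms this hypothetical implementation reports.

    Mandated names are always listed; extras are listed only if the running
    Python actually provides the backing algorithm.
    """
    extras = {name for name, backing in EXTRAS.items()
              if backing in hashlib.algorithms_available}
    return sorted(MANDATED | extras)
```

On a standard CPython build this returns `['BLAKE2B', 'BLAKE2S', 'CRC32', 'MD5', 'SHA1', 'SHA256', 'SHA512']`; an algorithm that becomes popular after the specification is finalized could be reported simply by adding it to `EXTRAS`.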

@wendellpiez

👍 Very much supportive of that idea as well as it supports the 'discoverability' requirement. Thanks for mentioning, @ChristianGruen

@dnovatchev
Contributor Author

> What changed my mind about adding it to our fn:hash function, I think, was the observation that adding BLAKE3 is an endorsement from this community group that BLAKE3 is the right, best choice if you want something modern.

Not at all. We do not claim that the algorithm is "the right, best choice if you want something modern". This is just one of about 10-12 algorithms that fall into this category.

The emphasis here is not on "BLAKE3". I would accept any one of these 10-12 algorithms and would be happy even if we chose among them by spinning a roulette wheel.

What is really important is that we have provided the user with at least one such modern, efficient and secure algorithm.

I strongly believe that we must act in the users' best interests, as user advocates.

> I certainly lack the expertise to make that endorsement. The fact that we reached out to the security community and they were also unwilling to make an endorsement should give us pause, I think.

So, if someone is starving and has a choice of a dozen delicious foods, will they die because they cannot determine which dish is the best, or will they pick one at random in order to stay alive?

Why does this remind us of ... Buridan's ass? 😄

@michaelhkay
Contributor

How important is it that the "modern, efficient and secure algorithm" is the same algorithm (producing the same hash) on all implementations? Might some users like an option that selects the implementor's choice of algorithm, which might change over time?

@ndw
Contributor

ndw commented Jul 18, 2024

@dnovatchev you've made your position, that "the CG must provide a modern, efficient, and secure algorithm," very clear.

I am unconvinced. It isn't that I fail to understand your position. It isn't that I think your proposal is incorrect purely on its technical merits (notwithstanding the fact that I'm unqualified to judge the relative merits of BLAKE3 over other possible choices).

Any algorithm that we mandate for conformance will be perceived as an endorsement of that algorithm as superior to other algorithms that we could have chosen. I am not persuaded that the benefits of endorsing BLAKE3 (or some other choice) outweigh the risks associated with making that endorsement.

So as not to malign the BLAKE3 algorithm, let's say we pick a different one. "Rhubarb" for example. We say that conformance requires that you implement CRC32, MD5, SHA1, and Rhubarb. The world says "ooh, Rhubarb, that's the one for me, that's what the XML community recommends." Then in 2025 someone discovers a critical vulnerability in Rhubarb. What then? We've encouraged the deployment of an insecure algorithm and produced a specification that requires you to support Rhubarb in order to claim conformance.

What users are you advocating for? What use cases do you think cannot be satisfied unless an additional algorithm is required for conformance? I'm not disputing that implementors should provide other choices, and I think it's very likely that they will. All that's really at issue here is the requirement that a particular selection be made by this CG and elevated to the status "endorsed and required for conformance."

I don't think the benefits outweigh the risks. Simply repeating your position won't persuade me otherwise.

With respect to the analogy that I might starve to death surrounded by a dozen delicious foods, I think it's an inapt analogy. A more apt analogy would be that you think I'll starve unless every single restaurant is required to offer me lobster thermidor in addition to everything else already on the menu: a sandwich, chicken nuggets, and a Caesar salad. That's obviously not the case. I'll happily eat the lobster thermidor from one restaurant and the beef Wellington from another. Or, you know, sandwiches if they don't offer a special.

Anyway. I'm not going to lie down in the road and prevent the working group from adding more options if that's how consensus goes. I just don't think, on balance, it's the right choice.

@dnovatchev
Contributor Author

How important is it that the "modern, efficient and secure algorithm" is the same algorithm (producing the same hash) on all implementations? Might some users like an option that selects the implementor's choice of algorithm, which might change over time?

This would be important for all developers who produce products that are intended to be portable across brands and implementations.

@dnovatchev
Contributor Author

With respect to the analogy that I might starve to death surrounded by a dozen delicious foods, I think it's an inapt analogy. A more apt analogy would be that you think I'll starve unless every single restaurant is required to offer me lobster thermidor in addition to everything else already on the menu: a sandwich, chicken nuggets, and a Caesar salad. That's obviously not the case. I'll happily eat the lobster thermidor from one restaurant and the beef Wellington from another. Or, you know, sandwiches if they don't offer a special.

It is very simple to explain what I mean with this analogy.

But first, a quote from Wikipedia:

"This makes the MD5, SHA-1, RIPEMD-160, Whirlpool, and the SHA-256 / SHA-512 hash algorithms all vulnerable to this specific attack. "

So, all that we offer on our restaurant menu is a subset of these rotten foods...

Don't we need to offer at least one dish that is known to be healthy?

Sometimes inaction (such as failing to prevent or stop bad things from happening) can cause bad things to happen, or to continue.

@wendellpiez

Dimitre, I'm not sure any algorithm has been or even can be shown always to be healthy, especially not today. (I'm not a mathematician but I am a bad poet, resisting the impulse to offer more analogy hot off the griddle.) It's an abstruse point but it speaks to hesitations around endorsements and second-order effects (concerns I share).

Besides, even if we agree on what's healthy and not, the dishes will be on offer in any case. The only question is whether this particular delicacy must be supported, and conformantly (to its own spec), in order to meet the requirement you state for 'blind interchange' of this data. (Especially given that the string or binary being hashed is already very opaque.)

I know you take it as obvious that if a processor says they have algorithm X, that's indeed what it is. I'm afraid these days I'm a little dubious. And we haven't even talked about testing ... so we rely on implementers not to lie or even fudge.

We could reject the feature simply on the grounds of parsimony, a principle I have heard you advocate for. I'm a bit surprised no one has moved that we vote and move along! :-) In any case, I appreciate the patience.

My own thinking has been clarified by Norm's point that any choice will inevitably be received as a recommendation; sore experience teaches us the risks of that, on top of the costs.

And this is true even if you are right about all the benefits, as you may well be.

A function listing available algorithms would help implementations compete and (implicitly) coordinate, and lay the groundwork for more and better choices later, with more than rudimentary testing. That is not nothing.
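By way of analogy only, Python's standard `hashlib` module already draws the distinction being debated here: `hashlib.algorithms_guaranteed` lists the algorithms every Python build must provide, while `hashlib.algorithms_available` also includes whatever the local build (for example, its OpenSSL library) happens to offer. The sketch below is not a proposal for the name or signature of a QT listing function; `describe_hash_support` is a hypothetical helper.

```python
import hashlib


def describe_hash_support() -> dict[str, list[str]]:
    """Split the hash algorithms known to this interpreter into
    'required' (guaranteed on every platform) and 'optional'
    (present only in this particular build)."""
    required = set(hashlib.algorithms_guaranteed)
    available = set(hashlib.algorithms_available)
    return {
        "required": sorted(required),
        # Anything beyond the guaranteed set depends on the build.
        "optional": sorted(available - required),
    }


if __name__ == "__main__":
    support = describe_hash_support()
    print("Required:", ", ".join(support["required"]))
    print("Optional:", ", ".join(support["optional"]) or "(none)")
```

Because the optional set varies from platform to platform, code that must produce identical hashes everywhere would stick to the required set.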

@dnovatchev
Contributor Author

Dimitre, I'm not sure any algorithm has been or even can be shown always to be healthy, especially not today. (I'm not a mathematician but I am a bad poet, resisting the impulse to offer more analogy hot off the griddle.) It's an abstruse point but it speaks to hesitations around endorsements and second-order effects (concerns I share).

Besides, even if we agree on what's healthy and not, the dishes will be on offer in any case. It's only whether this particular delicacy must be supported and conformantly (to its own spec) in order to meet the requirement you state for 'blind interchange' of this data. (Especially given that the string or binary being hashed is already very opaque.)

I know you take it as obvious that if a processor says they have algorithm X, that's indeed what it is. I'm afraid these days I'm a little dubious. And we haven't even talked about testing ... so we rely on implementers not to lie or even fudge.

Very good point, Wendell!

I cannot give a fully general answer, but at least for the BLAKE3 algorithm it is straightforward to verify the correctness of any implementation: just use the BLAKE3 Hashing online page.
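Whatever the algorithm, checking an implementation usually comes down to known-answer tests: hash published inputs and compare the results against published digests. BLAKE3 is not part of Python's standard `hashlib`, so this minimal sketch uses well-known SHA-256 vectors instead (the "abc" example from FIPS 180 and the empty string). Plugging in a BLAKE3 binding the same way is an assumption not shown here, and `run_known_answer_tests` is a hypothetical helper.

```python
import hashlib

# Known-answer vectors: (input bytes, expected SHA-256 digest in hex).
SHA256_VECTORS = [
    (b"abc",
     "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
    (b"",
     "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
]


def run_known_answer_tests(algorithm: str, vectors) -> list[bytes]:
    """Return the inputs whose computed digest differs from the expected one."""
    failures = []
    for message, expected_hex in vectors:
        actual_hex = hashlib.new(algorithm, message).hexdigest()
        if actual_hex != expected_hex:
            failures.append(message)
    return failures


if __name__ == "__main__":
    failed = run_known_answer_tests("sha256", SHA256_VECTORS)
    print("all vectors passed" if not failed else f"failed: {failed}")
```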


It is also instructive to take a password that is known to have been "cracked" already, hash it with one of the following methods, and pass the resulting hash to the "Free Password Hash Cracker":

LM, NTLM, md2, md4, md5, md5(md5_hex), md5-half, sha1, sha224, sha256, sha384, sha512, ripeMD160, whirlpool, MySQL 4.1+ (sha1(sha1_bin)), QubesV3.1BackupDefaults

Here is an example: the SHA256 hash of "ali5" is: "3a51210b15b8350337cd0cde92bc63dff06528bad7554529e5491974f174a8c1"

One can generate this hash using this tool: "SHA256 Online Tool":


Then feed the generated hash to the "Free Password Hash Cracker" and ... voila:
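Assuming the tool works like most such services, it does not invert the hash function; it looks the digest up in a table built in advance by hashing candidate passwords, which is why it needs a password that is "known to be already cracked". A toy Python sketch of that lookup follows; the helper names and the word list are hypothetical.

```python
import hashlib


def build_lookup_table(candidates):
    """Precompute a SHA-256 digest -> plaintext map for candidate passwords."""
    return {hashlib.sha256(word.encode("utf-8")).hexdigest(): word
            for word in candidates}


def crack(digest_hex, table):
    """Return the plaintext whose digest matches, or None if it is not in the table."""
    return table.get(digest_hex.lower())


if __name__ == "__main__":
    # Hypothetical word list; real services use large lists of leaked passwords.
    table = build_lookup_table(["password", "letmein", "ali5", "qwerty"])
    target = hashlib.sha256(b"ali5").hexdigest()
    print(crack(target, table))     # ali5
    print(crack("00" * 32, table))  # None
```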


I hope this addresses your concern. Or not?

@Arithmeticus
Contributor

Having read this thread, I lean gently toward not requiring BLAKE3.

I am so grateful that 4.0 actually has any hash, checksum, and CRC algorithm. And I think that in the time between now and 5.0 we'll gain deeper perspective. In the meantime, there's nothing preventing processors from adding whatever algorithms they like. And maybe BLAKE3 weaknesses will have been exposed, saving us some face. Doesn't the track record show us that it's not a case of if a cryptographic algorithm will be broken, but when?

But I'm still persuadable. Pass this thread around to your favorite cryptography expert and invite their input. We need it.

@wendellpiez

Esteemed Dimitre and colleagues,

I appreciate your response (and I am glad to learn more about BLAKE3, algorithms, and testing), and yet I'm afraid it doesn't address the point. Indeed, the way it misses the point is revealing.

These arguments could be compelling if the counter-proposal were to ban BLAKE3 or all algorithms except an approved list. The counter-proposal is to let vendors offer BLAKE3 with their other supported values, both required and optional.

And it has already been pointed out that no arguments about BLAKE3 or about any particular algorithm address the reservation stated. In that context, my remark about testing doesn't apply to BLAKE3 in particular as much as to the entire problem space. This is about "knowability" for our users.

As I see it, the best argument for including it is that if we include X, all users can use X with confidence in its portability. But the proportion of users for whom this aspect of portability across processors really matters is probably fairly small, even if I like the principle. It's just not possible to pave the universe.

And since the alternative is to leave it to implementers (and to users to request), the costs of not including it are either minor, or temporary, or both.

Unless some very well informed person comes to tell us that yes, we need another choice and yes, BLAKE3 is the choice (with new information to corroborate and confirm Dimitre's finding), I think I am leaning against the idea so far. Not because it is a bad idea but because we do not know it is a good idea today.

Thanks for reading! I am still learning about crypto.

@dnovatchev
Contributor Author

Unless some very well informed person comes to tell us that yes, we need another choice and yes, BLAKE3 is the choice (with new information to corroborate and confirm Dimitre's finding), I think I am leaning against the idea so far. Not because it is a bad idea but because we do not know it is a good idea today.

Hey people, what's really up with you?

I have repeated many times, again and again, that I am not particularly insisting on one specific algorithm, such as BLAKE3, but just that our "restaurant" should offer at least one food that is not rotten: just one food.

In case you are so proud of offering only food that has been proven to be rotten - well, go with it...

Or, step back to Joel's original proposal which included only MD5.

Anyway, there is something rotten ...

@ChristianGruen
Contributor

I have repeated many times - again and again that I am not particularly insisting on one specific algorithm, such as BLAKE3, […]

I haven't participated too actively in this discussion, but I was surprised to read this, as the title and the contents of this PR are about BLAKE3. Maybe it would be easier if we could look at an updated PR?

@wendellpiez

Dimitre, or maybe an alternate PR that includes only those algorithms you feel should be included?

Letting the tools implementers decide and permitting them to coordinate informally does not seem to me an inconsistent position to take. I do not recall whether we discussed criteria for inclusion on the list (beyond MD5 or at all) - maybe that was discussed, but it seems to be part of what you have been calling into question.

Maybe we should consider shortening the list, giving the implementers more options, even if that leaves interoperability in their hands as well?

@dnovatchev
Contributor Author

Dimitre, or maybe an alternate PR that includes only those algorithms you feel should be included?

Letting the tools implementers decide and permitting them to coordinate informally does not seem to me an inconsistent position to take. I do not recall whether we discussed criteria for inclusion on the list (beyond MD5 or at all) - maybe that was discussed, but it seems to be part of what you have been calling into question.

Maybe we should consider shortening the list, giving the implementers more options, even if that leaves interoperability in their hands as well?

Wendell,

Whatever we do, it doesn't feel right to make only hashing algorithms with known issues obligatory, without specifying at least one highly efficient algorithm that has not been known to be vulnerable over a considerable period.

This is like being the "Federal Department of Restaurants" and obliging all restaurants in the country to provide 5 kinds of food that are known to be rotten, and telling them: "You can also provide a healthy food, but this is not obligatory".

Whatever we do, if all the algorithms we make obligatory have known vulnerabilities, that is not good.

As a user, if this is the case, I will never use any of the vulnerable hashing methods, and therefore will never use this function.

@michaelhkay
Contributor

I think the primary purpose of providing hash() was to allow people to verify signatures that are widely deployed on the web, and from that perspective the current list is a good one, because these are all in common use.
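That use case, checking downloaded bits against a published MD5, SHA-1, or SHA-256 checksum, looks roughly like this in Python. The function name `matches_published_checksum` is hypothetical and only illustrates the comparison (QT code would call `fn:hash` instead); for large files one would feed the data in chunks via `update()`, but the sketch hashes an in-memory `bytes` value for brevity.

```python
import hashlib


def matches_published_checksum(data: bytes, algorithm: str, published_hex: str) -> bool:
    """Hash `data` with `algorithm` (e.g. "md5", "sha1", "sha256") and compare
    the result with a checksum published alongside a download."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # Published checksums are often upper-case or carry trailing whitespace.
    return actual == published_hex.strip().lower()


if __name__ == "__main__":
    payload = b"abc"
    published = "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD"
    print(matches_published_checksum(payload, "sha256", published))  # True
```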

I do have a lot of sympathy with the argument that we should also offer something newer and better, and that it should be standard and interoperable. I think the WG might well agree that is a desirable aim. Our problem, I think, is a feeling that we lack the expertise to pick a winner, and that picking something that turns out to be a loser is worse than not picking one at all.

Perhaps a compromise might be to include BLAKE3 but with a caveat, something like "BLAKE3 is included in the list of hashing algorithms because at the time of writing it appears to be a promising candidate as a secure and fast algorithm that shows signs of gaining widespread support. However, this is a fast moving field and the community group recognizes that this decision might not stand the test of time. Implementations are therefore free to drop support for this algorithm and substitute another that appears to better meet requirements as the technology evolves."

@ndw ndw merged commit 9895610 into qt4cg:master Sep 3, 2024
2 checks passed
@dnovatchev dnovatchev deleted the dn-hash-blake3 branch September 17, 2024 21:37
Labels
Tests Needed Tests need to be written or merged

6 participants