Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the BLAKE3 hashing algorithm to fn:hash #1226

Closed
wants to merge 42 commits into from
Closed

Add the BLAKE3 hashing algorithm to fn:hash #1226

wants to merge 42 commits into from

Conversation

dnovatchev
Copy link
Contributor

This PR adds the BLAKE3 hashing algorithm as one of the hashing algorithms in fn:hash

This comes from a different branch than the one that contains the PR for fn:ranks, thus both PRs must be active and independent of each other.

@ndw
Copy link
Contributor

ndw commented May 19, 2024

I think you've made quite a tangle here. I suggest you discard this PR and start over.

I don't know how you have your repository configured, I have an "upstream" repository that points to the QT4CG version and an "origin" repository that points to my version. I think maybe you've got something like that, but I can't tell.

  1. Check out the master branch (git checkout master)
  2. Update it (git fetch upstream; git rebase upstream/master)
  3. Make a new branch (git checkout -b blake3)
  4. Apply only the changes you want for the blake3 PR
  5. Push the branch (git push -u origin blake3)
  6. Create a new PR from that

That way you'll have independent PRs.

@ndw
Copy link
Contributor

ndw commented May 19, 2024

When I investigated Blake3, I found three different "flavors" of hashing, with different parameters. How does your proposal address those flavors and their parameters.

(e.g., https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/Blake3.html)

@dnovatchev
Copy link
Contributor Author

When I investigated Blake3, I found three different "flavors" of hashing, with different parameters. How does your proposal address those flavors and their parameters.

(e.g., https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/Blake3.html)

This proposal is just for the Hashing function of Blake3 - not for Keyed Hashing or Keyed Derivation. And this is what the fn:hash function needs to be doing - hashing.

For this reason there is no ambiguity here, and the added bullet explicitly specifies Blake3 Hashing:

"BLAKE3: the BLAKE3 algorithm defined by [BLAKE3 Hashing]."

@benibela
Copy link

who needs so many different hash functions?

@dnovatchev
Copy link
Contributor Author

I think you've made quite a tangle here. I suggest you discard this PR and start over.

I don't know how you have your repository configured, I have an "upstream" repository that points to the QT4CG version and an "origin" repository that points to my version. I think maybe you've got something like that, but I can't tell.

  1. Check out the master branch (git checkout master)
  2. Update it (git fetch upstream; git rebase upstream/master)
  3. Make a new branch (git checkout -b blake3)
  4. Apply only the changes you want for the blake3 PR
  5. Push the branch (git push -u origin blake3)
  6. Create a new PR from that

That way you'll have independent PRs.

Thanks Norm,

The reason for this is that the changes for the previous submitted PR (1027, for fn:ranks) are in my master branch. Then when I created the hashAddBlake3 branch from the master branch, it got all the changes from the previous PR, too.

Thus in the hashAddBlake3 branch I had to revert all those changes in order to start in a clean way.

It seems that a good way to fix this is that I should discard PR1027, refresh my fork to fully match the qtspecs repository, then make two separate (and from master) feature branches for fn:ranks and for the addition of Blake3 for fn:hash and resubmit the two PRs.

Do you see a better way to do this?

@ChristianGruen
Copy link
Contributor

who needs so many different hash functions?

I share the concerns that if we add BLAKE3 to the list, people may miss RIPEMD-160, SHAKE or Whirlpool (which, as far as I can judge, are still more popular today). Do we have good arguments for adding BLAKE3 to the most common 3 (+CRC-32) and not adding various others as well?

@ChristianGruen
Copy link
Contributor

Do you see a better way to do this?

I recommend you to create a fresh clone of the original repository and create new branches for fn:rank and BLAKE3. This will be even more helpful if you create future PRs in the future.

@dnovatchev
Copy link
Contributor Author

Do you see a better way to do this?

I recommend you to create a fresh clone of the original repository and create new branches for fn:rank and BLAKE3. This will be even more helpful if you create future PRs in the future.

Yes @ChristianGruen this is exactly what I have starting doing. Thanks!

@dnovatchev
Copy link
Contributor Author

Closing this PR to resolve a branches situation that causes a dependency between two different PRs.

Will reopen it after the issue is fixed.

@dnovatchev dnovatchev closed this May 19, 2024
@dnovatchev
Copy link
Contributor Author

@ChristianGruen (or anyone having this knowledge):

How can I delete my fork of qtspec in order to be able then to create a new, fresh one?

@ChristianGruen
Copy link
Contributor

@ChristianGruen (or anyone having this knowledge):

How can I delete my fork of qtspec in order to be able then to create a new, fresh one?

https://letmegooglethat.com/?q=GitHub+delete+repository

@dnovatchev
Copy link
Contributor Author

who needs so many different hash functions?

I share the concerns that if we add BLAKE3 to the list, people may miss RIPEMD-160, SHAKE or Whirlpool (which, as far as I can judge, are still more popular today). Do we have good arguments for adding BLAKE3 to the most common 3 (+CRC-32) and not adding various others as well?

@ChristianGruen , @benibela

The BLAKE3 hashing algorithm is at present the fastest and the most secure (no known breaches) hashing algorithm.
It has already good existing implementations in several different programming languages, including Java and C#.

If this is not a sufficient reason for you to consider it, then any further arguments are not meaningful.

Please, read the description and the detailed benchmarking results here: https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf. In particular, see the three diagrams on page 9 to page 11 summarizing the benchmarking results comparing BLAKE with SHA-256 and SHA-512. See also Table 4 and 5 summarizing the extremely low probabilities for successful attack using different known attack-methods.

Providing the best of the best hashing algorithm is a good opportunity for us and missing it would be unfortunate.

@ChristianGruen
Copy link
Contributor

If this is not a sufficient reason for you to consider it, then any further arguments are not meaningful.

You seem to have misread my question, which was what would be reasonable arguments against adding other algorithms like RIPEMD-160, SHAKE or Whirlpool as well.

@dnovatchev
Copy link
Contributor Author

If this is not a sufficient reason for you to consider it, then any further arguments are not meaningful.

You seem to have misread my question, which was what would be reasonable arguments against adding other algorithms like RIPEMD-160, SHAKE or Whirlpool as well.

@ChristianGruen,

I share the concerns that if we add BLAKE3 to the list, people may miss RIPEMD-160, SHAKE or Whirlpool (which, as far as I can judge, are still more popular today). Do we have good arguments for adding BLAKE3 to the most common 3 (+CRC-32) and not adding various others as well?

Where/when did I say that I have any arguments for not adding other algorithms?

@benibela
Copy link

@dnovatchev

The BLAKE3 hashing algorithm is at present the fastest

I found 155 faster hashs: https://rurban.github.io/smhasher/doc/table.html

Better add all of them

@ChristianGruen

You seem to have misread my question, which was what would be reasonable arguments against adding other algorithms like RIPEMD-160, SHAKE or Whirlpool as well.

That would have been much easier if it was an EXPath module rather than a core function. Then all kinds of hash functions could be added with needing so many arguments

@ChristianGruen
Copy link
Contributor

@ChristianGruen

That would have been much easier if it was an EXPath module rather than a core function. Then all kinds of hash functions could be added with needing so many arguments

I agree. The EXPath initiative is somewhat dormant as its founder is not responsive anymore. My personal preference would be to stick with the existing set of 4 hash functions, and let implementors decide which other hash algorithms to add.

@ndw Do you have an example for an option argument in mind that has more keys than algorithm? It would be nice to add one to the spec.

@ndw
Copy link
Contributor

ndw commented May 21, 2024

I observe that implementations can support any and all hash algorithms that their users ask for. The only thing that's really on the table here is which algorithms must be supported by conforming implementations.

You can take the position "everything I want must be supported by all conforming implmeentations" but it's not the easy path. Every feature you ask an implementor to add, no matter that it's small, is another feature they have to implement, and test, and document.

FWIW, XProc's p:hash step mandates CRC32, MD5, and SHA1. My XProc implementation also supports HMAC and (because @dnovatchev intrigued me, all three flavors of BLAKE3). I could (easily) add more flavors, my implementation would remain conformant, and no specifications would have to be updated.

@ndw
Copy link
Contributor

ndw commented May 21, 2024

@ChristianGruen the p:hash step uses parameters to allow other flavors of SHA and MD5. Support for HMAC requires an accessKey parameter.

@ndw
Copy link
Contributor

ndw commented May 21, 2024

If we adopted a position that was more open to optional features (loved by implementors, hated by users), we could divide things up. We could, for example, have a separate specification for the fn:hash function and implementors could choose whether or not to adopt it. XProc decided to do this at least partly so we could cross the finish line.

@ChristianGruen
Copy link
Contributor

@ChristianGruen the p:hash step uses parameters to allow other flavors of SHA and MD5. Support for HMAC requires an accessKey parameter.

Thanks. Maybe we could add an example to the spec that is expected to work in future versions of Saxon (adding a note that it is vendor-specific). Something like this?

hash("input", { "algorithm": "HMAC", "accessKey": "secret" })

@ndw
Copy link
Contributor

ndw commented May 21, 2024

I've implemented in my XProc processor, not Saxon (yet 😄 ), but yes, that would be fine. I wonder if algorithm should be a QName in this case...

@ChristianGruen
Copy link
Contributor

I've implemented in my XProc processor, not Saxon (yet 😄 ), but yes, that would be fine. I wonder if algorithm should be a QName in this case...

True – the optional parameters would need to have a prefix. Something like:

hash("input", { "algorithm": "HMAC", xs:QName("vendor:accessKey"): "secret" })

@ndw
Copy link
Contributor

ndw commented May 21, 2024

I was thinking that the algorithm should be a union of string and QName...

@ChristianGruen
Copy link
Contributor

I was thinking that the algorithm should be a union of string and QName...

I think we have no rule yet that treats option values differently, depending on their type (but I may be wrong?).

My favorite would still be to separate the algorithm from the options, and have no rules imposed on the last argument:

fn:hash(
  $value      as (xs:string | xs:hexBinary | xs:base64Binary)?,	
  $algorithm  as xs:string?                                      := 'MD-5',
  $options    as map(*)?                                         := ()
)  as xs:hexBinary?

fn:hash('x', 'CRC-32') would be much nicer to read, and completely sufficient.

@michaelhkay michaelhkay added the Abandoned PR was rejected, withdrawn, or superseded label Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Abandoned PR was rejected, withdrawn, or superseded
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants