Transparency of the project #113

Open
bleichenbacher-daniel opened this issue Apr 8, 2024 · 9 comments
Comments

@bleichenbacher-daniel

Could we please have more transparency about what is happening with this project, to avoid unnecessary and parallel work?

At this point I have rewritten a significant part of the test vector generation for Wycheproof, just so that I can continue with the project. Without some coordination even more time might be wasted, which could be used for solving new issues instead. At this point I don't know who is interested in the project or what the outstanding issues might be.

For example, it would be helpful to have just a list of cryptographic primitives and their status (i.e. whether they are supported, whether test vectors have been tested, and which parties might be interested in using them).

@FiloSottile
Member

We're still ramping up a maintenance framework for the project, so I think what comes across as a lack of transparency is really work in progress. For context, until end of March we worked on getting the handoff finalized, and we're still working on getting the generators published, for example.

What I can tell you is that most if not every major cryptographic implementation is interested in consuming the vectors, and some are interested in contributing. I have not heard requests for specific primitives besides ML-KEM and ML-DSA, for which we have some in-progress contributions. There is definitely interest in defining a new reusable format for test vectors, although I don't expect that to be something that works out in the span of a few weeks.

We had a session at OSCW to talk about what implementers want from test vector libraries. I am not sure it's super easy to follow in video, since there's a lot of audience participation, but you might be interested in the recording https://archive.org/details/oscw-2024-fillippo-valsorda-cryptographic-test-vectors or I can send you a transcript.

There is consensus around making the Wycheproof project not just a source of test vectors, but a repository where different sources/people/projects can pool vectors, so that downstreams can use them all at once. We'll work over the next few weeks to make it easier to contribute vectors and to consume them.

You're very welcome to send any new vectors. If you're worried about duplicating work, maybe open an issue to announce what you are working on, and then close it with the PR that submits those vectors?

Note that since we intend to accept vectors from multiple sources, we can't rely on regenerating them all when changing formats or adding new ones, but we will have to port the old ones, and iteratively add new ones.

@bleichenbacher-daniel
Author

Maybe I wasn't clear enough. When I left Google a year ago, two managers independently asked me if I was willing to continue the project. I also received some non-committal promises that I might get access to my generator code. Hence I've continued working on the project. Now we have two parallel projects. This is obviously not ideal. Hence it would make sense to have a meeting to clear things up. What worries me most is that I have worked on the project for over a decade. Hence I don't want to lose the project a second time.

Thanks for the link to the video. I have a few comments:

  • There are already tests comparing the test vectors to the JSON schemas. The JSON schemas and the documentation of the test vector formats are generated from the same source, so that they would not fall out of sync. I don't know how to set this up on github however.
  • You talked about test vectors with intermediate values. In most cases these should be relatively easy to generate. Another option would be code that guesses the location where an error occurred. Most of the code I had there was in colabs. If this kind of stuff is of interest, then maybe it would be possible to recover these colabs (or rewrite them; they are probably less than 1000 lines of code).
  • You talked about testing the test vectors. One issue that needs to be discussed is test vectors with unclear states. An example is test vectors with a modified private key. Here it is unclear if a crash with a modified private key is a vulnerability or not, since in most cases users modifying their own key are just shooting themselves in the foot. However, if private keys can be uploaded to an HSM, then crashes do matter. For such tests it is important to have a way to gain consensus on whether libraries need strong private key validation or not. A big question here is how to decide what checks a library should perform when importing keys, or performing similar functions that are difficult to attack. I have generated a relatively large number of faulty keys. They have not been published, precisely because I don't know the expectations.
  • Test vector format: if we want to change the format of the test vectors then it would make sense to tackle this now before making big announcements.
  • Data structures for various languages: I think it should be possible to generate the data structure from the same source that generates the JSON schema and documentation.

@FiloSottile
Member

> I also received some non-committal promises that I might get access to my generator code.

I'm trying to enable that!

> Hence it would make sense to have a meeting to clear things up.

Sure! I still don't have an email address for you, but you can reach out at [email protected] and we can set up a call.

I want to be upfront: the goal of C2SP, my own intention, and the community's interest is in growing Wycheproof into a repository for (properly attributed!) test vectors from multiple sources. I think what you worked on can fit perfectly, but I want to be clear it's not the same single-source design of Wycheproof-at-Google.

> There are already tests comparing the test vectors to the JSON schemas. The JSON schemas and the documentation of the test vector formats are generated from the same source, so that they would not fall out of sync. I don't know how to set this up on github however.

Happy to do the GitHub Actions setup.
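
For readers following along, here is a minimal sketch of the kind of check such a CI job could run. It is not the project's actual test: it applies a small hand-written subset of rules instead of the published JSON schemas, and the field names (`algorithm`, `numberOfTests`, `testGroups`, `tests`, `tcId`, `comment`, `result`, `flags`) are assumptions about the v1 layout.

```python
# Hypothetical subset of the rules a real JSON schema would express.
REQUIRED_TOP_LEVEL = {"algorithm": str, "numberOfTests": int, "testGroups": list}
REQUIRED_TEST = {"tcId": int, "comment": str, "result": str, "flags": list}
ALLOWED_RESULTS = {"valid", "invalid", "acceptable"}


def check_vector_file(doc):
    """Return a list of problems found in a parsed vector file (empty if none)."""
    problems = []
    for key, expected in REQUIRED_TOP_LEVEL.items():
        if not isinstance(doc.get(key), expected):
            problems.append(f"top level: {key!r} missing or not {expected.__name__}")

    groups = doc.get("testGroups")
    if not isinstance(groups, list):
        groups = []

    seen = 0
    for index, group in enumerate(groups):
        for test in group.get("tests", []):
            seen += 1
            where = f"group {index}, tcId {test.get('tcId')!r}"
            for key, expected in REQUIRED_TEST.items():
                if not isinstance(test.get(key), expected):
                    problems.append(f"{where}: {key!r} missing or not {expected.__name__}")
            if test.get("result") not in ALLOWED_RESULTS:
                problems.append(f"{where}: unexpected result {test.get('result')!r}")

    # The declared count should match the number of tests actually present.
    declared = doc.get("numberOfTests")
    if isinstance(declared, int) and declared != seen:
        problems.append(f"numberOfTests is {declared} but {seen} tests were found")
    return problems
```

A CI step could parse each file with `json.load`, call `check_vector_file` on the result, and fail the job if the returned list is non-empty. A full setup would more likely hand the real schemas to a JSON Schema validator.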

I'm not sure I see the autogenerated documentation, where is it?

> You talked about test vectors with intermediate values. In most cases these should be relatively easy to generate. Another option would be code that guesses the location where an error occurred. Most of the code I had there was in colabs. If this kind of stuff is of interest, then maybe it would be possible to recover these colabs (or rewrite them; they are probably less than 1000 lines of code).

Intermediate values are useful while developing an implementation, so I am not sure they make sense in the same format/place as the rest, but they would definitely be useful. Maybe they fit in the more "free-form" part of the repository we talked about at OSCW.

> You talked about testing the test vectors. One issue that needs to be discussed is test vectors with unclear states.

When I talk about testing the test vectors, I just mean making sure they were generated correctly given their intention, so we can just write implementations that pass/fail based on the "acceptable" state.

> Here it is unclear if a crash with a modified private key is a vulnerability or not, since in most cases users modifying their own key are just shooting themselves in the foot.

Heh, this is a whole topic among implementers and different libraries take different views. I think it would make sense to have them, maybe with a specific flag/in specific files, to let libraries decide if they fit the threat model.
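
As a rough illustration of the flag idea, a consumer could skip tests carrying flags that fall outside its threat model. This is only a sketch: the flag name `ModifiedPrivateKey` is hypothetical, and the `testGroups`/`tests`/`flags` layout is an assumption about the current format.

```python
def select_tests(doc, skip_flags=()):
    """Yield tests from a parsed vector file whose flags avoid skip_flags."""
    skip = set(skip_flags)
    for group in doc.get("testGroups", []):
        for test in group.get("tests", []):
            # Keep the test only if none of its flags is in the skip set.
            if skip.isdisjoint(test.get("flags", [])):
                yield test


# Illustrative data; the flag name below is made up for this example.
example = {
    "testGroups": [
        {
            "tests": [
                {"tcId": 1, "result": "valid", "flags": []},
                {"tcId": 2, "result": "invalid", "flags": ["ModifiedPrivateKey"]},
            ]
        }
    ]
}

# A library that validates private keys strictly runs everything.
strict_ids = [t["tcId"] for t in select_tests(example)]
# A library that treats private keys as trusted input opts out of those tests.
lenient_ids = [t["tcId"] for t in select_tests(example, skip_flags={"ModifiedPrivateKey"})]
```

Putting such vectors in separate files instead would achieve the same thing without any filtering code on the consumer side.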

> Test vector format: if we want to change the format of the test vectors then it would make sense to tackle this now before making big announcements.

I would rather take our time to gather community feedback on the new format. For now, I want to get us set up with refreshed docs, and the tooling to smoothly add and consume vectors in the current v1 format.

> Data structures for various languages: I think it should be possible to generate the data structure from the same source that generates the JSON schema and documentation.

Generating them is indeed easy, but knowing what the right data structure is requires language-specific knowledge that I don't have across all languages. For now I think just making the JSON available is a good first step.
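
To make the "easy to generate, hard to choose" point concrete, here is a hedged sketch that emits a Python dataclass from a JSON-schema-like fragment. The schema fragment and the type mapping are illustrative assumptions, not taken from the repository.

```python
# Hypothetical mapping from JSON Schema primitive types to Python annotations.
JSON_TO_PY = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def dataclass_source(class_name, schema):
    """Return Python source for a dataclass mirroring schema["properties"]."""
    lines = ["from dataclasses import dataclass", "", "", "@dataclass", f"class {class_name}:"]
    properties = schema.get("properties", {})
    for name, spec in properties.items():
        if spec.get("type") == "array":
            item = JSON_TO_PY.get(spec.get("items", {}).get("type"), "object")
            annotation = f"list[{item}]"
        else:
            annotation = JSON_TO_PY.get(spec.get("type"), "object")
        lines.append(f"    {name}: {annotation}")
    if not properties:
        # An empty class body would be a syntax error.
        lines.append("    pass")
    return "\n".join(lines) + "\n"


# Illustrative schema fragment, not copied from the repository.
test_case_schema = {
    "type": "object",
    "properties": {
        "tcId": {"type": "integer"},
        "comment": {"type": "string"},
        "result": {"type": "string"},
        "flags": {"type": "array", "items": {"type": "string"}},
    },
}
source = dataclass_source("TestCase", test_case_schema)
```

The mechanical part is short. The open questions are the language-specific ones, for example whether a hex-encoded string should surface as `str` or `bytes`, or whether `result` should be an enum.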

@bleichenbacher-daniel
Author

bleichenbacher-daniel commented Apr 8, 2024

The auto generated file I talked about is
https://github.com/C2SP/wycheproof/blob/master/doc/types.md
Unfortunately, this is an old version, which is sad because the main goal was to generate doc and schemas from the same source, then test the schemas against the test vectors, which should ensure that the documentation reflects the test vector files. Of course if that gets out of sync, then nothing is gained.

Yes, feedback would be nice. One thing I could do is generate some sample test vector files with a new format, just for discussion. An issue about the test vector format is open: #106. So far there are no comments.

@bleichenbacher-daniel
Author

I'm wondering if the upcoming Eurocrypt would be an opportunity for some small meeting.

For example discussing the following topics would be helpful for the project:

  • Organization: So far I've rewritten about 80'000 lines of code for the test vector generation. A significant fraction might not have been necessary with a clearer organization of the project. Hence, it might be useful to consider how duplicate work can be avoided in the future.

  • Priorities: To have some impact it would be very useful to determine if there are important algorithms or primitives that are missing.

  • Tools: One thing I noticed is that the test vectors are often converted to other formats. It might be helpful to provide some tools to support such conversions.

  • Annoyances that make the project harder to use than necessary.
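
On the tools item above, a conversion helper might look roughly like the sketch below, which flattens one parsed vector file into CSV. The `testGroups`/`tests` layout is an assumption about the current format, and the choice to join list values with `;` is arbitrary.

```python
import csv
import io


def vectors_to_csv(doc):
    """Flatten the tests of a parsed vector file into CSV text, one row per test."""
    rows = []
    for index, group in enumerate(doc.get("testGroups", [])):
        for test in group.get("tests", []):
            row = {"group": index}
            for key, value in test.items():
                # Lists such as flags are joined so they fit in a single cell.
                row[key] = ";".join(map(str, value)) if isinstance(value, list) else value
            rows.append(row)

    # Column order follows first appearance across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Group-level fields (keys, curve names and so on) are dropped here; a real tool would need to decide whether to repeat them on every row.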

@bleichenbacher-daniel
Author

Friendly ping.

It is quite difficult to make efficient progress with the project without coordination. Given that this is mainly a project done in free time, any kind of duplication should be avoided.

@bleichenbacher-daniel
Author

Are there any updates on what is happening?
The lack of transparency is very damaging to the project.
Without coordination it is not possible to progress efficiently.
The fact that we still have no access to the source code for the test vector generation, documentation and unpublished tests, as well as the uncertainty about their status also significantly impacts any further progress.

@jedisct1

From an external perspective, it appears that this project has been abandoned, with no updates for over nine months despite open issues and submitted pull requests.

Rooterberg has already become more useful for implementers due to its extensive range of supported constructions. It’s unfortunate and confusing that there are two projects with the same objectives but no collaboration between them.

@bleichenbacher-daniel
Author

I'm sorry about the situation. I'm also not aware of any activity. Either nothing happens or I'm not being informed about such activity.

Rooterberg is to a large degree an implementation of changes discussed here. I have to use a different repo, since I do not have access to this one here.

I certainly do have plans to continue the project. One direction I'm working on is distributed review systems, i.e. systems that try to collect code reviews from different sources and accumulate them into some sort of trust score. I currently have a moderate set of tests (mostly for Rust and JavaScript libraries) that could be used for such purposes.


3 participants