Publish checksums with releases #16165

Open
alexeagle opened this issue Mar 14, 2024 · 23 comments

@alexeagle
Contributor

@comius points this out in https://github.com/bazelbuild/rules_proto/pull/205/files#r1524512758

Currently, users of protobuf can download releases from https://github.com/protocolbuffers/protobuf/releases; however, they have no way to verify that the bytes they downloaded are the same ones that were published. A man-in-the-middle attack could tamper with a binary, for example by injecting a supply-chain vulnerability into the generated protobuf stub code.

Like many projects released on GitHub, protobuf ought to publish checksums as an additional release asset. These could take the form of a .sha256-suffixed file for each release artifact, like https://github.com/astral-sh/ruff/releases, or (more convenient IMO) a single checksums.txt file, like https://github.com/google/yamlfmt/releases
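For illustration, a single checksums.txt could be produced by a small script like the one below. This is a minimal Python sketch that assumes the release assets are already on local disk; `sha256_file`, `write_checksums`, and the output layout are hypothetical and are not part of protobuf's actual release tooling. Each line uses the `<hex digest>  <file name>` format (two spaces) that GNU `sha256sum -c checksums.txt` can check.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(assets: Iterable[Path], out: Path) -> None:
    """Hypothetical helper: write one '<digest>  <name>' line per asset,
    sorted by path, in the format `sha256sum -c` understands."""
    lines = [f"{sha256_file(p)}  {p.name}\n" for p in sorted(assets)]
    out.write_text("".join(lines))
```

Hashing in fixed-size chunks keeps memory use flat even for large archives such as the protobuf-all tarballs.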

@alexeagle alexeagle added the untriaged auto added to all issues by default when created. label Mar 14, 2024
@sgammon
Contributor

sgammon commented Mar 15, 2024

@alexeagle I'm in the build doing some refactoring and fixing (#16176 etc)... did you want to take this on? If not, I'll file it along with some other dependency security enhancements (SLSA, Sigstore, etc), in a stack of PRs so they can merge what they feel comfortable with. I've taken the same approach with Guava.

@alexeagle
Contributor Author

@sgammon that would be great, but I suspect the release machinery here is hidden in google3. It would be good to get a 👍🏻 from the maintainers on an approach for this.

@epicseven-cup

Hi, I was just about to post this question as well. Any updates on this?

@esorot
Contributor

esorot commented Jul 9, 2024

Thanks for filing this issue. We've added it to our backlog to be prioritized.

@zhangskz
Member

zhangskz commented Oct 1, 2024

Wouldn't any attack that could compromise the binaries also be able to just tamper with the checksums as well, if they live side-by-side? I think we would generally recommend setting sha256 (e.g. bazel http_archive) directly to prevent this type of case. LMK if there's something we're failing to consider here!

This is probably still a nice-to-have backlog task, but we're unlikely to get to this atm unless the priority changes.

@alexeagle
Contributor Author

alexeagle commented Oct 1, 2024

@zhangskz exactly, users should vendor the checksums to follow the principle of trust-on-first-use similar to host keys in a known_hosts file.
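As a rough illustration of trust-on-first-use applied to downloads, the sketch below pins a digest the first time a URL is seen and rejects any later download whose digest differs, much as an SSH client treats a changed host key. It assumes a hypothetical JSON lock file mapping URLs to hex SHA-256 digests; `tofu_check` and `ChecksumMismatch` are illustrative names, not part of any existing tool. Note that the very first download is trusted without independent verification, which is the inherent weakness of TOFU without an authenticated channel.

```python
import hashlib
import json
from pathlib import Path


class ChecksumMismatch(Exception):
    """Raised when an artifact no longer matches the digest pinned on first use."""


def tofu_check(lock_path: Path, url: str, data: bytes) -> str:
    """Hypothetical trust-on-first-use check for a downloaded artifact.

    On the first call for `url`, record its SHA-256 in a JSON lock file
    (the vendored checksum). On later calls, refuse data whose digest differs.
    """
    digest = hashlib.sha256(data).hexdigest()
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    pinned = lock.get(url)
    if pinned is None:
        # First use: trust and pin, like adding a host key to known_hosts.
        lock[url] = digest
        lock_path.write_text(json.dumps(lock, indent=2, sort_keys=True))
    elif pinned != digest:
        raise ChecksumMismatch(f"{url}: expected {pinned}, got {digest}")
    return digest
```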

@JasonLunn JasonLunn removed the untriaged auto added to all issues by default when created. label Oct 2, 2024
@justfalter

@alexeagle

exactly, users should vendor the checksums to follow the principle of trust-on-first-use similar to host keys in a known_hosts file.

From the linked article:

In the SSH protocol (https://en.wikipedia.org/wiki/Secure_Shell), most client software (though not all) will, upon connecting to a not-yet-trusted server, display the server's public key fingerprint, and prompt the user to verify they have indeed authenticated it using an authenticated channel.

It is not possible to perform Trust on First Use without having a means for verifying the authenticity of the information.

@zhangskz

Wouldn't any attack that could compromise the binaries also be able to just tamper with the checksums as well, if they live side-by-side? I think we would generally recommend setting sha256 (e.g. bazel http_archive) directly to prevent this type of case. LMK if there's something we're failing to consider here!
This is probably still a nice-to-have backlog task, but we're unlikely to get to this atm unless the priority changes.

This is why most (all?) Linux distros cryptographically sign the software they release. The distro comes with a set of trusted GPG keys, and all packages have their signatures verified before installing.

The same is true for many open-source software projects. Node.js publishes the GPG keys that are used to sign Node.js releases.

The flow is generally:

  • The release manager publishes their public GPG key.
  • The release manager creates a new release of their software:
    • Compiles the binary example.linux-arm64.gz
    • Calculates the sha256 of the binary and writes it to example.linux-arm64.gz.sha256
    • Using their private GPG key, generates a detached signature of example.linux-arm64.gz.sha256 as example.linux-arm64.gz.sha256.asc.
    • Uploads example.linux-arm64.gz, example.linux-arm64.gz.sha256, and example.linux-arm64.gz.sha256.asc as release assets.

To verify:

  • The user verifies the authenticity of example.linux-arm64.gz.sha256 using example.linux-arm64.gz.sha256.asc and the publisher's known GPG key.
  • The user verifies that the sha256 of example.linux-arm64.gz matches the value in example.linux-arm64.gz.sha256.

To tamper with a release undetected, an attacker would need to both compromise the release manager's GPG key and obtain privileged access to wherever the releases are hosted.
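The checksum half of the verification steps above can be sketched in Python as follows. This assumes the `.sha256` sidecar has already been authenticated, for example with `gpg --verify example.linux-arm64.gz.sha256.asc example.linux-arm64.gz.sha256`; `verify_artifact` and `read_expected_digest` are hypothetical helpers, and the parser accepts either a bare digest or a `sha256sum`-style `<digest>  <name>` line.

```python
import hashlib
from pathlib import Path


def read_expected_digest(sidecar: Path) -> str:
    """Parse a .sha256 sidecar: a bare hex digest, or a '<digest>  <name>'
    line as written by `sha256sum`. Only the first field is used."""
    return sidecar.read_text().split()[0].lower()


def verify_artifact(artifact: Path, sidecar: Path) -> bool:
    """Return True if the artifact's SHA-256 matches the sidecar's digest.

    This is only the second step: the sidecar itself must already have been
    authenticated against the publisher's GPG key (not done here).
    """
    digest = hashlib.sha256()
    with artifact.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == read_expected_digest(sidecar)
```

Skipping the signature step reduces this to a corruption check: an attacker who can replace the binary can usually replace an unsigned sidecar too.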

@sgammon
Contributor

sgammon commented Jan 7, 2025

@justfalter This issue concerns publishing checksums for Protobuf releases on the channels where they are not available today. Most distribution channels for Protocol Buffers do provide checksums and signatures, for example Maven-hosted protobuf.

It is not possible to perform Trust on First Use without having a means for verifying the authenticity of the information

"Trust on first use" implies no requirement for root of trust, and Wikipedia isn't an authoritative source anyway. In Maven's case, publishing authority for a given coordinate group is checked and enforced by Sonatype. Continuing with this example: if you trust your CA PKI to verify Maven Central, and therefore trust the signatures it provides as tamper-proof in transit, and also trust Sonatype to host Central securely and perform their verifications, then there is a root of trust established for Protocol Buffers consumed through Maven.

This issue deals largely with checksums published specifically for Protobuf releases on GitHub. Bazel users need the sha256 of a given dependency to pin it in their WORKSPACE file. However, Bazel is already deprecating the WORKSPACE file, having moved to BCR, which does distribute a checksummed version of Protobuf.

In most cases, Protobuf is "checkable," with a solid root of trust.

@justfalter

@sgammon

This issue deals largely with checksums published specifically for Protobuf releases on GitHub.

FWIW, I am a Bazel user who is very much interested in seeing sha256 checksums published for protobuf releases. When I read zhangskz's point that binaries and checksums could be tampered with alike, I chose to describe a common way of avoiding that situation.

In my opinion, there's a bit of an XY problem here: the X seems to be that protocolbuffers/protobuf provides no means of verifying the integrity of its posted release binaries. This applies to more than just Bazel users.

@sgammon
Contributor

sgammon commented Jan 9, 2025

@justfalter Google itself relies on protobuf and hash-locks protobuf in their own WORKSPACE files; is this not a valid root of trust, in your view? Google technically publishes those, just not here, in the repo. If you don't trust Google's own trust of Protobuf (or by extension, don't trust Google), how can you trust protobuf enough to use in your project in the first place?

What channel besides Bazel uses the source releases directly from the repository? Shouldn't Bazel users simply use BCR if they are concerned about security (again, assuming trust in Google since they run both BCR and the development of protobuf)?

@justfalter

justfalter commented Jan 9, 2025

Google itself relies on protobuf and hash-locks protobuf in their own WORKSPACE files; is this not a valid root of trust, in your view? Google technically publishes those, just not here, in the repo. If you don't trust Google's own trust of Protobuf (or by extension, don't trust Google), how can you trust protobuf enough to use in your project in the first place?

You misunderstand me. This isn't about whether anyone trusts Google. This is about having a means for verifying that the binaries downloaded from https://github.com/protocolbuffers/protobuf/releases are the same ones that were originally uploaded by the release maintainers.

This repository has 178 releases created by at least 22 different GitHub accounts, and a compromise of any one of those could allow an attacker to replace the binaries for any of the existing releases. It's true that the discussed TOFU approach would guard against binaries changing out from underneath users, but that doesn't cover the entire population of protobuf users (see below).

The hashes could be published out-of-band (ex: to https://protobuf.dev/ ?) as a guard against their being modified alongside the binaries.

What channel besides Bazel uses the source releases directly from the repository?

@sgammon
Contributor

sgammon commented Jan 9, 2025

You misunderstand me.
It is not possible to perform Trust on First Use without having a means for verifying the authenticity of the information.

I am saying that Google's own WORKSPACE files are the root of trust you are looking for. If an SHA256 matches a release from this repository, then you know it is byte-for-byte identical with a release Google itself trusts; hence, if you trust Google, you can trust the release.

The many thousands of shell scripts fetching protocolbuffers releases using curl or wget
The nearly 6400 public repositories using the arduino/setup-protoc github action

Well, they shouldn't do that. They should probably add a hash. Maybe it would be a good idea to file an issue there?

@justfalter

justfalter commented Jan 9, 2025

I am saying that Google's own WORKSPACE files are the root of trust you are looking for. If an SHA256 matches a release from this repository, then you know it is byte-for-byte identical with a release Google itself trusts; hence, if you trust Google, you can trust the release.

Can you please share a link to these WORKSPACE files? I'm having a hard time finding them.

You do understand that GitHub releases and their assets can be modified, right? Google's WORKSPACE files do not magically prevent assets from being modified; they might let Google know that they've been modified, but only if those assets are actually being pulled from the public GitHub repository.

@sgammon
Contributor

sgammon commented Jan 10, 2025

Can you please share a link to these WORKSPACE files? I'm having a hard time finding them.

@justfalter Here is one; there are many.

You do understand that GitHub releases and their assets can be modified, right?

Trust On First Use would prevent unexpected changes to a GitHub repository within your dependency graph from going unnoticed. If the release changes, the hash will change; if you want a root to tie that trust to, one example would be from the searches above. I am merely suggesting this as an option for Bazel users who aren't using BCR, which, as I've said, provides all of this with a strong root of trust (including checksums).

[...] but only if those assets are actually being pulled from the public Github repository.

They are. You can see for yourself:

http_archive(
    name = "com_google_protobuf",
    sha256 = "b8ab9bbdf0c6968cf20060794bc61e231fae82aaf69d6e3577c154181991f576",
    strip_prefix = "protobuf-3.18.1",
    urls = gcs_mirror_url(
        sha256 = "b8ab9bbdf0c6968cf20060794bc61e231fae82aaf69d6e3577c154181991f576",
        url = "https://github.com/protocolbuffers/protobuf/releases/download/v3.18.1/protobuf-all-3.18.1.tar.gz",
    ),
)

Google's WORKSPACE files do not magically prevent assets from being modified

WORKSPACE files don't prevent changes to releases, of course, but they do hold a hash that would become invalid if releases change. WORKSPACE files are also deprecated. So long as we are discussing Bazel users, there is a root of trust (Bazel itself, Google's own Bazel rules) and BCR renders any such manual verification moot in any case.

they might let Google know that they've been modified

If you trust Google and adopt its hash for a given version of Protobuf, you will likewise be notified if a release artifact changes unexpectedly.

@justfalter

justfalter commented Jan 10, 2025

@sgammon

Can you please share a link to these WORKSPACE files? I'm having a hard time finding them.

Here is one; there are many.

No, there really aren't. There are a grand total of 10 Bazel files in the Google org that pull from protocolbuffers/protobuf/releases, and only one file actually pulls in binary assets AND checks the hash... and that's for two files out of the twenty-six in a two-year-old release.

There is nothing convenient, canonical, nor trustworthy in the approach that you are suggesting, @sgammon. Not everyone uses Bazel, so BCR is irrelevant.


There was past interest in GPG-signing (#6232), but a bot auto-closed the ticket. Hopefully a maintainer will actually pull this together.

@sgammon
Contributor

sgammon commented Jan 10, 2025

No, there really aren't.

"Many" means more than one; you just admitted there are many. I personally have seen several that pin at the hash, but at this point I don't think I can help you.

two files out of the twenty-six in a two-year old release

No idea what you are talking about since the hash of course covers the entire release of Protobuf... the age of the release is irrelevant.

There is nothing convenient, canonical, nor trustworthy in the approach that you are suggesting

The solution I offered doesn't purport to be any of these things. It is a workaround if you have no other better way to checksum (Maven, BCR). I don't work for Google and it isn't my job to make your life more convenient. It is trustworthy and I have explained why.

Not everyone uses Bazel, so BCR is irrelevant.

You use Bazel. I am trying to help you and others who come to this repo who use Bazel. If you aren't using Bazel, then you should probably ask the build tool or script you use to checksum. Or you should just find the checksum (as I've done for you here) and then use it yourself.

If you don't trust the checksum I find for you, or that you find, that is a different story, one which is made slightly better by the presence of checksums in Google's own sources.

There was past interest in GPG-signing (#6232), but a bot auto-closed the ticket. Hopefully a maintainer will actually pull this together.

PRs are accepted!

@sgammon
Contributor

sgammon commented Jan 10, 2025

@alexeagle Considering BCR and other release channels (Maven et al.) that do provide checksums, maybe this can be closed.

@justfalter

For anyone who comes across this issue thread, all I want is a way to validate the integrity of assets attached to protobuf releases. The OP made it fairly clear what was wanted.

@sgammon
Contributor

sgammon commented Jan 10, 2025

Sure, for anyone coming across this thread (?) nobody is against publishing checksums, and there are workarounds posted above which are reasonable. If you can, upgrade to BCR, where such checks are automatic.

@alexeagle
Contributor Author

I think we've gone down some unnecessary paths on this issue.

I don't think anyone is proposing that protoc binaries would be published to the Bazel Central Registry, so I don't think that's a helpful resolution. That registry should be for Bazel modules, which are written in Starlark, and allow users to build with Bazel. It should only be used for the from-source distribution, which works today.

This issue isn't even Bazel-specific. There are many examples of binary distributions that publish checksums along with the release.

@epicseven-cup

I agree. The main reason for my earlier follow-up was to get a checksum published somewhere for the protoc binaries, so that you can check the ones you download from GitHub releases.

I saw that there are a few different places where we can reference the checksums of a release (Maven, etc.).

Could we add a link in the README to the checksums on Maven? That way, users at least know they exist when they go to GitHub to grab the binary.

I also want to add a niche use case for the checksum: you can use it to make sure that the downloaded binary was not corrupted.

I can make a PR if needed.

@wiiznokes

I came across this issue because I need protoc in a Flatpak app, and a SHA is required. I don't know Bazel; should I just calculate it on my PC? Thanks

@sgammon
Contributor

sgammon commented Jan 11, 2025

@alexeagle

I don't think anyone is proposing that protoc binaries would be published to the Bazel Central Registry

The commenter mentioned he uses Bazel. I suggested that protobuf is, in fact, listed within BCR. The source for the release is checksummed:

{
    "integrity": "sha256-6bmsGRCxBBBlg5hQYDyvNuKdPT0jDd9SvRN3jdMbkEY=",
    "strip_prefix": "protobuf-29.3",
    "url": "https://github.com/protocolbuffers/protobuf/releases/download/v29.3/protobuf-29.3.zip"
}

So, that is one way to obtain a Google-trusted checksum for a Protobuf source release. Unless I am missing something? In any case, this is merely a workaround until such checksums can be published.
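The BCR entry stores the digest in Subresource Integrity (SRI) form, `sha256-` followed by the base64 encoding of the raw 32-byte digest, whereas the `sha256` attribute of `http_archive` takes the same digest as 64 hex characters. A minimal Python sketch for converting between the two forms (the function names are illustrative), which makes it possible to compare a locally computed hash against the BCR `integrity` value:

```python
import base64


def sri_to_hex(integrity: str) -> str:
    """Convert a 'sha256-<base64>' Subresource Integrity string to a hex digest."""
    algo, _, b64 = integrity.partition("-")
    if algo != "sha256":
        raise ValueError(f"unsupported algorithm: {algo!r}")
    raw = base64.b64decode(b64, validate=True)
    if len(raw) != 32:
        raise ValueError("a SHA-256 digest is 32 bytes")
    return raw.hex()


def hex_to_sri(hex_digest: str) -> str:
    """Convert a 64-character hex SHA-256 digest to the 'sha256-<base64>' form."""
    raw = bytes.fromhex(hex_digest)
    return "sha256-" + base64.b64encode(raw).decode("ascii")
```

Both forms encode the same digest, so either can be pinned; recent Bazel versions also accept an `integrity` attribute on `http_archive` directly.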

8 participants