diff --git a/text/0011-gem-checksum-verification.md b/text/0011-gem-checksum-verification.md index 45c0726..61ee3ef 100755 --- a/text/0011-gem-checksum-verification.md +++ b/text/0011-gem-checksum-verification.md @@ -9,19 +9,20 @@ Bundler is adding checksum verification of gems when they are installed. It shou # Motivation -Verifying gem checksums with known checksums at install time is a stronger way to verify that the exact same gem source is being used in every environment where the bundle is installed. Using checksums, especially those sourced from rubygems.org, is a more strict way of locking to a specific gem version. The feature should work transparently in much the same way that bundler already locks to specific versions. +Verifying gem checksums with known checksums at install time is a stronger way to verify that the exact same gem source is being used in every environment where the bundle is installed. +Using checksums, sourced from rubygems.org or from digests of local .gem file, is a more strict way of locking to a specific gem version. +The feature should work transparently in much the same way that bundler already locks to specific versions by ensuring that not just the same version, but the exact same rubygem file data, is installed on each environment. # Guide-level explanation -Upon first upgrading to the version of Bundler with this feature, the checksum feature will be enabled. +Upon first upgrading to this version of Bundler, expect bundler to continue to be secure by default and to progressively add checksums to the lockfile. -First usage will fall into two use cases: -1. Users that want to immediately use the new checksums feature. -2. Users that are not specifically using the checksums feature, but expect bundler to work securely without interference. +If you wish to immediately add checksums to your lockfile for all locked gems, run `bundle lock`. +Bundle lock now fetches checksums from remote sources by default. +If you would like to bypass this behavior, add the `--no-checksums` flag. -For users excited about the new feature, the usual commands and docs should show the new options for securing the bundle with checksums. Documentation and the release notice should include instructions for directly using the new feature. - -For users that are not specifically looking for the feature, common commands like `bundle install`, `bundle update`, `bundle lock`, `bundle add`, should automatically make use of checksums when available. When there are missing checksums that must be fetched with a specific interaction, the UI should say what that action is. +Common Bundler commands like `bundle install`, `bundle update`, `bundle lock`, `bundle add` will now automatically record and verify checksums. +Commands that rely on checksums for verification will output a message or warning when checksums are missing or mismatched. Example: @@ -29,13 +30,16 @@ Example: $ bundle install Bundle complete! 88 Gemfile dependencies, 256 gems now installed. Use `bundle info [gemname]` to see where a bundled gem is installed. -256 gems are missing checksums. Source integrity could not be verified. -Use `bundle lock` to add checksums to Gemfile.lock. +Use `bundle lock` to add missing checksums to Gemfile.lock. ``` -When a gem version is updated in the lockfile using a source that has checksums (like rubygems.org), the checksum should be recorded. Running `bundle update` or `bundle add` should record the checksum from the source into the Gemfile.lock. If a checksum is not available from the source because the source does not provide such info, then a checksum should be created during install using the .gem file or a warning should be printed. +Running `bundle update` or `bundle add` will record the checksum from the source (e.g. rubygems.org) into the Gemfile.lock, if it is available. +If a checksum is not available from the source because the source does not provide such info (e.g. private gemservers) then a checksum will be created during install using the .gem file. +If a checksum can't be created because the source is a path or git source, then only the gems name and version will be recorded with a blank checksum. -If a user wishes to ensure that the environment they are using contains only gems installed from verified sources that match the checksums in the Gemfile.lock, the command `bundle pristine` should be run. The `bundle pristine` command installs every gem fresh from the .gem source, but does not fetch gems from the server or read the source index. Therefore, if a users wants all gems to be verified upon install, `bundle pristine` should be run after checksums are written to the Gemfile.lock. +If you want to ensure that the bundled environment contains only gems matching the checksums in the lockfile, run `bundle pristine`. +The `bundle pristine` command installs every gem fresh from a newly downloaded .gem source. +The pristine install will trigger computation and comparison of the generated SHA256 checksum with the checksum stored in the lockfile. Example: @@ -44,82 +48,72 @@ $ bundle pristine Installing rake 13.0.6 … Installing rspec 3.12.0 -42 gems are missing checksums. Source integrity could not be verified. +42 gems without checksums. Use `bundle lock` to add checksums to Gemfile.lock. -$ bundle lock --update +$ bundle lock Writing lockfile to path/to/Gemfile.lock $ bundle pristine -Installing rake 13.0.6 (verified) +Installing rake 13.0.6 … -Installing rspec 3.12.0 (verified) +Installing rspec 3.12.0 $ ``` -If a gem that is about to be installed does not match the checksum, an error is generated. +When bundler installs a gem from a `.gem` file, it computes the SHA256 checksum of the file. +If an existing checksum is available from the lockfile or the remote source, it will be compared with the computed checksum at install time. +If the checksums do not match, an error is generated and installation is halted. +If no checksum is recorded in the lockfile, the computed checksum is saved to the lockfile where future installs can verify that the same gem is installed. Example: ``` $ bundle install Installing rake 13.0.6 -Bundler cannot continue installing rake-13.0.6. -The checksum for the downloaded `rake-13.0.6.gem` does not match the known checksum for the gem. - -Expected: sha256-814828c34f1315d7e7b7e8295184577cc4e969bad6156ac069d02d63f58d82e8 from: Gemfile.lock:21:1 CHECKSUMS rake (13.0.6) -Downloaded: sha256-69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b69b6 from: -path/to/rake-13.0.6.gem SHA256 hexdigest - -To resolve this issue: -Delete the downloaded gem files referenced above and run `bundle install` - -If you are sure that the downloaded gem is correct, you can run… -(RFC TODO: Currently undecided on what the command should be here to allow unverified gems.) -`bundle install --skip-verification` OR -`bundle config set unverified ignore` OR -Remove the offending line from CHECKSUMS in Gemfile.lock +Bundler found mismatched checksums. This is a potential security risk. + rake (13.0.6) sha256-2222222222222222222222222222222222222222222222222222222222222222 + form the lockfile CHECKSUMS at Gemfile.lock:21:17 + rake (13.0.6) sha256-814828c34f1315d7e7b7e8295184577cc4e969bad6156ac069d02d63f58d82e8 + from the gem at path/to/rake-13.0.6.gem + + To resolve this issue you can either: + 1. remove the gem at path/to/rake-13.0.6.gem + 2. run `bundle install` + or if you are sure that the new checksum from the lockfile CHECKSUMS at Gemfile.lock:21:17 is correct: + 1. remove the matching checksum in Gemfile.lock:21:17 + 2. run `bundle install` + +To ignore checksum security warnings, disable checksum validation with + `bundle config set --local disable_checksum_validation true` ``` -Certain checksums will always be unavailable because the source does not provide a checksum. When a checksum is not available, a line can be added to the Gemfile.lock CHECKSUMS section that explicitly indicates that no checksum is recorded for that gem version. Future installs of gems with intentionally ignored checksums should not error or warn when the gem is installed. The **(RFC TODO)** command will allow idicating that a gem version should ignore checksum verifications. Alternatively, the SHA256 digest of the gem at install time can be recorded allowing the user to verify future installs do not deviate from the recorded checksum. - -Platforms may pose a problem for Gemfile.lock checksums. If a user has not added one of the platforms where they bundle their app, then checksum verification could be skipped. In order to avoid this security vulnerability, the following should happen automatically. - -1. When a gem is installed and no checksum is available in the lockfile, a warning should be printed indicating that unverified gems were installed. -2. When a gem is installed where a checksum is available for the same gem on a different platform, an error should be raised. (i.e. if nokogiri for darwin has a checksum, but nokogiri for musl does not, then an error should be raised). -3. When bundle install is configured with frozen, an error should be raised for any gem installed that does not have a checksum unless the checksum is explicitly ignored. - -When platform or missing checksum issues are raised, the error should always indicate what command or process should be performed to fix the error. - -Bundling the app on another platform, in CI for example, should highlight any platform issues. This will make the bundler upgrade easier for users. - -To encourage uptake of this security feature, it should be enabled by default. Any actions that fetch authoritative checksums should silently record them in the Gemfile.lock. Progressive upgrading to verified checksum behavior should happen naturally unless explicitly disabled. +Certain checksums will always be unavailable because the source does not provide a checksum. +When a checksum is not available, the gem will be added to the Gemfile.lock CHECKSUMS section without a checksum. -Gem upgrades in the Gemfile.lock happen in two ways currently. Each way will behave differently. - -1. The user explicitly runs `bundle update`. -2. Dependabot, or another automation, changes the version of a locked gem directly in the Gemfile.lock (assuming current Dependabot behavior without a corresponding upgrade to match this feature). - -Explicit gem updates will record the new checksum for the gem and remove any old checksums for gems that are no longer in the Gemfile.lock. - -Direct updates to the Gemfile.lock will print a warning during install. The warning should say that X number of gem(s) could not be verified. -If the bundle is frozen, then the warning should instead print an error and halt gem installation. +Users should be aware that when bundling on CI or production, new gems can be added for a platform not found in the Gemfile.lock. +Bundler will silently record the new checksums for the missing gem just like on a local development machine. +If you would like to ensure that only lockfile checksums are used, bundle install should use use the frozen or deployment configurations. # Reference-level explanation -Gem checksums are fetched from rubygems.org as part of the compact index. The checksums are stored in memory during the bundle process. +Gem checksums are fetched from rubygems.org as part of the compact index. +The checksums are stored in memory during the bundle process. If any gems are recorded from a separate source or installed via .gem file, another checksum is recorded and compared with the original. -If any of the checksums for the same gem name, version and platform differ, an error is raised. +If any of the checksums for the same gem name, version and platform differ, an error is raised with instructions for resolving the error. When the Gemfile.lock is written, a new CHECKSUMS section is written with all the gems in the bundle and their corresponding checksums. -Future compatibility for new checksum algorithms is supported by reading and writing existing checksums for installed gems even if the algorithm is unknown. Checksum comparisons take into account the algorithm used and raises errors accordingly. +Future compatibility for new checksum algorithms is supported by reading and writing existing checksums for installed gems even if the algorithm is unknown. +Checksum comparisons take into account the algorithm used and raises errors accordingly. Bundler commands that interact with checksums either fetch checksums from sources and update the Gemfile.lock or compare checksums in the Gemfile.lock with the computed digest of the gem being installed. -Internal storage of checksums indexes the checksums by a gem's full name (name-version-platform) and the checksum's algorithm. The source of each checksum is stored with the checksum so that errors can detail where conflicting checksums came from. +Internal storage of checksums indexes the checksums by a gem's NameTuple (name, version, platform) and the checksum's algorithm. +The source of each checksum is stored with the checksum so that errors can describe how to fix conflicting checksums. -When two sources have a checksum for the same gem full name, the are compared. If they are the same, they are merged into a single record that associates the checksum with all of the sources for that checksum. +When two remote sources have a checksum for the same gem, the are compared. +If they are the same, bundler proceeds as normal ### Example CHECKSUM section @@ -150,35 +144,74 @@ During the `rails new` command, `bundle install` pulled all the checksums from t ### Excessive failures -If checksum verification failures happen more often than expected, it could cause the feature to be ignored or derided as poorly designed and implemented. Bundler should progressively update the Gemfile.lock transparently without too much interaction or excessive warnings and failures. Since checksums verification is just a more strict approach to version verification, this feature fits with the existing expectations of Bundler’s features. This means that checksum verification should be unobtrusive. The feature is not a big deal that needs lots of messaging, and should not fail unless bundler is configured to be strict (in the case of missing checksums) or there is an actual verification failure (in the case of corrupt or malicious gems) +If checksum verification failures happen more often than expected, it could cause the feature to be ignored or derided as poorly designed and implemented. +Bundler should progressively update the Gemfile.lock transparently without too much interaction or excessive warnings and failures. +Checksums verification is just a more strict approach to version verification, so this feature fits with the existing expectations of Bundler’s features and should be unobtrusive. +The feature is not a "big deal" that needs lots of warnings or errors to encourage usage. +It should not fail unless bundler is configured to be strict (a frozen bundle) or there is an actual verification failure (a corrupt or malicious gem). ### Increased installation time -Gems will be SHA256 digested during install, adding to the install time. This could be a significant burden for some bundles. Future proposals such as content addressable storage for gem data could address this slowdown, but this is not immediately realistic. +Gems will be SHA256 digested during install, slightly adding to the install time. +This could be a larger burden for some bundles on slower machines. + +Future proposals could attempt to address this slowdown, if necessary. ### Unverifiable gems -When gems come from private gem servers that do not implement the compact index, checksums will not be available. Bundler must provide configuration that can automatically store digests for gems from sources that don’t supply checksums so that the feature is not a burden to users relying on these sources. An option could be created to configure a specified source to always compute checksums for updated versions. This would mean any change in version from the source would be trusted at the time, but any locked version that is installed later would be verified to match the first install. +When gems come from private gem servers that do not implement the compact index, checksums will not be available. +Bundler will calculate digests from .gem files from sources that don’t supply checksums. + +An additional flag for `bundle lock` could be provided that allows reinstalling, and thus calculating the checksum for, gems that don't have a checksum. + +Gems from a git source are verified by nature of matching a specific git SHA and should be excluded from checksum verification. -### GitHub Dependabot version upgrades are at risk of failing. +Path source gems cannot be verified because the checksum of the entire path would be complicated to calculate and unreliable. -If a user configures CI to use locked gem versions, it would be necessary for Dependabot to write the corresponding checksum line to the Gemfile to prevent every CI build from failing. This could cause many users to disable the feature because it interferes with an existing workflow or causes excessive CI failure notices. This can be addressed by making clear messaging about how to fix this problem. +### GitHub Dependabot and other automations are at risk of failing. + +If a user configures CI to install with frozen bundle, then all dependabot pull requests will fail. +This will make dependabot update branches much less useful, since a failing update is almost guaranteed. +In order to remain functional, Dependabot would need to write the corresponding checksum lines to the Gemfile to prevent every CI build from failing. +This would make dependabot the defacto source of the checksums in the Gemfile instead of rubygems.org or .gem files. +This might lead users to disable frozen bundles, disable dependabot, or disable the checksums feature, any of which could be considered major drawbacks. + +This can be addressed by making clear messaging about how to fix this problem (checkout the branch and run bundle install) +Additionally, we should work with the dependabot team to provide the information necessary to add checksums to the lockfile. ### Older versions of Bundler -Old versions of Bundler should ignore the CHECKSUMS section. We will need to check older versions to be sure. +Old versions of Bundler should ignore the CHECKSUMS section. # Rationale and Alternatives -- Rubygems.org already stores SHA256 checksums for gems and returns them in the compact index response. All the information is already present for checksum verification on the client side. -- Verifying .gem files at install time offers nearly complete protection against hacked, altered and corrupted gems. Bundler already ensures that only the bundled gems are available to the app. This feature adds the assurance that only verified gems were installed with the bundle. -- One alternative to this solution is including (vendoring) the bundled gems in the repository. This effectively has the same result, since the gems that will be installed during the production deploy will be verifiably the same gems that were used during CI and development. The downside of this vendored approach is the increase in the repository size. Checksums allow for a similar level of confidence without the downsides of vendoring gems and can be enabled by default so that more users will benefit. +Rubygems.org already stores SHA256 checksums for gems and returns them in the compact index response. +All the information is already present for checksum verification on the client side. + +Verifying .gem files at install time offers nearly complete protection against hacked, altered and corrupted gems. +Bundler already ensures that only the bundled gems are available to the app. +This feature adds the assurance that only verified gems were installed with the bundle. + +One alternative to this solution is including (vendoring) the bundled gems in the repository. +This effectively has the same result, since the gems that will be installed during the production deploy will be verifiably the same gems that were used during CI and development. +The downside of this vendored approach is the increase in the repository size. +The upside is installation that doesn't depend on a remote source. +Checksums allow for a similar level of confidence without the larger repository size of vendoring gems. +It can be enabled by default so that more users will benefit. # Unresolved questions -- What are the exact commands, options, warning and error messages that are used to interact with the feature? -- What are the unexpected errors that may happen the first time people verify the gems they are downloading? Are there any errors or problems in the existing rubygems.org repository? +### What are the unexpected errors that may happen the first time developers interact with this feature? + +In particular, the first deploy for users with a frozen bundle in CI and/or production may produce errors that might not have obvious solutions to someone unfamiliar with the feature. ### How do we handle confusion about the authority of checksums written to the Gemfile.lock -The source of checksums in the Gemfile.lock becomes a matter of trust once it's written. Did the checksum come from the API or was it calculated from a .gem file on a developers computer. If a checksum error is resolved by one developer in a way that saves an incorrect checksum, how should people know when to approve these changes or not. It may not even be common practice for most teams to look at the Gemfile.lock, and changes can often be hidden in pull request reviews. Without a process for checking that the checksums are trustworthy, it's left to every development team to decide on a process. One solution would be a bundle command that could be run in CI every time the gems are installed that verifies the authenticity of checksums in the Gemfile.lock. +The source of checksums in the Gemfile.lock becomes a matter of trust once it's written. +Did the checksum come from the API or was it calculated from a .gem file on a developers computer. +If a checksum error is resolved by one developer in a way that saves an incorrect checksum, how should people know when to approve these changes or not. +It may not be common practice for most teams to look at the Gemfile.lock when reviewing code. +Gemfile.lock changes can be hidden in pull request reviews. +Without a process for checking that the checksums are trustworthy, it's left to every development team to decide on a process. + +One solution would be a bundle command that could be run in CI every time the gems are installed that verifies the authenticity of checksums in the Gemfile.lock.