Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: harden harden-runner egress policy #477

Merged
merged 3 commits into from
Sep 5, 2024

Conversation

tallaxes
Copy link
Collaborator

@tallaxes tallaxes commented Sep 5, 2024

Description

Switch to Harden Runner egress-policy: block, add selected allowed-endpoints. Also disable-telemetry: true for now.

The endpoints come from StepSecurity insights for corresponding runs (such as this one; "Recommendations" tab as base, "Networks Events" for further drill down), with some selected wildcarding (to accommodate variations) and removals (mostly things that are no longer used, but might still be reported in the cumulative insights). Notes and outliers:

  • The largest set of endpoints is (naturally) used by E2E tests, which use az CLI, create AKS clusters, and run tests. The only one using wildcards.
  • Historically observed but removed: 9236a389bd48b984df91adc1bc924620.r2.cloudflarestorage.com - blob storage for cgr.dev, and we no longer use Chainguard for base images
  • clients3.google.com:80 and firebaselogging-pa.googleapis.com:443 - confirmed Skaffold, likely telemetry, likely can be avoided/blocked with/after skaffold config set --global collect-metrics false
  • There are no changes to "Build and publish to MCR" workflow, as it runs on self-hosted 1ES pool. (Use of Harden Runner in that context needs to be reviewed separately.)

How was this change tested?

  • Selected checks (using the updated workflows) passed

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


@tallaxes tallaxes added area/security Issues or PRs related to security area/e2e-testing Issues or PRs related to e2e testing area/ci Issues or PRs related to ci labels Sep 5, 2024
@tallaxes tallaxes self-assigned this Sep 5, 2024
@tallaxes tallaxes marked this pull request as ready for review September 5, 2024 01:33
@coveralls
Copy link

coveralls commented Sep 5, 2024

Pull Request Test Coverage Report for Build 10729088440

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.702%

Totals Coverage Status
Change from base Build 10709390831: 0.0%
Covered Lines: 36313
Relevant Lines: 37167

💛 - Coveralls

Copy link
Collaborator Author

@tallaxes tallaxes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@charliedmcb
Copy link
Collaborator

Copy link
Collaborator

@charliedmcb charliedmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks pretty good. Mostly nits.

However, do have a bit larger concern for dev flow impact. If we update the workflows, and need to add new allowed-endpoints, will there be an easy way to determine what those endpoints we need to add are?

Also, need to confirm E2E tests pass.

.github/workflows/e2e-matrix.yaml Show resolved Hide resolved
.github/workflows/e2e.yaml Show resolved Hide resolved
.github/workflows/e2e.yaml Show resolved Hide resolved
.github/workflows/e2e.yaml Show resolved Hide resolved
@tallaxes
Copy link
Collaborator Author

tallaxes commented Sep 5, 2024

However, do have a bit larger concern for dev flow impact. If we update the workflows, and need to add new allowed-endpoints, will there be an easy way to determine what those endpoints we need to add are?

Yes, likely using either the failure itself or Harden Runner post-action log (or report in the action run summary). If this fails - setting disable-telemetry: false temporarily, do another run, and use StepSecurity report. (We are also likely to switch telemetry back on permanently after some further internal review.)

charliedmcb
charliedmcb previously approved these changes Sep 5, 2024
Copy link
Collaborator

@charliedmcb charliedmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. 1 nit

@charliedmcb
Copy link
Collaborator

However, do have a bit larger concern for dev flow impact. If we update the workflows, and need to add new allowed-endpoints, will there be an easy way to determine what those endpoints we need to add are?

Yes, likely using either the failure itself or Harden Runner post-action log (or report in the action run summary). If this fails - setting disable-telemetry: false temporarily, do another run, and use StepSecurity report. (We are also likely to switch telemetry back on permanently after some further internal review.)

Ok, sounds good.

Copy link
Collaborator

@charliedmcb charliedmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@tallaxes tallaxes merged commit 501bd69 into main Sep 5, 2024
12 checks passed
@tallaxes tallaxes deleted the tallaxes/harden-harden-runner branch September 5, 2024 22:34
Bryce-Soghigian pushed a commit that referenced this pull request Sep 12, 2024
* fix: harden harden-runner egress policy

* doc: add comments

* doc: fix typo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci Issues or PRs related to ci area/e2e-testing Issues or PRs related to e2e testing area/security Issues or PRs related to security
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants