chore: switch to new yardstick validate #672

Draft: wants to merge 7 commits into base: main
2 changes: 1 addition & 1 deletion .github/actions/quality-gate/action.yaml
@@ -19,7 +19,7 @@ runs:
- name: Validate provider results
shell: bash
working-directory: tests/quality
run: poetry run make validate
run: poetry run make validate provider=${{ inputs.provider }}

- name: Archive the provider state (${{ inputs.provider }})
if: ${{ failure() }}
585 changes: 283 additions & 302 deletions poetry.lock

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -78,7 +78,8 @@ mypy = "^1.1"
radon = ">=5.1,<7.0"
dunamai = "^1.15.0"
ruff = ">=0.5.1,<0.5.7"
yardstick = {git = "https://github.com/anchore/yardstick", rev = "v0.9.2"}
yardstick = {git = "https://github.com/anchore/yardstick", rev = "fe6ae0f3a4399aeae08abc60e98670f6764614c9"}
# yardstick = {path = "../yardstick", develop=true }
tabulate = "0.9.0"
tox = "^4.11.3"

9 changes: 5 additions & 4 deletions tests/quality/Makefile
@@ -26,7 +26,8 @@ all: capture validate ## Fetch or capture all data and run all quality checks

.PHONY: validate
validate: ## Run all quality checks against already collected data
poetry run ./gate.py
poetry run ./validate-namespaces.py
poetry run yardstick validate --result-set $(RESULT_SET)_$(provider)


## Data management targets #################################
@@ -61,16 +62,16 @@ build-db: ## Build a grype database for the given provider

.PHONY: vulns
vulns: ## Collect and store all grype results
poetry run yardstick -v result capture -r $(RESULT_SET)
poetry run yardstick -v result capture -r $(RESULT_SET)_$(provider)

.PHONY: sboms
sboms: $(YARDSTICK_RESULT_DIR) clear-results ## Collect and store all syft results (deletes all existing results)
bash -c "make download-sboms || (yardstick -v result capture -r $(RESULT_SET) --only-producers)"
bash -c "make download-sboms || (yardstick -v result capture -r $(RESULT_SET)_$(provider) --only-producers)"

.PHONY: download-sboms
download-sboms:
cd vulnerability-match-labels && make venv
bash -c "export ORAS_CACHE=$(shell pwd)/.oras-cache && . vulnerability-match-labels/venv/bin/activate && ./vulnerability-match-labels/sboms.py download -r $(RESULT_SET)"
bash -c "export ORAS_CACHE=$(shell pwd)/.oras-cache && . vulnerability-match-labels/venv/bin/activate && ./vulnerability-match-labels/sboms.py download -r $(RESULT_SET)_$(provider)"

$(YARDSTICK_RESULT_DIR):
mkdir -p $(YARDSTICK_RESULT_DIR)
21 changes: 14 additions & 7 deletions tests/quality/README.md
@@ -36,7 +36,7 @@ While developing it may be useful to only run one provider for rapid troubleshoo

```
make capture provider=github
make validate
make validate provider=github
```

## What are the quality gate criteria
@@ -51,6 +51,8 @@ specifically with the following criteria:
release
- otherwise, pass

These criteria are configured per provider in `tests/quality/config.yaml`.
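
As a rough illustration of how per-provider thresholds like these could be applied, the sketch below computes F1 from TP, FP, and FN counts and checks three limits named after the keys in `config.yaml` (`max-f1-regression`, `max-new-false-negatives`, `max-unlabeled-percent`). The `Counts` and `gate` names, the way the unlabeled percentage is computed, and the exact comparison semantics are assumptions made for illustration; yardstick's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for one tool's labeled results on one image."""

    tp: int
    fp: int
    fn: int
    unlabeled: int = 0

    @property
    def f1(self) -> float:
        # F1 = 2*TP / (2*TP + FP + FN); treated as 0.0 when there is nothing to score
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0

    @property
    def unlabeled_percent(self) -> float:
        # assumption: percentage of reported matches that carry no label
        total = self.tp + self.fp + self.unlabeled
        return 100.0 * self.unlabeled / total if total else 0.0


def gate(
    reference: Counts,
    candidate: Counts,
    max_f1_regression: float = 0.0,
    max_new_false_negatives: int = 0,
    max_unlabeled_percent: float = 10.0,
) -> list[str]:
    """Return the reasons the candidate fails; an empty list means it passes."""
    failures = []
    if reference.f1 - candidate.f1 > max_f1_regression:
        failures.append("f1 regression")
    if candidate.fn - reference.fn > max_new_false_negatives:
        failures.append("new false negatives")
    if candidate.unlabeled_percent > max_unlabeled_percent:
        failures.append("too many unlabeled matches")
    return failures
```

With the default limits, a candidate that matches the reference exactly passes, while one that loses true positives to false negatives fails on both the F1 and the false-negative checks.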

F1 score is the primary way that tool matching performance is characterized. F1
score combines the TP, FP, and FN counts into a single metric between 0 and 1.
Ideally the F1 score for an image-tool pair should be 1. F1 score is a good way
@@ -113,7 +115,7 @@ To reduce the eroding value over time we've decided to change as many moving
targets into fixed targets as possible:

- Vulnerability results beyond a particular year are ignored (the current config
allows for <= 2020). Though there are still retroactive CVEs created, this
allows for <= 2021). Though there are still retroactive CVEs created, this
helps a lot in terms of keeping vulnerability results relatively stable.

- SBOMs are used as input into grype instead of the raw container images. This
@@ -144,14 +146,18 @@ to keep in mind:
assets that are no longer useful for comparison, but this should rarely be
done.

- Consider not changing the CVE year max-ceiling (currently set to 2020).
- Consider not changing the CVE year max-ceiling (currently set to 2021).
Pushing this ceiling will likely raise the number of unlabeled matches
significantly for all images. Only bump this ceiling if all possible matches
are labeled.

- If the CVE year max-ceiling needs to be pushed, try to push it only for one
provider. That is, edit the max-year value on the validation for that
provider in `tests/quality/config.yaml`.
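
To make the effect of a per-provider ceiling concrete, here is a minimal sketch of year-based filtering. The `MAX_YEAR_BY_PROVIDER` mapping, the example `github: 2022` override, and the choice to keep non-CVE identifiers are all hypothetical; how yardstick actually applies `max_year` is not shown in this change.

```python
import re

# matches identifiers such as CVE-2021-44228 and captures the year
CVE_ID = re.compile(r"^CVE-(\d{4})-\d+$")

# hypothetical values: one provider raised above the shared default
DEFAULT_MAX_YEAR = 2021
MAX_YEAR_BY_PROVIDER = {"github": 2022}


def within_max_year(vuln_id: str, max_year: int) -> bool:
    """Keep CVE IDs whose year is <= max_year; non-CVE IDs are kept (an assumption)."""
    m = CVE_ID.match(vuln_id)
    return m is None or int(m.group(1)) <= max_year


def filter_matches(provider: str, vuln_ids: list[str]) -> list[str]:
    """Drop IDs newer than the ceiling configured for this provider."""
    max_year = MAX_YEAR_BY_PROVIDER.get(provider, DEFAULT_MAX_YEAR)
    return [v for v in vuln_ids if within_max_year(v, max_year)]
```

Raising the ceiling for a single provider only widens the set of matches that need labels for that provider's images, which is why it is the less disruptive option.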

## Workflow

One way of working is to simply run `yardstick` and `gate.py` in the `test/quality` directory.
One way of working is to simply run `yardstick` in the `tests/quality` directory.
You will need to make sure the `vulnerability-match-labels` submodule has been initialized. This happens automatically
for some `make` commands, but you can ensure this by running `git submodule update --init`. After the submodule has been
initialized, the match data from `vulnerability-match-labels` will be available locally.
@@ -174,7 +180,7 @@ After `make capture` has finished, we should have results and can now start insp
modifying the comparison labels.

To get started, let's assume we see a quality gate failure like this (something found in CI
or after running `./gate.py`):
or after running `yardstick validate --result-set pr_vs_latest_via_sbom_<provider>`):
```
Running comparison against labels...
Results used:
@@ -218,7 +224,7 @@ At this point you can run the quality gate using updated label data. The quality
just one image, for example the image we first found in the failure, so run the quality gate and see
how changes to the label data have affected the result:
```shell
./gate.py --image docker.io/anchore/test_images@sha256:808f6cf3cf4473eb39ff9bb47ead639d2ed71255b75b9b140162b58c6102bcc9
yardstick validate -r pr_vs_latest_via_sbom_rhel --image docker.io/anchore/test_images@sha256:808f6cf3cf4473eb39ff9bb47ead639d2ed71255b75b9b140162b58c6102bcc9
```

After iterating on all the changes we need using `yardstick label explore`, we're now ready to commit changes. Since
@@ -307,7 +313,8 @@ like this:
(venv) user@HOST quality %
```

Now you should be able to run both `yardstick` and `./gate.py`.
Now you should be able to run `yardstick` to see and update labels and
`make validate provider=<some provider>` to validate the results.

## Troubleshooting

20 changes: 20 additions & 0 deletions tests/quality/config.yaml
@@ -1,3 +1,10 @@
x-ref:
validations: &standard-validations
- max-f1-regression: 0.0
max-new-false-negatives: 0
max-unlabeled-percent: 10
max_year: 2021
candidate_tool_label: custom-db
yardstick:
default_max_year: 2021

@@ -31,6 +38,7 @@ yardstick:
# - this version should ALWAYS match that of the other "grype" tool above
version: latest
takes: SBOM
label: reference

grype_db:
# values:
@@ -71,8 +79,11 @@ tests:
- alpine:distro:alpine:3.19
- alpine:distro:alpine:3.20
- alpine:distro:alpine:edge
- nvd:cpe # alpine lists fixes to NVD entries, so NVD entries are also expected
validations: *standard-validations

- provider: amazon
validations: *standard-validations
images:
- docker.io/amazonlinux:2@sha256:1301cc9f889f21dc45733df9e58034ac1c318202b4b0f0a08d88b3fdc03004de
- docker.io/anchore/test_images:vulnerabilities-amazonlinux-2-5c26ce9@sha256:cf742eca189b02902a0a7926ac3fbb423e799937bf4358b0d2acc6cc36ab82aa
@@ -92,6 +103,7 @@ tests:
- ghcr.io/chainguard-images/scanner-test:latest@sha256:59bddc101fba0c45d5c093575c6bc5bfee7f0e46ff127e6bb4e5acaaafb525f9
expected_namespaces:
- chainguard:distro:chainguard:rolling
validations: *standard-validations

- provider: debian
# ideally we would not use cache, however, the in order to test if we are properly keeping the processing
@@ -144,19 +156,22 @@ tests:
- github:language:ruby
- github:language:rust
- github:language:swift
validations: *standard-validations

- provider: mariner
images:
- mcr.microsoft.com/cbl-mariner/base/core:2.0.20220731-amd64@sha256:3c0f7e103ff3c39e81e7c9c042d2b321d833fb6d26d8636567f7d88a6bdde74a
expected_namespaces:
- mariner:distro:mariner:1.0
- mariner:distro:mariner:2.0
validations: *standard-validations

- provider: nvd
images:
- docker.io/busybox:1.28.1@sha256:2107a35b58593c58ec5f4e8f2c4a70d195321078aebfadfbfb223a2ff4a4ed21
expected_namespaces:
- nvd:cpe
validations: *standard-validations

- provider: oracle
additional_trigger_globs:
@@ -170,6 +185,7 @@ tests:
- oracle:distro:oraclelinux:7
- oracle:distro:oraclelinux:8
- oracle:distro:oraclelinux:9
validations: *standard-validations

- provider: rhel
# ideally we would not use cache, however, the ubuntu provider is currently very expensive to run.
@@ -185,6 +201,7 @@ tests:
- docker.io/anchore/test_images:appstreams-centos-stream-8-1a287dd@sha256:808f6cf3cf4473eb39ff9bb47ead639d2ed71255b75b9b140162b58c6102bcc9
- docker.io/anchore/test_images:appstreams-rhel-8-1a287dd@sha256:524ff8a75f21fd886ec7ed82387766df386671e8b77e898d05786118d5b7880b
- docker.io/anchore/test_images:vulnerabilities-centos@sha256:746d31247006cc06434ce91ccf3523b2c230ff6c378ffed7ca1c60bbb48ea86f
validations: *standard-validations

expected_namespaces:
- redhat:distro:redhat:5
@@ -220,6 +237,7 @@ tests:
- sles:distro:sles:15.4
- sles:distro:sles:15.5
- sles:distro:sles:15.6
validations: *standard-validations

- provider: ubuntu
# ideally we would not use cache, however, the ubuntu provider is currently very expensive to run.
@@ -256,6 +274,7 @@ tests:
- ubuntu:distro:ubuntu:23.04
- ubuntu:distro:ubuntu:23.10
- ubuntu:distro:ubuntu:24.04
validations: *standard-validations

- provider: wolfi
additional_providers:
@@ -265,3 +284,4 @@ tests:
- cgr.dev/chainguard/wolfi-base:latest-20221001@sha256:be3834598c3c4b76ace6a866edcbbe1fa18086f9ee238b57769e4d230cd7d507
expected_namespaces:
- wolfi:distro:wolfi:rolling
validations: *standard-validations
36 changes: 25 additions & 11 deletions tests/quality/configure.py
@@ -23,6 +23,7 @@
ResultSet,
ScanMatrix,
Tool,
Validation,
)
from yardstick.cli.config import Application as YardstickApplication

@@ -58,6 +59,7 @@ class Test:
provider: str
use_cache: bool = False
images: list[str] = field(default_factory=list)
validations: list[Validation] = field(default_factory=list)
additional_providers: list[AdditionalProvider] = field(default_factory=list)
additional_trigger_globs: list[str] = field(default_factory=list)
expected_namespaces: list[str] = field(default_factory=list)
@@ -100,20 +102,30 @@ def load(cls, path: str = "") -> "Config":
return cfg

    def yardstick_application_config(self, test_configurations: list[Test]) -> Application:
        # tests is the set of providers explicitly requested
        # each provider is associated with the set of images it needs to scan
        # and the set of validations it needs to perform.
        images = []
        for test in test_configurations:
            images += test.images
            for validation in test.validations:
                if test.expected_namespaces:
                    validation.allowed_namespaces = test.expected_namespaces

        def result_set_from_test(t: Test) -> ResultSet:
            # use the parameter `t` here; `test` would resolve to the loop variable above,
            # which is always the last configured test by the time this runs
            return ResultSet(
                description=f"latest vulnerability data vs current vunnel data with latest grype tooling (via SBOM ingestion) for {t.provider}",
                validations=t.validations,
                matrix=ScanMatrix(
                    images=t.images,
                    tools=self.yardstick.tools,
                ),
            )

        result_sets = {f"pr_vs_latest_via_sbom_{test.provider}": result_set_from_test(test) for test in test_configurations}
return Application(
default_max_year=self.yardstick.default_max_year,
result_sets={
"pr_vs_latest_via_sbom": ResultSet(
description="latest vulnerability data vs current vunnel data with latest grype tooling (via SBOM ingestion)",
matrix=ScanMatrix(
images=images,
tools=self.yardstick.tools,
),
),
},
result_sets=result_sets,
)
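
One thing worth checking when a nested helper builds one result set per provider: a helper that reads a variable from the enclosing function instead of its own parameter sees whatever value an earlier `for` loop ended on. The sketch below reproduces that behaviour in isolation. `FakeTest` and `build_descriptions` are hypothetical stand-ins, not names from this repository.

```python
from dataclasses import dataclass


@dataclass
class FakeTest:
    """Stand-in for the `Test` dataclass; only the field used here."""

    provider: str


def build_descriptions(tests: list[FakeTest], use_parameter: bool) -> dict[str, str]:
    for test in tests:
        pass  # stands in for earlier per-test work; leaves `test` bound to the last item

    def describe(t: FakeTest) -> str:
        # `test` here is the enclosing function's loop variable, not the
        # comprehension variable of the same name used below
        chosen = t if use_parameter else test
        return f"results for {chosen.provider}"

    return {f"pr_vs_latest_via_sbom_{test.provider}": describe(test) for test in tests}
```

Called with `use_parameter=False`, every entry describes the last provider in the list; with `use_parameter=True`, each entry describes its own provider.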

def test_configuration_by_provider(self, provider: str) -> Test | None:
@@ -284,6 +296,7 @@ def write_yardstick_config(cfg: Application, path: str = ".yardstick.yaml"):


def write_grype_db_config(providers: set[str], path: str = ".grype-db.yaml"):
logging.info(f"writing grype-db config to {path!r}")
with open(path, "w") as f:
f.write(
"""
@@ -462,6 +475,7 @@ def configure(cfg: Config, provider_names: list[str]):

providers = set(cached_providers + uncached_providers)

logging.info(f"writing grype-db config for {' '.join(providers)}")
write_grype_db_config(providers)
write_yardstick_config(yardstick_app_cfg)

@@ -601,8 +615,8 @@ def build_db(cfg: Config):
subprocess.run(["vunnel", "-v", "run", provider], check=True)

logging.info("building DB")
subprocess.run([GRYPE_DB, "build", "-v"], check=True)
subprocess.run([GRYPE_DB, "package", "-v"], check=True)
subprocess.run([GRYPE_DB, "build", "-v", "-c", ".grype-db.yaml"], check=True)
subprocess.run([GRYPE_DB, "package", "-v", "-c", ".grype-db.yaml"], check=True)

archives = glob.glob(f"{build_dir}/*.tar.gz")
if not archives: