diff --git a/README.md b/README.md
index 071d25fdc..8e2178906 100644
--- a/README.md
+++ b/README.md
@@ -137,6 +137,10 @@ Supported:
 * [yarn](#yarn)
 * [bundler](#bundler)
 
+Experimental:
+
+* [generic fetcher](#generic-fetcher)
+
 Planned:
 
 * dnf
@@ -237,6 +241,18 @@ Both files must be present in the source repository so you should check them int
 
 See [docs/bundler.md](docs/bundler.md) for more details.
 
+### generic fetcher
+
+Generic fetcher is a way for Cachi2 to support pre-fetching arbitrary files that don't fit into other package managers.
+With the generic fetcher, you can easily fetch those files with Cachi2 along with your other language-specific dependencies,
+satisfy the hermetic build condition and have them recorded in the SBOM.
+
+Cachi2 uses a simple custom lockfile named `generic_lockfile.yaml` that is expected to be present in the repository. The
+lockfile describes the URLs, checksums and target locations for the downloaded files. The generic fetcher is currently an
+experimental feature, so cachi2 has to be run with the `--dev-package-managers` flag.
+
+See [docs/usage.md](docs/usage.md#pre-fetch-dependencies-generic-fetcher) for more details.
+
 ## Project status
 
 Cachi2 was derived (but is not a direct fork) from [Cachito](https://github.com/containerbuildsystem/cachito) and is
diff --git a/docs/adr/0001-add-generic-fetcher.md b/docs/adr/0001-add-generic-fetcher.md
new file mode 100644
index 000000000..aceffc8f3
--- /dev/null
+++ b/docs/adr/0001-add-generic-fetcher.md
@@ -0,0 +1,109 @@
+# Add generic fetcher
+
+- Status: proposed
+- Date: 2024-10-30
+
+## Context
+
+The main motivation for this change is to cover use cases of users that need to download arbitrary files that don't fit
+within an established package ecosystem cachi2 could potentially otherwise support. The target audience is users that
+want to use cachi2 to achieve hermetic builds and want an easy way to also include these arbitrary files, which cachi2
+will account for in the SBOM it produces.
+
+## Decision
+
+This change introduces a generic fetcher, an additional cachi2 package manager. This package manager utilizes a custom
+lockfile that is located in the input repository. Based on that lockfile, it will download files, save them into a requested
+location, and verify checksums. Below is a more detailed overview of the implementation.
+
+### Lockfile format
+
+Cachi2 expects the lockfile to be named `generic_lockfile.yaml`.
+In order to account for possible future breaking changes, the lockfile will contain a `metadata` section with a `version`
+field that will indicate the version of the lockfile format. It will also contain a list of artifacts (files) to download,
+each with a URL, one or more checksums, and optionally a target location.
+
+```yaml
+metadata:
+  # uses X.Y semantic versioning
+  version: "1.0"
+artifacts:
+  - download_url: https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors?download=true
+    target: granite-model-1.safetensors
+    checksums:
+      sha256: d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c
+```
+
+#### Lockfile properties
+
+Below is an explanation of individual properties of the lockfile.
+
+##### download_url (required)
+
+Specified as a string containing the download URL of the artifact.
+
+##### checksums (required)
+
+Specified as a dictionary of checksum algorithms and their values. At least one cachi2-verifiable checksum must be provided
+to ensure at least some degree of confidence in the identity of the artifact.
+
+##### target (optional)
+
+This key is provided mainly for the user's convenience, so the files end up in expected locations. It is optional and if
+not specified, it will be derived from the download_url. Target here means a specific subdirectory inside cachi2's output
+directory for the generic fetcher (`{cachi2-output-dir}/deps/generic`). Cachi2 will verify that the target locations,
+including those derived from download URLs, do not overlap.
+
+### SBOM components
+
+Artifacts fetched with the generic fetcher will all be recorded in the SBOM cachi2 produces. Given the inability to derive
+any extra information about these files beyond a download location and a filename, these files will always be recorded
+as SBOM components with a purl of type `generic`.
+
+Additionally, the SBOM component will contain [externalReferences] of type `distribution` to indicate the URL used to download
+the file to allow for easier handling for tools that might process the SBOM.
+
+Here's an example SBOM generated for the file above.
+
+```json
+{
+    "bomFormat": "CycloneDX",
+    "components": [
+        {
+            "name": "granite-model-1.safetensors",
+            "purl": "pkg:generic/granite-model-1.safetensors?checksums=sha256:d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c&download_url=https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
+            "properties": [
+                {
+                    "name": "cachi2:found_by",
+                    "value": "cachi2"
+                }
+            ],
+            "type": "file",
+            "externalReferences": [
+                {
+                    "url": "https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
+                    "type": "distribution"
+                }
+            ]
+        }
+    ],
+    "metadata": {
+        "tools": [
+            {
+                "vendor": "red hat",
+                "name": "cachi2"
+            }
+        ]
+    },
+    "specVersion": "1.4",
+    "version": 1
+}
+```
+
+## Consequences
+
+As mentioned before, this package manager enables users to fetch arbitrary files with cachi2 and have them accounted for
+in the SBOM. A possible downside is having to maintain the lockfile format, as it is specific to cachi2 (which should be
+partially mitigated by versioning it).
+
+[externalReferences]: https://cyclonedx.org/docs/1.6/json/#components_items_externalReferences
diff --git a/docs/usage.md b/docs/usage.md
index 52ba56e26..976099671 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -17,6 +17,7 @@ The second section goes through each of these steps for the supported package ma
   * [Example with pip](#example-pip)
   * [Example with npm](#example-npm)
   * [Example with yarn](#example-yarn)
+  * [Example with generic fetcher](#example-generic-fetcher)
 
 ## General Process
 
@@ -598,3 +599,64 @@ podman build . \
   --network none \
   --tag sample-nodejs-app
 ```
+
+
+### Example: generic fetcher
+
+Generic fetcher is a package manager that can fetch arbitrary files. Let's build a sample container image that would be
+inconvenient to build hermetically otherwise. This image will provide the [OWASP Dependency check](https://github.com/jeremylong/DependencyCheck)
+tool, which can be installed from its GitHub releases page. Get the repo if you want to try for yourself:
+
+```
+git clone -b sample-app https://github.com/cachito-testing/cachi2-generic.git
+```
+
+#### Pre-fetch dependencies (generic fetcher)
+In order to retrieve the archive with the tool, a `generic_lockfile.yaml` needs to be present
+in the repository. Here's what it looks like. It simply defines the URL and the checksums
+used to verify the identity of the file.
+
+```
+---
+metadata:
+  version: "1.0"
+artifacts:
+  - download_url: "https://github.com/jeremylong/DependencyCheck/releases/download/v11.1.0/dependency-check-11.1.0-release.zip"
+    checksums:
+      sha256: "c5b5b9e592682b700e17c28f489fe50644ef54370edeb2c53d18b70824de1e22"
+
+```
+
+As in the other examples, the command to fetch dependencies is very similar. The default path
+is assumed to be `.`. Since generic fetcher is still an experimental feature, it needs to be
+enabled with the `--dev-package-managers` flag.
+
+```
+cachi2 fetch-deps --source ./cachi2-generic --output ./cachi2-output '{"type": "generic"}' --dev-package-managers
+```
+
+#### Build the application image (generic fetcher)
+We'll use `ibmjava:11-jdk` as the base image because it already has Java pre-installed.
+During the build, the downloaded release will be extracted and modified to have execute rights.
+
+```Containerfile
+FROM ibmjava:11-jdk
+
+WORKDIR /tmp
+
+# use jar to unzip file in order to avoid having to install more dependencies
+RUN jar -xvf cachi2-output/deps/generic/dependency-check-11.1.0-release.zip
+
+RUN chmod +x dependency-check/bin/dependency-check.sh
+
+ENTRYPOINT ["/tmp/dependency-check/bin/dependency-check.sh", "--version"]
+```
+
+We can then build the image as before while mounting the required Cachi2 data.
+
+```
+podman build . \
+  --volume "$(realpath ./cachi2-output)":/tmp/cachi2-output:Z \
+  --network none \
+  --tag sample-generic-app
+```
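
The lockfile rules described in the ADR (a `target` derived from `download_url` when omitted, no two artifacts sharing a target, and checksum verification) can be illustrated with a short Python sketch. This is not cachi2's implementation: the function names `derive_target`, `resolve_targets` and `matches_checksum` are hypothetical, the derivation rule (last path segment of the URL) is an assumption, and the overlap check only catches exact duplicates.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def derive_target(download_url: str) -> str:
    """Assumed fallback: use the last path segment of the URL, ignoring the query string."""
    return PurePosixPath(urlparse(download_url).path).name


def resolve_targets(artifacts: list) -> dict:
    """Map each target to its download URL, rejecting duplicate targets."""
    targets = {}
    for artifact in artifacts:
        url = artifact["download_url"]
        # An explicit target wins; otherwise fall back to the derived name.
        target = artifact.get("target") or derive_target(url)
        if target in targets:
            raise ValueError(f"target {target!r} is used by more than one artifact")
        targets[target] = url
    return targets


def matches_checksum(data: bytes, algorithm: str, expected: str) -> bool:
    """Compare the hex digest of the downloaded bytes with the lockfile value."""
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()
```

Under these assumptions, the ADR's example artifact without a `target` would be stored under its URL filename, and two artifacts resolving to the same name would be rejected before anything is downloaded.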