

generic fetcher: Add usage docs and a ADR
Add documentation on how to use the generic fetcher and also an ADR
to help move out of the experimental phase.

Signed-off-by: Jan Koscielniak <[email protected]>
kosciCZ committed Nov 4, 2024
1 parent 293bcdf commit d8a3f6e
Showing 3 changed files with 187 additions and 0 deletions.
16 changes: 16 additions & 0 deletions README.md
@@ -137,6 +137,10 @@ Supported:
* [yarn](#yarn)
* [bundler](#bundler)

Experimental:

* [generic fetcher](#generic-fetcher)

Planned:

* dnf
@@ -237,6 +241,18 @@ Both files must be present in the source repository so you should check them int

See [docs/bundler.md](docs/bundler.md) for more details.

### generic fetcher

The generic fetcher lets Cachi2 pre-fetch arbitrary files that don't fit into any of the other package managers.
With the generic fetcher, you can fetch those files with Cachi2 alongside your language-specific dependencies,
satisfy the hermetic build requirement, and have the files recorded in the SBOM.

Cachi2 uses a simple custom lockfile named `generic_lockfile.yaml` that is expected to be present in the repository. The
lockfile describes the URLs, checksums, and target locations of the files to download. The generic fetcher is currently an
experimental feature, so Cachi2 has to be run with the `--dev-package-managers` flag.

See [docs/usage.md](docs/usage.md#pre-fetch-dependencies-generic-fetcher) for more details.

## Project status

Cachi2 was derived (but is not a direct fork) from [Cachito](https://github.com/containerbuildsystem/cachito) and is
109 changes: 109 additions & 0 deletions docs/adr/0001-add-generic-fetcher.md
@@ -0,0 +1,109 @@
# Add generic fetcher

- Status: proposed
- Date: 2024-10-30

## Context

The main motivation for this change is to cover users who need to download arbitrary files that don't fit into any
established package ecosystem that cachi2 supports or could otherwise support. The target audience is users who want to
use cachi2 to achieve hermetic builds and need an easy way to include these arbitrary files as well, with cachi2
accounting for them in the SBOM it produces.

## Decision

This change introduces a generic fetcher, an additional cachi2 package manager. This package manager uses a custom
lockfile located in the input repository. Based on that lockfile, it will download files, save them into the requested
locations, and verify their checksums. Below is a more detailed overview of the implementation.
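
The download, verify and save flow can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not cachi2's actual implementation: the function name `fetch_artifact` is hypothetical, checksums are assumed to be a mapping of `hashlib` algorithm names to hex digests, and the file is verified in a temporary location before being moved to its target.

```python
import hashlib
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Assumption: any fixed-length algorithm guaranteed by hashlib counts as
# "verifiable"; the variable-length shake_* digests are excluded.
VERIFIABLE = {a for a in hashlib.algorithms_guaranteed if not a.startswith("shake_")}


def fetch_artifact(url: str, target: Path, checksums: dict[str, str]) -> Path:
    """Download url, verify its checksums, then move it to target (hypothetical helper)."""
    hashers = {algo: hashlib.new(algo) for algo in checksums if algo in VERIFIABLE}
    if not hashers:
        raise ValueError(f"no verifiable checksum provided for {url}")

    target.parent.mkdir(parents=True, exist_ok=True)
    # Download into a temporary file next to the target, so a failed
    # verification never leaves a wrong file at the target path itself.
    with tempfile.NamedTemporaryFile(dir=target.parent, delete=False) as tmp:
        tmp_path = Path(tmp.name)
        with urllib.request.urlopen(url) as response:
            while chunk := response.read(1024 * 1024):
                tmp.write(chunk)
                for h in hashers.values():
                    h.update(chunk)

    for algo, h in hashers.items():
        if h.hexdigest() != checksums[algo].lower():
            tmp_path.unlink()
            raise ValueError(f"{algo} checksum mismatch for {url}")

    shutil.move(tmp_path, target)
    return target
```

Verifying before the final move is a design choice of this sketch; it keeps the output directory free of files whose identity could not be confirmed.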

### Lockfile format

Cachi2 expects the lockfile to be named `generic_lockfile.yaml`.
To account for possible future breaking changes, the lockfile will contain a `metadata` section with a `version`
field that indicates the version of the lockfile format. It will also contain a list of artifacts (files) to download;
each artifact has a URL, a set of checksums, and, optionally, a target location.

```yaml
metadata:
  # uses X.Y semantic versioning
  version: "1.0"
artifacts:
  - download_url: https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors?download=true
    target: granite-model-1.safetensors
    checksums:
      sha256: d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c
```
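
Assuming the YAML has already been parsed into Python data (for example with PyYAML's `yaml.safe_load`, not shown here), the structure above could be validated with a sketch like the following. The `Artifact` class and `parse_lockfile` function are hypothetical, and accepting any `1.x` version is an assumption based on the X.Y versioning comment.

```python
from dataclasses import dataclass
from typing import Any, Optional

SUPPORTED_MAJOR_VERSION = 1  # assumption: any "1.x" lockfile is accepted


@dataclass
class Artifact:
    download_url: str
    checksums: dict[str, str]
    target: Optional[str] = None


def parse_lockfile(data: dict[str, Any]) -> list[Artifact]:
    """Validate a parsed generic_lockfile.yaml and return its artifacts."""
    version = str(data.get("metadata", {}).get("version", ""))
    major, _, minor = version.partition(".")
    # The version must look like "X.Y" with numeric parts.
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"invalid lockfile version: {version!r}")
    if int(major) != SUPPORTED_MAJOR_VERSION:
        raise ValueError(f"unsupported lockfile version: {version}")

    artifacts = []
    for entry in data.get("artifacts", []):
        # download_url and a non-empty checksums mapping are required.
        if "download_url" not in entry or not entry.get("checksums"):
            raise ValueError(f"artifact needs download_url and checksums: {entry}")
        artifacts.append(
            Artifact(
                download_url=entry["download_url"],
                checksums=dict(entry["checksums"]),
                target=entry.get("target"),  # optional
            )
        )
    return artifacts
```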

#### Lockfile properties

Below is an explanation of the individual properties of the lockfile.

##### download_url (required)

A string containing the download URL of the artifact.

##### checksums (required)

A dictionary mapping checksum algorithms to their values. At least one checksum that cachi2 can verify must be provided
to ensure some degree of confidence in the identity of the artifact.

##### target (optional)

This key is provided mainly for the user's convenience, so that files end up in the expected locations. If it is not
specified, it is derived from the download_url. The target is a path relative to cachi2's output directory for the generic
fetcher (`{cachi2-output-dir}/deps/generic`). Cachi2 will verify that the target locations, including those derived from
download URLs, do not overlap.
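
A minimal sketch of how a target could be derived from `download_url` and how overlapping targets could be detected. It assumes the derived name is the last segment of the URL path with the query string dropped (so the Hugging Face URL above would yield `model-00001-of-00003.safetensors`), and that two targets overlap if they are identical or one is a parent directory of the other. Rejecting absolute paths and `..` is an extra safeguard assumed here, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse


def resolve_target(download_url: str, target: Optional[str] = None) -> PurePosixPath:
    """Return an artifact's path relative to {cachi2-output-dir}/deps/generic."""
    if target is None:
        # Derive the file name from the URL path; the query string is ignored.
        target = unquote(PurePosixPath(urlparse(download_url).path).name)
    path = PurePosixPath(target)
    if path.is_absolute() or ".." in path.parts or not path.name:
        raise ValueError(f"target must stay inside the output directory: {target!r}")
    return path


def check_no_overlap(targets: list[PurePosixPath]) -> None:
    """Raise if two targets are equal or one is a parent directory of another."""
    # Sorting by path components places every path directly before the
    # paths nested under it, so comparing neighbours is sufficient.
    ordered = sorted(targets, key=lambda p: p.parts)
    for current, following in zip(ordered, ordered[1:]):
        if following.parts[: len(current.parts)] == current.parts:
            raise ValueError(f"overlapping targets: {current} and {following}")
```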

### SBOM components

Artifacts fetched with the generic fetcher will all be recorded in the SBOM cachi2 produces. Since no information about
these files can be derived beyond a download location and a filename, they will always be recorded as SBOM components
with a purl of type `generic`.

Additionally, each SBOM component will contain [externalReferences] of type `distribution` recording the URL the file was
downloaded from, which makes the SBOM easier to handle for tools that process it.

Here is an example SBOM generated for the lockfile above.

```json
{
  "bomFormat": "CycloneDX",
  "components": [
    {
      "name": "granite-model-1.safetensors",
      "purl": "pkg:generic/granite-model-1.safetensors?checksums=sha256:d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c&download_url=https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2"
        }
      ],
      "type": "file",
      "externalReferences": [
        {
          "url": "https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
          "type": "distribution"
        }
      ]
    }
  ],
  "metadata": {
    "tools": [
      {
        "vendor": "red hat",
        "name": "cachi2"
      }
    ]
  },
  "specVersion": "1.4",
  "version": 1
}
```
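
The component in this example could be assembled from a lockfile entry with a sketch like the following. It mirrors the example rather than cachi2's actual code: the function name is hypothetical, joining multiple checksums with commas is an assumption, and dropping the query string simply reproduces what the example shows (the lockfile URL ends in `?download=true`, the SBOM URL does not). A real implementation would more likely build the purl with a library such as `packageurl-python`, which may encode qualifier values differently.

```python
from urllib.parse import urlparse, urlunparse


def generic_component(name: str, download_url: str, checksums: dict[str, str]) -> dict:
    """Build a CycloneDX component dict shaped like the example SBOM."""
    # Assumption: query string and fragment are dropped from the recorded URL.
    clean_url = urlunparse(urlparse(download_url)._replace(query="", fragment=""))
    # Assumption: multiple checksums are joined as "algo:value" pairs with commas.
    checksum_str = ",".join(f"{algo}:{value}" for algo, value in sorted(checksums.items()))
    # Qualifiers appear in alphabetical order and unencoded, as in the example.
    purl = f"pkg:generic/{name}?checksums={checksum_str}&download_url={clean_url}"
    return {
        "name": name,
        "purl": purl,
        "properties": [{"name": "cachi2:found_by", "value": "cachi2"}],
        "type": "file",
        "externalReferences": [{"url": clean_url, "type": "distribution"}],
    }
```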

## Consequences

As mentioned above, this package manager enables users to fetch arbitrary files with cachi2 and have them accounted for
in the SBOM. A possible downside is the burden of maintaining the lockfile format, which is specific to cachi2; versioning
the format should partially mitigate this.

[externalReferences]: https://cyclonedx.org/docs/1.6/json/#components_items_externalReferences
62 changes: 62 additions & 0 deletions docs/usage.md
@@ -17,6 +17,7 @@ The second section goes through each of these steps for the supported package ma
* [Example with pip](#example-pip)
* [Example with npm](#example-npm)
* [Example with yarn](#example-yarn)
* [Example with generic fetcher](#example-generic-fetcher)

## General Process

@@ -598,3 +599,64 @@ podman build . \
--network none \
--tag sample-nodejs-app
```


### Example: generic fetcher

The generic fetcher is a package manager that can fetch arbitrary files. Let's build a sample container image that would
otherwise be inconvenient to build hermetically. This image will provide the [OWASP Dependency-Check](https://github.com/jeremylong/DependencyCheck)
tool, which can be installed from its GitHub releases page. Clone the repository if you want to try it yourself:

```
git clone -b sample-app https://github.com/cachito-testing/cachi2-generic.git
```

#### Pre-fetch dependencies (generic fetcher)

To retrieve the archive containing the tool, a `generic_lockfile.yaml` needs to be present
in the repository. Here is what it looks like. It simply defines the URL of the archive and the checksum
used to verify the file's identity.

```yaml
---
metadata:
  version: "1.0"
artifacts:
  - download_url: "https://github.com/jeremylong/DependencyCheck/releases/download/v11.1.0/dependency-check-11.1.0-release.zip"
    checksums:
      sha256: "c5b5b9e592682b700e17c28f489fe50644ef54370edeb2c53d18b70824de1e22"
```
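
To fill in the `sha256` value for your own artifacts, you need the digest of the exact file you intend to fetch. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so large archives need not fit in memory (the function name `sha256_of` is hypothetical; `sha256sum` from GNU coreutils gives the same result):

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print a digest for every file passed on the command line.
    for name in sys.argv[1:]:
        print(f"{name}: sha256={sha256_of(Path(name))}")
```

For example, running it against a locally downloaded `dependency-check-11.1.0-release.zip` should print the value that belongs in the lockfile's `sha256` field.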

As with the other examples, the command to fetch dependencies is very similar. The package path defaults to `.`, so the
lockfile is expected at the root of the source repository. Since the generic fetcher is still an experimental feature,
it needs to be enabled with the `--dev-package-managers` flag.

```
cachi2 fetch-deps --source ./cachi2-generic --output ./cachi2-output '{"type": "generic"}' --dev-package-managers
```
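
If you drive Cachi2 from a Python script (in CI, for instance), the same invocation can be assembled as an argument list. This is only a sketch: the helper names are hypothetical and only the arguments shown in the command above are used. Building the package JSON with `json.dumps` avoids shell-quoting mistakes.

```python
import json
import subprocess


def build_fetch_command(source: str, output: str) -> list[str]:
    """Return the cachi2 fetch-deps command for the generic fetcher."""
    packages = json.dumps({"type": "generic"})
    return [
        "cachi2",
        "fetch-deps",
        "--source",
        source,
        "--output",
        output,
        packages,
        # Required while the generic fetcher is experimental.
        "--dev-package-managers",
    ]


def run_fetch(source: str = "./cachi2-generic", output: str = "./cachi2-output") -> None:
    # check=True raises CalledProcessError if cachi2 exits with a non-zero status.
    subprocess.run(build_fetch_command(source, output), check=True)
```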

#### Build the application image (generic fetcher)

We'll use `ibmjava:11-jdk` as the base image because it already has Java pre-installed.
During the build, the downloaded release archive will be extracted and its launcher script made executable.

```Containerfile
FROM ibmjava:11-jdk

WORKDIR /tmp

# use jar to unzip the file to avoid having to install more dependencies
RUN jar -xvf cachi2-output/deps/generic/dependency-check-11.1.0-release.zip

RUN chmod +x dependency-check/bin/dependency-check.sh

ENTRYPOINT ["/tmp/dependency-check/bin/dependency-check.sh", "--version"]
```
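
The `chmod` and `ENTRYPOINT` lines assume the archive unpacks into a top-level `dependency-check/` directory. Checking this before running `podman build` can save a failed build cycle. A small sketch using Python's standard `zipfile` module (the helper name `archive_has_entry` is hypothetical):

```python
import zipfile
from pathlib import Path

# Path that the Containerfile above expects after extraction.
EXPECTED_ENTRY = "dependency-check/bin/dependency-check.sh"


def archive_has_entry(archive: Path, entry: str = EXPECTED_ENTRY) -> bool:
    """Return True if the zip archive contains the given member path."""
    with zipfile.ZipFile(archive) as zf:
        return entry in zf.namelist()
```

For example, `archive_has_entry(Path("cachi2-output/deps/generic/dependency-check-11.1.0-release.zip"))` should return `True` if the archive layout matches what the Containerfile expects.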

We can then build the image as before while mounting the required Cachi2 data.

```
podman build . \
--volume "$(realpath ./cachi2-output)":/tmp/cachi2-output:Z \
--network none \
--tag sample-generic-app
```
