diff --git a/docs/guide.mdx b/docs/guide.mdx index b63319d90..429b868b3 100644 --- a/docs/guide.mdx +++ b/docs/guide.mdx @@ -40,20 +40,6 @@ tower { The use of the Seqera access token is not mandatory, however, it's required to enable access to private repositories and it allows higher service rate limits compared to anonymous users. ::: -## API limits - -The Wave service implements API rate limits for API calls. Authenticated users have higher rate limits than anonymous users. - -If an access token is provided, the following rate limits apply: - -- 100 container builds per hour -- 1,000 container pulls per minute - -If an access token isn't provided, the following rate limits apply: - -- 25 container builds per day -- 250 container pulls per hour - ## Known limitation ### Use of sha256 digest in the image name @@ -73,38 +59,6 @@ wave.strategy = ['dockerfile'] wave.build.repository = 'docker.io//' ``` -## Tutorials - -### Store container images into a private repository - -Containers built by Wave are uploaded to the Wave default repository hosted on AWS ECR with name `195996028523.dkr.ecr.eu-west-1.amazonaws.com/wave/build`. The images in this repository are automatically deleted 1 week from the date of their push. - -If you want to store Wave containers in your own container repository use the following settings in the Nextflow configuration file: - -```groovy -wave.build.repository = 'example.com/your/build-repo' -wave.build.cacheRepository = 'example.com/your/cache-repo' -``` - -The first repository is used to store the built container images. The second one is used to store the individual image layers for caching purposes. - -The repository access keys need to be specified using the Seqera Platform credentials manager as specified in the [Authenticate private repositories](#Authenticate private repositories) section. 
- -## Advanced settings - -The following configuration options are available: - -| Method | Description | -| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `wave.enabled` | Enable/disable the execution of Wave containers | -| `wave.endpoint` | The Wave service endpoint (default: `https://wave.seqera.io`) | -| `wave.build.repository` | The container repository where image built by Wave needs to be uploaded (note: the corresponding credentials need to be provided in your Seqera Platform account). | -| `wave.build.cacheRepository` | The container repository used to cache image layers build by the Wave service (note: the corresponding credentials need to be provided in your Seqera Platform account). | -| `wave.conda.mambaImage` | The Mamba container image is used to build Conda based container. This is expected to be [micromamba-docker](https://github.com/mamba-org/micromamba-docker) image. | -| `wave.conda.commands` | One or more commands to be added to the Dockerfile used by build a Conda based image. | -| `wave.strategy` | The strategy to be used when resolving ambiguous Wave container requirement (default: `'container,dockerfile,conda'`) | -| `wave.freeze` | When `freeze` mode is enabled containers provisioned by Wave are stored permanently in the repository specified via the setting `wave.build.repository`. | - ## More examples Check out the [Wave showcase repository](https://github.com/seqeralabs/wave-showcase) for more examples how to use Wave containers. diff --git a/docs/guides/private-repo.mdx b/docs/guides/private-repo.mdx index eca0ef101..9842156cd 100644 --- a/docs/guides/private-repo.mdx +++ b/docs/guides/private-repo.mdx @@ -15,5 +15,20 @@ tower { That's it. 
When launching the pipeline execution, Wave will allow Nextflow to access the private container repositories defined in your pipeline configuration, using the credentials stored in the Seqera Platform credentials manager. +==== + +Containers built by Wave are uploaded to the Wave default repository hosted on AWS ECR with name `195996028523.dkr.ecr.eu-west-1.amazonaws.com/wave/build`. The images in this repository are automatically deleted 1 week from the date of their push. + +If you want to store Wave containers in your own container repository use the following settings in the Nextflow configuration file: + +```groovy +wave.build.repository = 'example.com/your/build-repo' +wave.build.cacheRepository = 'example.com/your/cache-repo' +``` + +The first repository is used to store the built container images. The second one is used to store the individual image layers for caching purposes. + +The repository access keys need to be specified using the Seqera Platform credentials manager as specified in the [Authenticate private repositories](#Authenticate private repositories) section. + [credentials]: /platform_versioned_docs/version-23.4.0/credentials/overview [pat]: /platform_versioned_docs/version-23.4.0/api/overview#authentication diff --git a/docs/index.mdx b/docs/index.mdx index 0b9fa81cb..c812fc4ea 100644 --- a/docs/index.mdx +++ b/docs/index.mdx @@ -2,64 +2,8 @@ title: Wave containers --- -Containers are an essential part of data analysis in the cloud. Building and delivering optimized, context-aware container images slows down development. +Containers are an essential part of modern data analysis pipelines in bioinformatics. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed across diverse computing environments. Containers are also key to enabling predictable and reproducible scientific results. 
-Wave is a container provisioning service designed for use with data analysis applications such as Nextflow. +However, the increasing complexity of pipelines and the need to deploy them across diverse cloud and HPC environments poses new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these container images and ensure that their functionality precisely aligns with the requirements of every pipeline task, creating unnecessary friction in the maintenance and deployment of data pipelines. -It allows for the on-demand assembly, augmentation, and deployment of containerized images based on task requirements. - -The Wave container service itself is not a container registry. All containers builds are stored in a Seqera-hosted image registry for a limited time or frozen to a user-specified container registry. - -## Features - -### Private container registries - -Container registry authentication is the new norm. Yet when it comes to authenticating against cloud-specific container registries, the process is hardly hassle free. -Wave integrates with Seqera Platform credentials management enabling seamless access and publishing to private registries. - -### Augment existing containers - -Regulatory and security requirements sometimes dictate specific container images, but additional context is often needed. -Wave enables any existing container to be extended without rebuilding it. Developers can add user-provided content such as custom scripts and logging agents, providing greater flexibility in the container’s configuration. - -Wave offers a flexible approach to container image management. It allows you to dynamically add custom layers to existing docker images, creating new images tailored to your specific needs. - -#### An example of Wave augmentation - -Imagine you have a base Ubuntu image in a container registry. Wave acts as a proxy between your docker client and the registry. 
When you request an augmented image, Wave intercepts the process. - -1. Base image layers download: The Docker client downloads the standard Ubuntu layers from the registry. -2. Custom layer injection: Wave injects your custom layer, denoted by "ω", which could represent application code, libraries, configurations etc. -3. New image creation: Wave combines the downloaded Ubuntu layers with your custom layer, effectively creating a new image on the fly. - -![](_images/wave_container_augmentation.png) - -#### Benefits of Wave augmentation - -1. Streamlined workflows: Wave simplifies your workflow by eliminating the need to manually build and manage custom images. -2. Flexibility: You can easily modify the custom layer for different use cases, allowing for greater adaptability. - -### Conda based containers - -Package management systems such as Conda and Bioconda simplify the installation of scientific software. However, there’s considerable friction when it comes to using those tools to deploy pipelines in cloud environments. -Wave enables dynamic provisioning of container images from any Conda or Bioconda recipe. Just declare the Conda packages in your Nextflow pipeline and Wave will assemble the required container. - -### Deploying containers across multi-clouds - -Cloud vendors provide integrated container registries, providing better performance and cost-efficiency than central, remote registries. -This requires mirroring container collections across multiple accounts, regions, and cloud providers when deploying multi-cloud pipelines. -Wave streamlines this process by provisioning the required containers to the target registry on-demand during the pipeline executions. - -### Container security scanning - -Builds for OCI-compliant container images are automatically scanned for known security vulnerabilities. Wave conducts a vulnerability scan using the [Trivy](https://trivy.dev/) security scanner. 
Seqera Platform customers receive an email that includes a link to the security report listing any vulnerabilities discovered. - -### Optimize workloads for specific architectures - -Modern data pipelines can be deployed across different data centers having different hardware architectures. e.g., amd64, arm64, and others. This requires curating different collections of containers for each architecture. -Wave allows for the on-demand provisioning of containers, depending on the target execution platform (in development). - -### Near caching - -The deployment of production pipelines at scale can require the use of multiple cloud regions to enable efficient resource allocation. -However, this can result in an increased overhead when pulling container images from a central container registry. Wave allows the transparent caching of container images in the same region where computation occurs, reducing data transfer costs and time (in development). +Wave tackles this problem by provisioning containers on-demand during the pipeline execution. This allows the delivery of container images that are defined precisely depending on the requirements of each pipeline task in terms of dependencies and platform architecture. This process is completely transparent and fully automated, removing all the plumbing and friction commonly needed to create, upload, and maintain dozens of container images that might be required by a pipeline execution. 
diff --git a/docs/nextflow/configuration.mdx b/docs/nextflow/configuration.mdx new file mode 100644 index 000000000..8e3093048 --- /dev/null +++ b/docs/nextflow/configuration.mdx @@ -0,0 +1,17 @@ +--- +title: Nextflow configuration for Wave +--- + +The following configuration options are available: + +| Method | Description | +| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| `wave.enabled` | Enable/disable the execution of Wave containers | +| `wave.endpoint` | The Wave service endpoint (default: `https://wave.seqera.io`) | +| `wave.build.repository` | The container repository where images built by Wave are uploaded (note: the corresponding credentials need to be provided in your Seqera Platform account). | +| `wave.build.cacheRepository` | The container repository used to cache image layers built by the Wave service (note: the corresponding credentials need to be provided in your Seqera Platform account). | +| `wave.conda.mambaImage` | The Mamba container image used to build Conda-based containers. This is expected to be a [micromamba-docker](https://github.com/mamba-org/micromamba-docker) image. | +| `wave.conda.commands` | One or more commands to be added to the Dockerfile used to build a Conda-based image. | +| `wave.strategy` | The strategy to be used when resolving ambiguous Wave container requirements (default: `'container,dockerfile,conda'`) | +| `wave.freeze` | When `freeze` mode is enabled, containers provisioned by Wave are stored permanently in the repository specified via the setting `wave.build.repository`. | + diff --git a/docs/service/architecture.mdx b/docs/service/architecture.mdx new file mode 100644 index 000000000..f8740c93f --- /dev/null +++ b/docs/service/architecture.mdx @@ -0,0 +1,5 @@ +--- +title: Architecture +--- + +Currently this is TBD. 
diff --git a/docs/service/augmentation.mdx b/docs/service/augmentation.mdx new file mode 100644 index 000000000..9251a106a --- /dev/null +++ b/docs/service/augmentation.mdx @@ -0,0 +1,27 @@ +--- +title: Container augmentation +--- + +The container augmentation provisioning mode allows "extending" the content of a container image without rebuilding it. Instead, this mechanism modifies the container image during the pull phase performed by a Docker client. + +Container augmentation works as follows: + +1. The client, either Nextflow or Wave client, submits a container request specifying: i) the (Platform) user identity; ii) the container image to be augmented; iii) the container extension configuration, which can be either a custom payload, one or more extension layers or container images. +1. The Wave service validates the request and authorizes the user by submitting a request to the Platform service. +1. The Wave service responds with an ephemeral container image name e.g. wave.seqera.io/wt//library/alpine:latest + The ID TOKEN is uniquely assigned and it's used to identify and authorize the subsequent container request. +1. The Docker client uses the returned image name to pull the binary content of the upstream image directly from the target registry. The content added by Wave as one or more layer extensions is served by the Wave service. + +*Key points* + +- Wave acts as a proxy between the Docker client and the target registry that hosts the container image. +- During this process, Wave modifies, if needed, the container manifest to add the new content as specified by the request, but it does not (and cannot) alter the container layer blob files, whose unique checksums are preserved. +- The image blobs are downloaded directly from the target registry (not from Wave) **[there's an exception to be discussed]** +- The extended content added by Wave is served via Cloudflare CDN. 
+- This process does not carry out any "build" operation behind the scenes. +- Augmented containers are ephemeral: they are not stored in a container repository, and they can only be accessed for a short period of time. + +*Use cases* + +- Authenticate access to private repositories via Platform credentials +- Extend existing containers by adding infrastructure and pipeline dependencies on the fly without rebuilding and maintaining additional container images diff --git a/docs/service/community-registry.mdx b/docs/service/community-registry.mdx new file mode 100644 index 000000000..0388b7842 --- /dev/null +++ b/docs/service/community-registry.mdx @@ -0,0 +1,11 @@ +--- +title: Community registry +--- + +The Community registry adds a regular container registry to the Wave containerization lifecycle. It hosts image builds permanently and is publicly accessible to anyone. + +The community registry is built using [Docker Distribution][docker] and hosted on AWS infrastructure. Images are cached and served via Cloudflare CDN. 
+ +*WIP Diagram* + +[docker]: https://github.com/distribution/distribution diff --git a/docs/service/features.mdx b/docs/service/features.mdx new file mode 100644 index 000000000..3c65797a0 --- /dev/null +++ b/docs/service/features.mdx @@ -0,0 +1,69 @@ +--- +title: Features +--- + +|||||||| +|--- |--- |--- |--- |--- |--- |--- | +||Provisioning mode|Source|Freeze|Build repo|Accessibility|Format| +|Ephemeral|Augmentation|Container image|No|n/a|Temporary token|Docker| +|Ephemeral|Build|Container file|No|Default|Temporary token|Docker| +|Ephemeral|Build|Conda package|No|Default|Temporary token|Docker| +|Ephemeral|Build|Container file|No|Custom|Temporary token|Docker| +|Ephemeral|Build|Conda package|No|Custom|Temporary token|Docker| +|Durable|Build|Container file|Yes|Custom|Docker auth|Docker /Singularity| +|Durable|Build|Conda package|Yes|Custom|Docker auth|Docker /Singularity| +|Community (durable)|Build|Container file|Yes|Default|Public|Docker /Singularity| +|Community (durable)|Build|Conda package|Yes|Default|Public|Docker /Singularity| + + +## Private container registries + +Container registry authentication is the new norm. Yet when it comes to authenticating against cloud-specific container registries, the process is hardly hassle free. +Wave integrates with Seqera Platform credentials management enabling seamless access and publishing to private registries. + +## Augment existing containers + +Regulatory and security requirements sometimes dictate specific container images, but additional context is often needed. +Wave enables any existing container to be extended without rebuilding it. Developers can add user-provided content such as custom scripts and logging agents, providing greater flexibility in the container’s configuration. + +Wave offers a flexible approach to container image management. It allows you to dynamically add custom layers to existing docker images, creating new images tailored to your specific needs. 
+ +### An example of Wave augmentation + +Imagine you have a base Ubuntu image in a container registry. Wave acts as a proxy between your docker client and the registry. When you request an augmented image, Wave intercepts the process. + +1. Base image layers download: The Docker client downloads the standard Ubuntu layers from the registry. +2. Custom layer injection: Wave injects your custom layer, denoted by "ω", which could represent application code, libraries, configurations etc. +3. New image creation: Wave combines the downloaded Ubuntu layers with your custom layer, effectively creating a new image on the fly. + +![](_images/wave_container_augmentation.png) + +### Benefits of Wave augmentation + +1. Streamlined workflows: Wave simplifies your workflow by eliminating the need to manually build and manage custom images. +2. Flexibility: You can easily modify the custom layer for different use cases, allowing for greater adaptability. + +## Conda based containers + +Package management systems such as Conda and Bioconda simplify the installation of scientific software. However, there’s considerable friction when it comes to using those tools to deploy pipelines in cloud environments. +Wave enables dynamic provisioning of container images from any Conda or Bioconda recipe. Just declare the Conda packages in your Nextflow pipeline and Wave will assemble the required container. + +## Deploying containers across multi-clouds + +Cloud vendors provide integrated container registries, providing better performance and cost-efficiency than central, remote registries. +This requires mirroring container collections across multiple accounts, regions, and cloud providers when deploying multi-cloud pipelines. +Wave streamlines this process by provisioning the required containers to the target registry on-demand during the pipeline executions. 
+ +## Container security scanning + +Builds for OCI-compliant container images are automatically scanned for known security vulnerabilities. Wave conducts a vulnerability scan using the [Trivy](https://trivy.dev/) security scanner. Seqera Platform customers receive an email that includes a link to the security report listing any vulnerabilities discovered. + +## Optimize workloads for specific architectures + +Modern data pipelines can be deployed across different data centers having different hardware architectures. e.g., amd64, arm64, and others. This requires curating different collections of containers for each architecture. +Wave allows for the on-demand provisioning of containers, depending on the target execution platform (in development). + +## Near caching + +The deployment of production pipelines at scale can require the use of multiple cloud regions to enable efficient resource allocation. +However, this can result in an increased overhead when pulling container images from a central container registry. Wave allows the transparent caching of container images in the same region where computation occurs, reducing data transfer costs and time (in development). diff --git a/docs/service/freeze.mdx b/docs/service/freeze.mdx new file mode 100644 index 000000000..2f2bc2ba4 --- /dev/null +++ b/docs/service/freeze.mdx @@ -0,0 +1,28 @@ +--- +title: Container freeze +--- + +The container freeze mode allows the provisioning of non-ephemeral containers that can be stored permanently in a container registry of your choice. When using the freeze mode, the Wave service transparently carries out a regular build. + +Freeze Mode Description: + +1. 
The client, either Nextflow or Wave client, submits a container request specifying: i) the (Platform) user identity; ii) the container image to augment; iii) the container extension configuration, which can be either a custom payload, one or more extension layers or container images; iv) the target repository where the built container should be uploaded. +1. The Wave service validates the request and authorizes the user via a request to the Platform service. +1. The Wave service checks if the container image already exists in the target registry. +1. If the image does not exist, Wave launches a container build job and pushes the resulting image to the target registry. +1. The Wave service responds with the container image name e.g. your.registry.com/some/image/build:1234567 + +*Key points* + +- Container images provisioned via the freeze mode are regular container builds. +- Each container image is associated with a unique ID that is obtained by hashing i) the Container file, ii) any package dependencies, iii) the target platform i.e. amd64 or arm64, iv) the target repository name. +- When a request for the same container is made, the same ID is assigned to it and, therefore, the build is skipped. +- The resulting images are hosted in the customer repository and not cached locally (provided that a cache repository is specified). +- The container images are stored permanently as long as the repository owner does not delete them. + +*Use cases* + +- Create container images on-demand via Conda packages. +- Deliver multi-architecture (amd64, arm64) and multi-format (Docker, Singularity) container collections. +- Deliver container images in the same region where compute is performed. + diff --git a/docs/service/limits.mdx b/docs/service/limits.mdx new file mode 100644 index 000000000..3bdfded0c --- /dev/null +++ b/docs/service/limits.mdx @@ -0,0 +1,15 @@ +--- +title: API limits +--- + +The Wave service implements API rate limits for API calls. 
Authenticated users have higher rate limits than anonymous users. + +If an access token is provided, the following rate limits apply: + +- 100 container builds per hour +- 1,000 container pulls per minute + +If an access token isn't provided, the following rate limits apply: + +- 25 container builds per day +- 250 container pulls per hour diff --git a/docs/service/singularity-containers.mdx b/docs/service/singularity-containers.mdx new file mode 100644 index 000000000..58ca6e018 --- /dev/null +++ b/docs/service/singularity-containers.mdx @@ -0,0 +1,11 @@ +--- +title: Singularity containers +--- + +Singularity and Apptainer use a proprietary format called Singularity Image Format (SIF). +Wave can provision containers based on the Singularity image format either using a `Singularityfile` or Conda package(s). The resulting Singularity image file is stored as an ORAS artefact in an OCI-compliant container registry of your choice or the Wave Community registry. + +The advantage of this approach is that Singularity and Apptainer engines can pull and execute those container images natively without requiring extra conversion steps, as is needed when using Docker images with those two engines. + +Note: considering the Singularity image format's peculiarities, Wave's freeze mode is mandatory when provisioning Singularity images. 
+ diff --git a/docs/sidebar.json b/docs/sidebar.json index 5aeed8fab..066c005f7 100644 --- a/docs/sidebar.json +++ b/docs/sidebar.json @@ -1,6 +1,23 @@ { "sidebar": [ - "index", + { + "type": "category", + "label": "Wave service", + "collapsed": false, + "link": { + "type": "doc", + "id": "index" + }, + "items": [ + "service/features", + "service/architecture", + "service/limits", + "service/singularity-containers", + "service/community-registry", + "service/augmentation", + "service/freeze" + ] + }, { "type": "category", "label": "Wave CLI", @@ -25,10 +42,10 @@ "collapsed": false, "items": [ "guides/conda-containers", - "fusion-file-system", - "private-repo", - "singularity-containers", - "module-containers" + "guides/fusion-file-system", + "guides/private-repo", + "guides/singularity-containers", + "guides/module-containers" ] }, "guide",