Add initial sections from Google Doc
jason-seqera committed Jul 11, 2024
1 parent c158d82 commit 29752d4
Showing 12 changed files with 223 additions and 110 deletions.
46 changes: 0 additions & 46 deletions docs/guide.mdx
Original file line number Diff line number Diff line change
@@ -40,20 +40,6 @@ tower {
The Seqera access token is not mandatory. However, it is required to access private repositories, and it grants higher service rate limits than those applied to anonymous users.
:::

## API limits

The Wave service applies rate limits to API calls. Authenticated users have higher rate limits than anonymous users.

If an access token is provided, the following rate limits apply:

- 100 container builds per hour
- 1,000 container pulls per minute

If an access token isn't provided, the following rate limits apply:

- 25 container builds per day
- 250 container pulls per hour

## Known limitation

### Use of sha256 digest in the image name
@@ -73,38 +59,6 @@ wave.strategy = ['dockerfile']
wave.build.repository = 'docker.io/<user>/<repository>'
```

## Tutorials

### Store container images into a private repository

Containers built by Wave are uploaded to the default Wave repository, hosted on AWS ECR under the name `195996028523.dkr.ecr.eu-west-1.amazonaws.com/wave/build`. Images in this repository are automatically deleted one week after they are pushed.

To store Wave containers in your own container repository, use the following settings in the Nextflow configuration file:

```groovy
wave.build.repository = 'example.com/your/build-repo'
wave.build.cacheRepository = 'example.com/your/cache-repo'
```

The first repository is used to store the built container images. The second one is used to store the individual image layers for caching purposes.

The repository access keys must be provided via the Seqera Platform credentials manager, as described in the [Authenticate private repositories](#authenticate-private-repositories) section.

## Advanced settings

The following configuration options are available:

| Option                       | Description |
| ---------------------------- | ----------- |
| `wave.enabled`               | Enable or disable the execution of Wave containers. |
| `wave.endpoint`              | The Wave service endpoint (default: `https://wave.seqera.io`). |
| `wave.build.repository`      | The container repository to which images built by Wave are uploaded (note: the corresponding credentials must be provided in your Seqera Platform account). |
| `wave.build.cacheRepository` | The container repository used to cache image layers built by the Wave service (note: the corresponding credentials must be provided in your Seqera Platform account). |
| `wave.conda.mambaImage`      | The Mamba container image used to build Conda-based containers. This is expected to be a [micromamba-docker](https://github.com/mamba-org/micromamba-docker) image. |
| `wave.conda.commands`        | One or more commands to be added to the Dockerfile used to build a Conda-based image. |
| `wave.strategy`              | The strategy used to resolve ambiguous Wave container requirements (default: `'container,dockerfile,conda'`). |
| `wave.freeze`                | When `freeze` mode is enabled, containers provisioned by Wave are stored permanently in the repository specified by the `wave.build.repository` setting. |

## More examples

Check out the [Wave showcase repository](https://github.com/seqeralabs/wave-showcase) for more examples of how to use Wave containers.
15 changes: 15 additions & 0 deletions docs/guides/private-repo.mdx
@@ -15,5 +15,20 @@ tower {

That's it. When launching the pipeline execution, Wave will allow Nextflow to access the private container repositories defined in your pipeline configuration, using the credentials stored in the Seqera Platform credentials manager.

====

Containers built by Wave are uploaded to the default Wave repository, hosted on AWS ECR under the name `195996028523.dkr.ecr.eu-west-1.amazonaws.com/wave/build`. Images in this repository are automatically deleted one week after they are pushed.

To store Wave containers in your own container repository, use the following settings in the Nextflow configuration file:

```groovy
wave.build.repository = 'example.com/your/build-repo'
wave.build.cacheRepository = 'example.com/your/cache-repo'
```

The first repository is used to store the built container images. The second one is used to store the individual image layers for caching purposes.

The repository access keys must be provided via the Seqera Platform credentials manager, as described in the [Authenticate private repositories](#authenticate-private-repositories) section.

[credentials]: /platform_versioned_docs/version-23.4.0/credentials/overview
[pat]: /platform_versioned_docs/version-23.4.0/api/overview#authentication
62 changes: 3 additions & 59 deletions docs/index.mdx
@@ -2,64 +2,8 @@
title: Wave containers
---

Containers are an essential part of data analysis in the cloud. Building and delivering optimized, context-aware container images slows down development.
Containers are an essential part of modern data analysis pipelines in bioinformatics. They encapsulate applications and dependencies in portable, self-contained packages that can be easily distributed across diverse computing environments. Containers are also key to enabling predictable and reproducible scientific results.

Wave is a container provisioning service designed for use with data analysis applications such as Nextflow.
However, the increasing complexity of pipelines and the need to deploy them across diverse cloud and HPC environments poses new challenges. Today, workflows may comprise dozens of distinct container images. Pipeline developers must manage and maintain these container images and ensure that their functionality precisely aligns with the requirements of every pipeline task, creating unnecessary friction in the maintenance and deployment of data pipelines.

It allows for the on-demand assembly, augmentation, and deployment of containerized images based on task requirements.

The Wave container service itself is not a container registry. All container builds are stored in a Seqera-hosted image registry for a limited time or frozen to a user-specified container registry.

## Features

### Private container registries

Container registry authentication is the new norm. Yet when it comes to authenticating against cloud-specific container registries, the process is hardly hassle free.
Wave integrates with Seqera Platform credentials management enabling seamless access and publishing to private registries.

### Augment existing containers

Regulatory and security requirements sometimes dictate specific container images, but additional context is often needed.
Wave enables any existing container to be extended without rebuilding it. Developers can add user-provided content such as custom scripts and logging agents, providing greater flexibility in the container’s configuration.

Wave offers a flexible approach to container image management. It allows you to dynamically add custom layers to existing Docker images, creating new images tailored to your specific needs.

#### An example of Wave augmentation

Imagine you have a base Ubuntu image in a container registry. Wave acts as a proxy between your Docker client and the registry. When you request an augmented image, Wave intercepts the pull.

1. Base image layers download: The Docker client downloads the standard Ubuntu layers from the registry.
2. Custom layer injection: Wave injects your custom layer, denoted by "ω", which could contain application code, libraries, configuration files, and so on.
3. New image creation: Wave combines the downloaded Ubuntu layers with your custom layer, effectively creating a new image on the fly.

![](_images/wave_container_augmentation.png)

#### Benefits of Wave augmentation

1. Streamlined workflows: Wave simplifies your workflow by eliminating the need to manually build and manage custom images.
2. Flexibility: You can easily modify the custom layer for different use cases, allowing for greater adaptability.

### Conda based containers

Package management systems such as Conda and Bioconda simplify the installation of scientific software. However, there’s considerable friction when it comes to using those tools to deploy pipelines in cloud environments.
Wave enables dynamic provisioning of container images from any Conda or Bioconda recipe. Just declare the Conda packages in your Nextflow pipeline and Wave will assemble the required container.

### Deploying containers across multi-clouds

Cloud vendors provide integrated container registries, providing better performance and cost-efficiency than central, remote registries.
This requires mirroring container collections across multiple accounts, regions, and cloud providers when deploying multi-cloud pipelines.
Wave streamlines this process by provisioning the required containers to the target registry on-demand during the pipeline executions.

### Container security scanning

Builds for OCI-compliant container images are automatically scanned for known security vulnerabilities. Wave conducts a vulnerability scan using the [Trivy](https://trivy.dev/) security scanner. Seqera Platform customers receive an email that includes a link to the security report listing any vulnerabilities discovered.

### Optimize workloads for specific architectures

Modern data pipelines can be deployed across data centers with different hardware architectures, e.g., amd64, arm64, and others. This requires curating a separate collection of containers for each architecture.
Wave allows for the on-demand provisioning of containers, depending on the target execution platform (in development).

### Near caching

The deployment of production pipelines at scale can require the use of multiple cloud regions to enable efficient resource allocation.
However, this can result in an increased overhead when pulling container images from a central container registry. Wave allows the transparent caching of container images in the same region where computation occurs, reducing data transfer costs and time (in development).
Wave tackles this problem by provisioning containers on-demand during the pipeline execution. This allows the delivery of container images that are defined precisely depending on the requirements of each pipeline task in terms of dependencies and platform architecture. This process is completely transparent and fully automated, removing all the plumbing and friction commonly needed to create, upload, and maintain dozens of container images that might be required by a pipeline execution.
17 changes: 17 additions & 0 deletions docs/nextflow/configuration.mdx
@@ -0,0 +1,17 @@
---
title: Nextflow configuration for Wave
---

The following configuration options are available:

| Option                       | Description |
| ---------------------------- | ----------- |
| `wave.enabled`               | Enable or disable the execution of Wave containers. |
| `wave.endpoint`              | The Wave service endpoint (default: `https://wave.seqera.io`). |
| `wave.build.repository`      | The container repository to which images built by Wave are uploaded (note: the corresponding credentials must be provided in your Seqera Platform account). |
| `wave.build.cacheRepository` | The container repository used to cache image layers built by the Wave service (note: the corresponding credentials must be provided in your Seqera Platform account). |
| `wave.conda.mambaImage`      | The Mamba container image used to build Conda-based containers. This is expected to be a [micromamba-docker](https://github.com/mamba-org/micromamba-docker) image. |
| `wave.conda.commands`        | One or more commands to be added to the Dockerfile used to build a Conda-based image. |
| `wave.strategy`              | The strategy used to resolve ambiguous Wave container requirements (default: `'container,dockerfile,conda'`). |
| `wave.freeze`                | When `freeze` mode is enabled, containers provisioned by Wave are stored permanently in the repository specified by the `wave.build.repository` setting. |
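
As a rough illustration of how these options can be combined, a `nextflow.config` might look like the sketch below. The repository names are the placeholders used in the private repository guide, and the `strategy` and `freeze` values are assumptions chosen for illustration, not recommended defaults.

```groovy
// Illustrative sketch only: adjust every value to your own environment.
wave {
    // Turn on Wave container provisioning
    enabled  = true

    // Default service endpoint, shown explicitly for clarity
    endpoint = 'https://wave.seqera.io'

    // Assumed preference order for resolving ambiguous container requirements
    strategy = ['dockerfile', 'conda']

    // Keep built containers permanently in build.repository
    freeze   = true

    build {
        // Placeholder repositories; credentials must exist in Seqera Platform
        repository      = 'example.com/your/build-repo'
        cacheRepository = 'example.com/your/cache-repo'
    }
}
```

Because `wave.freeze` stores containers in the repository named by `wave.build.repository`, the two settings are shown together here.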

5 changes: 5 additions & 0 deletions docs/service/architecture.mdx
@@ -0,0 +1,5 @@
---
title: Architecture
---

Currently this is TBD.
27 changes: 27 additions & 0 deletions docs/service/augmentation.mdx
@@ -0,0 +1,27 @@
---
title: Container augmentation
---

The container augmentation provisioning mode allows "extending" the content of a container image without rebuilding it. Instead of running a build, Wave modifies the container image during the pull phase performed by a Docker client.

Container augmentation works as follows:

1. The client, either Nextflow or the Wave client, submits a container request specifying: i) the (Platform) user identity; ii) the container image to be augmented; iii) the container extension configuration, which can be a custom payload, one or more extension layers, or container images.
1. The Wave service validates the request and authorizes the user by submitting a request to the Platform service.
1. The Wave service responds with an ephemeral container image name, e.g. `wave.seqera.io/wt/<ID TOKEN>/library/alpine:latest`. The ID token is uniquely assigned and is used to identify and authorize the subsequent container request.
1. The Docker client uses the returned image name to pull the container. The binary content of the upstream image is pulled directly from the target registry, while the content added by Wave, as one or more layer extensions, is served by the Wave service.
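
On the Nextflow side, step 1 typically needs little more than enabling Wave and supplying a Platform access token, after which Nextflow submits the container request on the user's behalf. A minimal sketch, assuming the token value shown is a placeholder and that the Platform identity is supplied through the `tower` scope in your setup:

```groovy
// Minimal sketch: enable Wave and identify the Platform user.
wave {
    enabled = true
}

tower {
    // Placeholder value; replace with your own Seqera Platform access token
    accessToken = '<your access token>'
}
```

With this configuration in place, the remaining steps (validation, the ephemeral image name, and the pull) happen between Wave, the Platform service, and the Docker client without further user input.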

*Key points*

- Wave acts as a proxy between the Docker client and the target registry that hosts the container image.
- During this process, Wave modifies the container manifest, if needed, to add the new content specified by the request. It does not (and cannot) alter the container layer blobs, whose unique checksums are preserved.
- The image blobs are downloaded directly from the target registry (not from Wave) **[there's an exception to be discussed]**
- The extended content added by Wave is served via Cloudflare CDN.
- This process does not carry out any "build" operation behind the scenes.
- Augmented containers are ephemeral: they are not stored in a container repository, and they can only be accessed for a short period of time.

*Use cases*

- Authenticate access to private repositories via Platform credentials
- Extend existing containers by adding infrastructure and pipeline dependencies on the fly, without rebuilding and maintaining additional container images
11 changes: 11 additions & 0 deletions docs/service/community-registry.mdx
@@ -0,0 +1,11 @@
---
title: Community registry
---

The community registry adds a regular container registry to the Wave containerization lifecycle. It hosts image builds permanently and is publicly accessible to anyone.

The community registry is built using [Docker Distribution][docker] and hosted on AWS infrastructure. Images are cached and served via Cloudflare CDN.

*WIP Diagram*

[docker]: https://github.com/distribution/distribution