
Add nix-snapshotter support to the embedded containerd #9319

Draft · hinshun wants to merge 1 commit into master from feature/nix-snapshotter

Conversation

@hinshun (Contributor) commented Jan 31, 2024

Proposed Changes

  • Add nix-snapshotter plugin to the embedded containerd to enable rootless k3s + nix-snapshotter

Types of Changes

  • Adds nix-snapshotter as a Go dependency
  • Adds nix as a valid snapshotter option for the agent (see the hedged sketch after this list)
  • When using the nix snapshotter, also uses it as an image service
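
To make the second item concrete, here is a hedged sketch of what the agent-side flag validation could look like. The package, function name, and the set of existing snapshotter values are assumptions for illustration, not the actual k3s code.

// Hedged sketch only: k3s validates the agent's --snapshotter flag against a
// known set of values; this change would extend that set with "nix".
package agent

import "fmt"

var validSnapshotters = map[string]bool{
    "overlayfs":      true,
    "fuse-overlayfs": true,
    "stargz":         true,
    "native":         true,
    "nix":            true, // added by this PR
}

func ValidateSnapshotter(name string) error {
    if !validSnapshotters[name] {
        return fmt.Errorf("invalid snapshotter option: %q", name)
    }
    return nil
}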

Verification

  • Add a target to the Dockerfile that copies the k3s binary into the nixos/nix Docker image, enabling quick verification of k3s + nix-snapshotter

Testing

  • Integration test to be added

User-Facing Change

Add nix-snapshotter plugin to the embedded containerd to enable rootless k3s + nix-snapshotter

Further Comments

This is still a work in progress, but I wanted to start the conversation about whether k3s has the appetite to include Nix support. As noted in #9309, rootless mode can only use the embedded containerd, so rootless k3s + nix-snapshotter is only possible via a contribution.

See this for an exciting feature of nix-snapshotter: acting as an image service enables fully declarative pods, down to the contents of the container image. An image reference like nix:0/nix/store/f8b1hia3hcqwa5d46anzy3cszi3s6ybk-nix-image-redis.tar is resolved by nix-snapshotter; the store hash is a Merkle hash of the Nix packages that make up the image tarball.

@brandond (Member) commented

This is interesting; can you describe in more detail how it might be useful, and when it can be used?

See the eStargz documentation for an example of what sort of info we might be looking for before adding another snapshotter.

@hinshun (Contributor, Author) commented Feb 3, 2024

Nix is a package manager / build system that has a complete understanding of the build-time and runtime inputs of every package. Nix packages are stored under globally hashed paths like /nix/store/s66mzxpvicwk07gjbjfw9izjfa797vsw-hello-2.12.1. Packages usually follow an FHS-like convention, so a Nix package is typically a directory containing subdirectories like bin, share, etc. For example, the hello binary is available at /nix/store/s66mzxpvicwk07gjbjfw9izjfa797vsw-hello-2.12.1/bin/hello.

Runtime dependencies down to glibc are also inside /nix/store, so Nix really does have the complete dependency graph. In the case of hello, the complete closure is the following:

/nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8
/nix/store/fz2c8qahxza5ygy4yvwdqzbck1bs3qag-libidn2-2.3.4
/nix/store/q7hi3rvpfgc232qkdq2dacmvkmsrnldg-libunistring-1.1
/nix/store/ryvnrp5n6kqv3fl20qy2xgcgdsza7i0m-xgcc-12.3.0-libgcc
/nix/store/s66mzxpvicwk07gjbjfw9izjfa797vsw-hello-2.12.1

If you inspect its ELF data, you can indeed see that it's linked against that specific glibc:

$ readelf -d /nix/store/s66mzxpvicwk07gjbjfw9izjfa797vsw-hello-2.12.1/bin/hello | grep runpath
 0x000000000000001d (RUNPATH)            Library runpath: [/nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8/lib]

This means that a root filesystem containing that closure is sufficient to run hello even though it's dynamically linked. This is similar to minimal images containing a statically compiled Go binary, or to distroless images, which leverage Bazel to the same effect.

Kubernetes is a great orchestration engine for minimal Nix images, but the current mechanisms for building Nix-based images need to compress Nix packages into Docker layer tarballs. There are heuristics to split Nix packages across layers to improve deduplication, but because overlayfs has a 128-layer limit, there are bound to be layers containing multiple Nix packages.
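
To make the layer cap concrete, here is a deliberately naive grouping sketch of my own; real builders use smarter deduplication heuristics, but the arithmetic is the same: past 128 store paths, some layers must hold more than one package.

package main

import "fmt"

// groupIntoLayers assigns one store path per layer until the overlayfs cap
// is hit, then round-robins the remaining paths into existing layers.
func groupIntoLayers(storePaths []string, maxLayers int) [][]string {
    layers := make([][]string, 0, maxLayers)
    for i, p := range storePaths {
        if i < maxLayers {
            layers = append(layers, []string{p})
            continue
        }
        layers[i%maxLayers] = append(layers[i%maxLayers], p) // packages now share a layer
    }
    return layers
}

func main() {
    paths := make([]string, 200) // a closure larger than the 128-layer cap
    for i := range paths {
        paths[i] = fmt.Sprintf("/nix/store/pkg-%d", i)
    }
    layers := groupIntoLayers(paths, 128)
    fmt.Println(len(layers), "layers;", len(layers[0]), "packages in the first layer")
}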

Nix-snapshotter is a containerd snapshotter that natively understands Nix packages. It also provides a client-side library to build special OCI images with tiny layers that carry the Nix package closure in the annotations field. The Nix daemon understands S3 and HTTP binary caches for fetching packages by hash, so nix-snapshotter leverages those protocols to download packages and returns a list of bind mounts that make up the container root filesystem. Since nix-snapshotter is built on top of the overlayfs snapshotter, it maintains compatibility with all existing images and also allows hybrid images, where some layers are regular tarballs and others carry Nix package annotations.
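
As a rough illustration of the mount behavior described above (the type and function here are stand-ins, not the actual nix-snapshotter API), each substituted store path can be surfaced to the container as a read-only bind mount at its original /nix/store location:

package main

import "fmt"

// Mount mirrors the shape of containerd's mount type, redefined here to keep
// the sketch self-contained.
type Mount struct {
    Type    string
    Source  string
    Options []string
}

// nixBindMounts turns substituted store paths into read-only bind mounts
// that are layered on top of the overlayfs root filesystem.
func nixBindMounts(storePaths []string) []Mount {
    mounts := make([]Mount, 0, len(storePaths))
    for _, p := range storePaths {
        mounts = append(mounts, Mount{
            Type:    "bind",
            Source:  p, // mounted at the same /nix/store path inside the container
            Options: []string{"ro", "rbind"},
        })
    }
    return mounts
}

func main() {
    for _, m := range nixBindMounts([]string{
        "/nix/store/s66mzxpvicwk07gjbjfw9izjfa797vsw-hello-2.12.1",
        "/nix/store/3n58xw4373jp0ljirf06d8077j15pc4j-glibc-2.37-8",
    }) {
        fmt.Printf("%s %s %v\n", m.Type, m.Source, m.Options)
    }
}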

For teams that deploy to both bare metal and Kubernetes, this is a huge boon: they can maintain a single stack, and the overhead of containerization is just a few kilobytes of metadata (in the OCI manifest). Most teams using Nix already upload all their artifacts to a binary cache or S3, yet also have to re-upload essentially the same data to a Docker registry backend. Lastly, since nix-snapshotter pulls at package granularity, it is strictly more efficient than Docker's layer granularity: instant image builds, faster push/pull, and lower storage costs.

When using nix-snapshotter as an image service, deploying to Kubernetes no longer requires a Docker registry. Since the image reference is also a Nix package, it can be fetched from a Nix binary cache or S3. This means a single GC policy for bare-metal and kubelet nodes, and a single GC policy for the binary cache / S3.
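
As a hypothetical sketch of how such a registry-free reference could be resolved (the prefix handling is an assumption inferred from the example reference earlier in this thread, not confirmed behavior):

package main

import (
    "fmt"
    "strings"
)

// parseNixRef splits a nix:0<store-path> image reference into the Nix store
// path it carries; resolution then needs no registry round-trip, since the
// path can be substituted from a binary cache.
func parseNixRef(ref string) (string, bool) {
    const prefix = "nix:0"
    if !strings.HasPrefix(ref, prefix) {
        return "", false
    }
    return strings.TrimPrefix(ref, prefix), true
}

func main() {
    ref := "nix:0/nix/store/f8b1hia3hcqwa5d46anzy3cszi3s6ybk-nix-image-redis.tar"
    if path, ok := parseNixRef(ref); ok {
        fmt.Println("fetch from binary cache:", path)
    }
}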

See the README and architecture docs for more details. There is also a HackerNews discussion.

@hinshun (Contributor, Author) commented Feb 14, 2024

@brandond We have it working end-to-end now in a QEMU VM with k3s built with our patches. It also required forking github.com/k3s-io/containerd, because k3s's containerd fork doesn't have the patches we contributed upstream to containerd.

For example, k3s master is currently pinned to v1.7.11-k3s2:
https://github.com/k3s-io/k3s/blob/master/go.mod#L9

Since this can happen asynchronously, I wanted to bring up early that we need these two cherry-picks in the k3s-io/containerd branches maintained for k3s:

@brandond (Member) commented

Are those patches in containerd v1.7.13? If so we can look at updating to that for the March releases.

@hinshun force-pushed the feature/nix-snapshotter branch 4 times, most recently from e708b47 to 337292a on February 14, 2024 at 23:26
@hinshun (Contributor, Author) commented Feb 14, 2024

> Are those patches in containerd v1.7.13? If so we can look at updating to that for the March releases.

It's been merged for a while now, but it looks like it's only been cherry-picked to v2.0.0-beta.0 and onwards.

@brandond (Member) commented

Ah. I'm not sure when exactly we'll go to 2.0.

@hinshun (Contributor, Author) commented Feb 15, 2024

> Ah. I'm not sure when exactly we'll go to 2.0.

I've started an issue in containerd to see if we can get it cherry-picked to the v1.7.x series: containerd/containerd#9826

In the meantime, we can maintain our patches for NixOS.

@hinshun (Contributor, Author) commented Feb 23, 2024

@brandond I've managed to cherry-pick those changes into the next containerd 1.7 release! Once that is live & k3s picks up that release, would you be open to accepting this PR? I can look into adding integration tests similar to stargz-snapshotter.

See: containerd/containerd#9826 (comment)

@hinshun (Contributor, Author) commented Mar 15, 2024

@brandond The patches are in containerd v1.7.14. Could you let me know when v1.7.14-k3s2 is ready?

@dereknola (Member) commented

@hinshun Are you still working on this at all?

@hinshun (Contributor, Author) commented Oct 22, 2024

Hi @dereknola! It's been a while, but it looks like k3s finally picked up my patches to containerd, so this work is now unblocked. I will be working on this.
