diff --git a/README.md b/README.md index d176d03b..a40265f0 100644 --- a/README.md +++ b/README.md @@ -27,7 +27,7 @@ Note: The images are not pre-configured and you must follow the steps in setup f ## Setup -To use `lxcri` as OCI runtime in `cri-o` see [install.md](doc/install.md) +To use `lxcri` as OCI runtime in `cri-o` see [setup.md](doc/setup.md) ## API Usage diff --git a/cmd/lxcri-hook/README.md b/cmd/lxcri-hook/README.md deleted file mode 100644 index d1df1c79..00000000 --- a/cmd/lxcri-hook/README.md +++ /dev/null @@ -1,128 +0,0 @@ -# Hooks - -* see https://github.com/opencontainers/runtime-spec/blob/master/config.md - -## Notes - -The OCI hooks wrapper will work in plain lxc containers because the -OCI state (state.json, hooks.json, config.json) is not available. - -It's perfectly reasonable to run hooks directly from lxcri cli - -OCI state must be bind mounted into the container. - -## CreateRuntime - -NOTE underspecified -conditions: mount namespace have been created, mount operations performed (all ?) - -* when: before pivot_root, after namespace creation -* path: runtime namespace -* exec: runtime namespace - -* maps to: lxc.hook.pre-start ? (mounts are not created) -* lxc.hook.pre-mount ? (container's fs namespace == mount namespace ?) - -## CreateContainer - -* when: before pivot_root, after mount namespace setup -* path: runtime namespace -* exec: container namespace - -* maps to: lxc.hook.mount - -## StartContainer - -* when: before lxcri-init execs, after mounts are complete -* path: container namespace -* exec: container namespace - -* maps to: lxc.hook.start - -Run from `lxcri-init` the same way the user process is executed? - -Bind mount hook launcher into container. -Create folder with environ/cmdline files for each hook. 
- -## PostStart - -* when: after syncfifo is unblocked -* path: runtime namespace -* exec: runtime namespace - -* maps to: no LXC hook - -Usually this is done manually after calling `lxc-start` -Run directory after unblocking the syncfifo in Runtime.Start -Set LXC_ environment variables ? - -## PostStop - -* when: after container delete / before delete returns -* path: runtime namespace -* exec: runtime namespace - -* maps to: lxc.hook.destroy - -Run directly in Runtime.Delete - - -### Solution 1 - -Add a cli command `hooks` with the container name and the hook as argument. - -* Bad: hooks should not be accessible through the CLI because they - should only be executed within defined runtime states. - (simply hide the command from the help output ?) - -* Bad: lxcri with all libraries must be available in the container for - CreateContainer and StartContainer hooks. - -### Idea 2 - -* Update the container state in runtime commands and serialize it to the runtime directory. - -Extend / Update the state from the LXC hook environment variables. -Create a single C binary that executes the hooks from the lxc hook. - -Serialize hooks into a format that can be consumed by hooks -and started from 'liblxc' using a simple static C binary, -similar to `lxcri-init`. - -Use the same mechanism `lxcri-init` uses to exec the hook -processes. - -* Bind mount the hook directories, for hooks running in the -container namespace into the container. -e.g /.lxcri/hooks - -lxc.hook.mount = lxcri-hook create-runtime - - -e.g create - -{runtime_dir}/state.json - -{runtime_dir}/hooks/create_runtime/1/cmdline -{runtime_dir}/hooks/create_runtime/1/environ - -{runtime_dir}/hooks/create_runtime/2/cmdline -{runtime_dir}/hooks/create_runtime/2/environ - -... - -{runtime_dir}/hooks/create-container/1/cmdline -{runtime_dir}/hooks/create-container/2/environ - - - -Pass state.json to executed process. 
- - -c tool can iterate over contents in the hook directory -and load and execute process and cmline -for each subfolder. - -* can be implemented as go binary and as C binary .... - -* timeout: set as additional environment variable e.g OCI_HOOK_TIMEOUT diff --git a/doc/cli.md b/doc/cli.md new file mode 100644 index 00000000..b6a4b3f9 --- /dev/null +++ b/doc/cli.md @@ -0,0 +1,111 @@ +## Glossary + +* `runtime` the lxcri binary and the command set that implement the [OCI runtime spec](https://github.com/opencontainers/runtime-spec/releases/download/v1.0.2/oci-runtime-spec-v1.0.2.html) +* `container process` the process that starts and runs the container using liblxc (lxcri-start) +* `container config` the LXC config file +* `bundle config` the lxcri container state (bundle path, pidfile ....) +* `runtime spec` the OCI runtime spec from the bundle + +## Setup + +The runtime binary implements flags that are required by the `OCI runtime spec`,
+and flags that are runtime specific (timeouts, hooks, logging ...). + +Most of the runtime specific flags have corresponding environment variables. See `lxcri --help`.
+The runtime evaluates the flag value in the following order (a lower number takes precedence). + +1. cmdline flag from process arguments (overwrites process environment) +2. process environment variable (overwrites environment file) +3. environment file (overwrites cmdline flag default) +4. cmdline flag default + +### Environment variables + +Currently you have to compile the environment file yourself.
+To list all available variables: + +``` +grep EnvVars cmd/cli.go | grep -o 'LXCRI_[A-Za-z_]*' | xargs -n1 -I'{}' echo "#{}=" +``` + +### Environment file + +The default path to the environment file is `/etc/default/lxcri`.
+It is loaded on every start of the `lxcri` binary, so changes take immediate effect.
+Empty lines and lines starting with `#` are ignored.
+ +A malformed environment file will cause the next runtime call to fail.
+In production it's recommended that you replace the environment file atomically.
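A minimal sketch of such an atomic replacement, assuming a POSIX shell with `mktemp` available. The `ENV_FILE` variable and its scratch default are hypothetical and only keep the sketch runnable without root. Set it to the real environment file path when adapting it.

```sh
#!/bin/sh
# Sketch only: replace an environment file atomically.
# ENV_FILE is a hypothetical variable; the default is a scratch path.
set -eu

ENV_FILE="${ENV_FILE:-./lxcri.env}"

# Create the temporary file in the target directory, so the final
# rename stays on one filesystem.
tmp="$(mktemp "$(dirname "$ENV_FILE")/.lxcri.env.XXXXXX")"
trap 'rm -f "$tmp"' EXIT

# Write the complete new content to the temporary file first.
cat > "$tmp" <<'EOF'
LXCRI_LOG_LEVEL=debug
LXCRI_CONTAINER_LOG_LEVEL=debug
EOF

chmod 0644 "$tmp"

# Rename over the target: a runtime call reads either the old file
# or the new one, never a partially written file.
mv -f "$tmp" "$ENV_FILE"
```

Editing the file in place can leave it half-written for a moment, and the runtime loads the file on every call. Renaming a finished file over the old one avoids that window.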
+ +E.g. the environment file `/etc/default/lxcri` could look like this: + +```sh +LXCRI_LOG_LEVEL=debug +LXCRI_CONTAINER_LOG_LEVEL=debug +#LXCRI_LOG_FILE= +#LXCRI_LOG_TIMESTAMP= +#LXCRI_MONITOR_CGROUP= +#LXCRI_LIBEXEC= +#LXCRI_APPARMOR= +#LXCRI_CAPABILITIES= +#LXCRI_CGROUP_DEVICES= +#LXCRI_SECCOMP= +#LXCRI_CREATE_TIMEOUT= +#LXCRI_CREATE_HOOK=/usr/local/bin/lxcri-backup.sh +#LXCRI_CREATE_HOOK_TIMEOUT= +#LXCRI_START_TIMEOUT= +#LXCRI_KILL_TIMEOUT= +#LXCRI_DELETE_TIMEOUT= +``` + +### Runtime (security) features + +All supported runtime security features are enabled by default.
+The following runtime (security) features can optionally be disabled.
+For details see `lxcri --help`. + +* apparmor +* capabilities +* cgroup-devices +* seccomp + +### Logging + +There is only a single log file for runtime and container process log output.
+The log-level for the runtime and the container process can be set independently. + +* containers are ephemeral, but the log file should not be +* a single logfile is easy to rotate and monitor +* a single logfile is easy to tail (watch for errors / events ...) +* robust implementation is easy + +#### Log Filtering + +Runtime log lines are written in JSON using [zerolog](https://github.com/rs/zerolog).
+The log file can be easily filtered with [jq](https://stedolan.github.io/jq/).
+For filtering with `jq` you must first strip the container process logs with `grep -v '^lxc '`.
+ +E.g. show only errors and warnings for the runtime `create` command: + +```sh + grep -v '^lxc ' /var/log/lxcri.log |\ + jq -c 'select(.cmd == "create" and (.l == "error" or .l == "warn"))' +``` + +#### Runtime log fields + +Fields that are always present: + +* `l` log level +* `m` log message +* `c` caller (source file and line number) +* `cid` container ID +* `cmd` runtime command +* `t` timestamp in UTC (format matches container process output) + +### Debugging + +Apart from the log file, the following resources are useful: + +* Systemd journal for cri-o and kubelet services +* `coredumpctl` if runtime or container process segfaults. diff --git a/doc/install.md b/doc/install.md deleted file mode 100644 index c978bd34..00000000 --- a/doc/install.md +++ /dev/null @@ -1,149 +0,0 @@ -## cgroups - -Enable cgroupv2 unified hierarchy manually: - -``` -mount -t cgroup2 none /sys/fs/cgroup -``` - -or permanent via kernel cmdline params: - - ``` - systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all - ``` - -## build dependencies - -Install the build dependencies which are required to build the runtime and runtime dependencies. 
- -### debian - -```sh -# liblxc / conmon build dependencies -apt-get install build-essential libtool automake pkg-config \ -libseccomp-dev libapparmor-dev libbtrfs-dev \ -libdevmapper-dev libcap-dev libc6-dev libglib2.0-dev -# k8s dependencies, tools -apt-get install jq ebtables iptables conntrack -``` - -### arch linux - -```sh -# liblxc / conmon build dependencies -pacman -Sy base-devel apparmor libseccomp libpcap btrfs-progs -# k8s dependencies -pacman -Sy conntrack-tools ebtables jq -``` - -## runtime dependencies - -* [lxc](https://github.com/lxc/lxc.git) >= b5daeddc5afce1cad4915aef3e71fdfe0f428709 -* [conmon/pinns](https://github.com/containers/conmon.git) v2.0.22 -* [cri-o](https://github.com/cri-o/cri-o.git) release-1.20 - -By default everything is installed to `/usr/local` - -### lxc (liblxc) - -```sh -git clone https://github.com/lxc/lxc.git -cd lxc -./autogen.sh -./configure --enable-bash=no --enable-seccomp=yes \ - --enable-capabilities=yes --enable-apparmor=yes -make install - -git describe --tags > /usr/local/lib/liblxc.version.txt -echo /usr/local/lib > /etc/ld.so.conf.d/local.conf -ldconfig -``` - -### lxcri - -``` -make install -``` - -The installation prefix environment variable is set to `PREFIX=/usr/local` by default.
-The library source path for `pkg-config` is set to `$PREFIX/lib/pkg-config` by default.
-You can change that by setting the `PKG_CONFIG_PATH` environment variable.
- -E.g to install binaries in `/opt/bin` but use liblxc from `/usr/lib`: - - PREFIX=/opt PKG_CONFIG_PATH=/usr/lib/pkgconfig make install - -Keep in mind that you have to change the `INSTALL_PREFIX` in the crio install script below. - -### conmon - -```sh -git clone https://github.com/containers/conmon.git -cd conmon -git reset --hard v2.0.22 -make clean -make install -``` - -### cri-o - -```sh -#!/bin/sh -git clone https://github.com/cri-o/cri-o.git -cd cri-o -git reset --hard origin/release-1.20 -make install - -PREFIX=/usr/local -CRIO_LXC_ROOT=/run/lxcri - -# environment for `crio config` -export CONTAINER_CONMON=${PREFIX}/bin/conmon -export CONTAINER_PINNS_PATH=${PREFIX}/bin/pinns -export CONTAINER_DEFAULT_RUNTIME=lxcri -export CONTAINER_RUNTIMES=lxcri:${PREFIX}/bin/lxcri:$CRIO_LXC_ROOT - -crio config > /etc/crio/crio.conf -``` - -#### cgroupv2 ebpf - -Modify systemd service file to run with full privileges.
-This is required for the runtime to set cgroupv2 device controller eBPF.
-See https://github.com/cri-o/cri-o/pull/4272 - -``` -sed -i 's/ExecStart=\//ExecStart=+\//' /usr/local/lib/systemd/system/crio.service -systemctl daemon-reload -systemctl start crio -``` - -#### storage configuration - -If you're using `overlay` as storage driver cri-o may complain that it is not using `native diff` mode.
-Update `/etc/containers/storage.conf` to fix this. - -``` -# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md -[storage] -driver = "overlay" - -[storage.options.overlay] -# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay` -# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off -# NOTE: metacopy can only be enabled when redirect_dir is enabled -# NOTE: storage driver name must be set or mountopt are not evaluated, -# even when the driver is the default driver --> BUG ? -mountopt = "nodev,redirect_dir=off,metacopy=off" -``` - -#### HTTP proxy - -If you need a HTTP proxy for internet access you may have to set the proxy environment variables in `/etc/default/crio` -for crio-o to be able to fetch images from remote repositories. - -``` -http_proxy="http://myproxy:3128" -https_proxy="http://myproxy:3128" -no_proxy="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,127.0.0.0/8,127.0.0.1,localhost" -``` diff --git a/doc/setup.md b/doc/setup.md index b6a4b3f9..17bb6e9d 100644 --- a/doc/setup.md +++ b/doc/setup.md @@ -1,111 +1,72 @@ -## Glossary +# Setup -* `runtime` the lxcri binary and the command set that implement the [OCI runtime spec](https://github.com/opencontainers/runtime-spec/releases/download/v1.0.2/oci-runtime-spec-v1.0.2.html) -* `container process` the process that starts and runs the container using liblxc (lxcri-start) -* `container config` the LXC config file -* `bundle config` the lxcri container state (bundle path, pidfile ....) -* `runtime spec` the OCI runtime spec from the bundle +NOTE: This documentation is not yet complete and will be updated. -## Setup +## cgroups -The runtime binary implements flags that are required by the `OCI runtime spec`,
-and flags that are runtime specific (timeouts, hooks, logging ...). +Enable cgroupv2 unified hierarchy manually: -Most of the runtime specific flags have corresponding environment variables. See `lxcri --help`.
-The runtime evaluates the flag value in the following order (lower order takes precedence). +`mount -t cgroup2 none /sys/fs/cgroup` -1. cmdline flag from process arguments (overwrites process environment) -2. process environment variable (overwrites environment file) -3. environment file (overwrites cmdline flag default) -4. cmdline flag default +or permanently via kernel cmdline params: -### Environment variables +`systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all` -Currently you have to compile to environment file yourself.
-To list all available variables: +## cri-o ``` -grep EnvVars cmd/cli.go | grep -o LXCRI_[A-Za-z_]* | xargs -n1 -I'{}' echo "#{}=" -``` - -### Environment file - -The default path to the environment file is `/etc/defaults/lxcri`.
-It is loaded on every start of the `lxcri` binary, so changes take immediate effect.
-Empty lines and those commented with a leading *#* are ignored.
- -A malformed environment will let the next runtime call fail.
-In production it's recommended that you replace the environment file atomically.
- -E.g the environment file `/etc/default/lxcri` could look like this: - -```sh -LXCRI_LOG_LEVEL=debug -LXCRI_CONTAINER_LOG_LEVEL=debug -#LXCRI_LOG_FILE= -#LXCRI_LOG_TIMESTAMP= -#LXCRI_MONITOR_CGROUP= -#LXCRI_LIBEXEC= -#LXCRI_APPARMOR= -#LXCRI_CAPABILITIES= -#LXCRI_CGROUP_DEVICES= -#LXCRI_SECCOMP= -#LXCRI_CREATE_TIMEOUT= -#LXCRI_CREATE_HOOK=/usr/local/bin/lxcri-backup.sh -#LXCRI_CREATE_HOOK_TIMEOUT= -#LXCRI_START_TIMEOUT= -#LXCRI_KILL_TIMEOUT= -#LXCRI_DELETE_TIMEOUT= -``` - -### Runtime (security) features +PREFIX=/usr/local +LXCRI_ROOT=/run/lxcri -All supported runtime security features are enabled by default.
-The following runtime (security) features can optionally be disabled.
-Details see `lxcri --help` +# environment for `crio config` +export CONTAINER_CONMON=${PREFIX}/bin/conmon +export CONTAINER_PINNS_PATH=${PREFIX}/bin/pinns +export CONTAINER_DEFAULT_RUNTIME=lxcri +export CONTAINER_RUNTIMES=lxcri:${PREFIX}/bin/lxcri:$LXCRI_ROOT -* apparmor -* capabilities -* cgroup-devices -* seccomp - -### Logging +crio config > /etc/crio/crio.conf +``` -There is only a single log file for runtime and container process log output.
-The log-level for the runtime and the container process can be set independently. +### cgroupv2 ebpf -* containers are ephemeral, but the log file should not be -* a single logfile is easy to rotate and monitor -* a single logfile is easy to tail (watch for errors / events ...) -* robust implementation is easy +Modify systemd service file to run with full privileges.
+This is required for the runtime to set cgroupv2 device controller eBPF.
+See https://github.com/cri-o/cri-o/pull/4272 -#### Log Filtering +``` +sed -i 's/ExecStart=\//ExecStart=+\//' /usr/local/lib/systemd/system/crio.service +systemctl daemon-reload +systemctl start crio +``` -Runtime log lines are written in JSON using [zerolog](https://github.com/rs/zerolog).
-The log file can be easily filtered with [jq](https://stedolan.github.io/jq/).
-For filtering with `jq` you must strip the container process logs with `grep -v '^lxc'`
+### HTTP proxy -E.g Filter show only errors and warnings for runtime `create` command: +If you need an HTTP proxy for internet access you may have to set the proxy environment variables in `/etc/default/crio` +for cri-o to be able to fetch images from remote repositories. -```sh - grep -v '^lxc ' /var/log/lxcri.log |\ - jq -c 'select(.cmd == "create" and ( .l == "error or .l == "warn")' +``` +http_proxy="http://myproxy:3128" +https_proxy="http://myproxy:3128" +no_proxy="10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,127.0.0.0/8,127.0.0.1,localhost" ``` -#### Runtime log fields - -Fields that are always present: - -* `l` log level -* `m` log message -* `c` caller (source file and line number) -* `cid` container ID -* `cmd` runtime command -* `t` timestamp in UTC (format matches container process output) +## /etc/containers -### Debugging +### storage -Apart from the logfile following resources are useful: +If you're using `overlay` as storage driver cri-o may complain that it is not using `native diff` mode.
+Update `/etc/containers/storage.conf` to fix this. -* Systemd journal for cri-o and kubelet services -* `coredumpctl` if runtime or container process segfaults. +``` +# see https://github.com/containers/storage/blob/v1.20.2/docs/containers-storage.conf.5.md +[storage] +driver = "overlay" + +[storage.options.overlay] +# see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt, `modinfo overlay` +# [ 8270.526807] overlayfs: conflicting options: metacopy=on,redirect_dir=off +# NOTE: metacopy can only be enabled when redirect_dir is enabled +# NOTE: storage driver name must be set or mountopt are not evaluated, +# even when the driver is the default driver --> BUG ? +mountopt = "nodev,redirect_dir=off,metacopy=off" +```