-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pod mapping via cgroup ids #2776
Conversation
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
29bb91a
to
937ed26
Compare
ef46923
to
0152233
Compare
0152233
to
8f031aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a nice improvement. I tried my best to review it.
On the nit side, I spotted many places where you use ID
or Id
, maybe that would be could to unify and use Golang convention ID
. I saw some typos in the comments but not sure you want to spend the time fixing those in all those commits and didn't want to be annoying pointing them since we totally get the point anyway.
8f031aa
to
3268367
Compare
Thanks for the review @mtardy. Pushed a new version. |
Generic comment it would be best to keep patch max patch count around 10 or so this is getting rather large. Just for future. |
Signed-off-by: Kornilios Kourtis <[email protected]>
Return an error if a callback in RuntimeHook fails. This might lead to a container failing, but this is the desired behavior for many use-cases. For other use-cases, the hook (client) can be contributed accordingly so that it does not fail allowing the runtime to create the container. Signed-off-by: Kornilios Kourtis <[email protected]>
containerNameFromAnnotations uses specific annotations for containerd and cri-o to get the container name. These annotations are used for the NRI internal implementation for containerd and cri-o so add some code references as comments. Signed-off-by: Kornilios Kourtis <[email protected]>
Add containerID, podName, podID, and podNamespace to CreateContainer message. Signed-off-by: Kornilios Kourtis <[email protected]>
Signed-off-by: Kornilios Kourtis <[email protected]>
Add a function for the container id from the argument of the CreateContainer grpc call. Improvements will follow. This function is going to be used by a new hook that will be added in subsequent patches. Signed-off-by: Kornilios Kourtis <[email protected]>
crio paths add a prefix to the container path. Remove it when we return the container id. Add a test. Signed-off-by: Kornilios Kourtis <[email protected]>
Previous patches added a ContainerID field to the CreateContainerArg gRPC request. Use it in ContainerID() if it exists. Signed-off-by: Kornilios Kourtis <[email protected]>
Add a function for getting the pod from the argument of the CreateContainer grpc call. This function is going to be used by a new hook that will be added in subsequent patches. Signed-off-by: Kornilios Kourtis <[email protected]>
Cgroup id and Pod require some non-trivial steps to compute. Subsequent patches will have an additional hook that uses these methods. Cache the results so that we only use them once. Signed-off-by: Kornilios Kourtis <[email protected]>
Move some pod helpers code from policyfilter into a different package so that they can be used in subsequent patches. Signed-off-by: Kornilios Kourtis <[email protected]>
3268367
to
0a450d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. It's a little scary that we haven't tested this on cgroupv1 but I think it makes sense to mark as beta with a note in the documentation until we get the proper testing infra in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick review as I didn't have time to get all of it.
Kornilios what you mean by "talking to the CRI to update missing mappings (this is useful for pods started before tetragon)" ? and curious why we can't just do that from /proc? |
0a450d6
to
eb2ae48
Compare
I mean that we are talking to the CRI (cri-o or containerd) socket to get information for existing containers. This information includes the cgroup path of the container, which we use to find the cgroup id. I think it should be possible to scan the filesystem and find the cgroup paths without talking to the CRI (indeed the policyfilter code does this), but the CRI is the source of truth for this and using it seems like the best approach to me. |
Add a function to get the cgroup id from a PID. This is meant to be used in /proc scanning, to retrieve the cgroup id of a process. The function was tested for cgroup v2, but not for v1. Signed-off-by: Kornilios Kourtis <[email protected]>
Create a new function findPod for the basic functionality. No functional changes here, just moving code around to make the next patch clearer. Signed-off-by: Kornilios Kourtis <[email protected]>
Static pods are started and managed directly by kubelet and are mirrored to the API server. Hence, the pod UID in the node is different than the pod UID in the API server. As a result, if one of the static pods is restarted while the hook is running, we cannot find the corresponding pod in the API server by the id. Instead, we need to look it up using the "kubernetes.io/config.hash" and "kubernetes.io/config.mirror" annoations. Add a FindMirrorPod function in the watcher and use it to find the mirror pod UID for static pods in rthooks. Signed-off-by: Kornilios Kourtis <[email protected]>
policyfilter included code for enabling debugging only for policyfilter code. Move this code in pkg/logger so that it can be used in other packages as well. Signed-off-by: Kornilios Kourtis <[email protected]>
Pod association is attaching pod information to events. Previously, pod association happened using the cgroup name under the assumption that each container is executed directly on a cgroup with a name that contains the container id of the k8s pod. Hence, by having events (bpf and proc) include the cgroup name, it was possible to retrieve the container id and from that the pod. This approach, however, has some issues. One example is that crun, an OCI runner used by cri-o, may (and by default will) execute containers in a cgroup where the name does include the container id. For example: kubepods-besteffort.slice/kubepods-besteffort-pod42b7841b_bb27_49e2_81bc_3d3a372dd231.slice/crio-6727930f5d4e9b799d58732af6882826591eb476f680b6499d8919d1ec515eee.scope/container/ In this case, the pod association scheme based on the cgroup name does not work. To address the above and other issues, this patch introduces cgidmap, which allows doing pod association by cgroup id (rather than cgroup name). The general idea is that: - events (bpf and proc) will include the cgroup id, which can be used to query the map for the container id. Once we have the container id, pod association can proceed as previously. - mappings of cgroup ids to container ids will be added by the runtime hooks, before a container starts. - changes in pods (pod removal, container removal) will be handled via the K8s API server pod watcher. One issue left for subsequent patches is dealing with cases where the runtime hook is not running, which is required for pod association for pods started before tetragon. Signed-off-by: Kornilios Kourtis <[email protected]>
Add CRI client code. This will be used in subsequent patches. Signed-off-by: Kornilios Kourtis <[email protected]>
Talking to the CRI from tetra is useful for development and troubleshooting. Add a simple version command for now. More will follow. Signed-off-by: Kornilios Kourtis <[email protected]>
We use https://github.com/tidwall/gjson to get the field, and, also, add some code to parse the cgroup path if it's in a weird systemd format. Signed-off-by: Kornilios Kourtis <[email protected]>
Previous commit added code to fetch the cgroup path from the CRI. This commit adds a correpsonding command to the tetra cli. Example: ./tetra cri cgroup_path 17edc91323fec /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfe54825188b04dea25220ce286cc3e2f.slice/crio-17edc91323fec4e5eda212c4c5c6f7535dbf7ea8e88f2909c8938f51f4eb48a8.scope Signed-off-by: Kornilios Kourtis <[email protected]>
This commit adds code to query the CRI to resolve cgroup ids when the value is missing. We use a simple bounded LIFO and a goroutine that performs the connection to the CRI, decodes the result, and updates the cgidmap. Signed-off-by: Kornilios Kourtis <[email protected]>
This commit adds functionality (disabled by default) to resolve Pods with cgroup ids, instead of cgroup names. The motivation is to have more robust pod association and deal with idiosyncrasies of certain runtimes (specifically, crun; See next patch). To enable this feature users need to set: --enable-cgidmap --enable-cri It is also highly recommended to use runtime hooks so that the cgroup id mappings are set before the container starts. Signed-off-by: Kornilios Kourtis <[email protected]>
crun, an OCI container runtime used by cri-o breaks pod association for tetragon by using placing processes in a cgroup below the cgroup specified by the OCI spec: https://github.com/containers/crun/blob/main/crun.1.md#runocisystemdsubgroupsubgroup. With the introduction of cgidmap, this commit can finally deal with this issue by scanning the cgroup directory for children directories and, if it finds one, use the cgroup id of the child. A better solution would be to allow for multiple cgroup ids for each container, but this is left as a followup. The commit includes a script for testing this issue using minikube. Becaues minikube uses an older version of crun, we need to install it. The steps for reproducing this are: minikube start --driver=kvm2 --container-runtime=crio --force-systemd=true ./scripts/minikube-install-crun.sh Running tetragon without cgidmap, we observe events without pod association: 🚀 process minikube /usr/bin/ls 💥 exit minikube /usr/bin/ls 0 By installing the runtime hooks: ./scripts/minikube-install-hook.sh And runing tetragon with cgidmap (and nri) using --enable-cri --enable-cgidmap, we observe pod association for both old and new pods: 🚀 process default/test /usr/bin/ls 💥 exit default/test /usr/bin/ls 0 Signed-off-by: Kornilios Kourtis <[email protected]>
Both findMirrorPod and findPod functions do the same retry loop. Use a helper retry function instead. Signed-off-by: Kornilios Kourtis <[email protected]>
eb2ae48
to
ebd29de
Compare
Merging this. Left a couple of followups in the PR description based on the comments. Thanks everyone! |
This PR implements pod association with cgroup ids instead of cgroup names.
It achieves this by:
cgidmap
)The immediate concern that this PR tries to address is fixing pod association on recent1 versions of crun (an OCI container used by cri-o) not working. See #2639.
In addition to that, it offers a more reliable way to associate pods since it is based on authoritative information provided by runtime hooks and the CRI.
Followups:
Footnotes
https://github.com/containers/crun/blob/main/NEWS#L565-L567 ↩