-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[processor/k8sattributes] Implement container.image.id
for k8sattributes processor
#32314
Conversation
This is to be consistent with what k8s does in practice.
This comment was marked as outdated.
This comment was marked as outdated.
container.image.id
for k8sattributes processorcontainer.image.id
for k8sattributes processor
Should address uniformity of versions within k8sattributesprocessor and any semconv version ugprade to a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment in order to ensure that we are aligned here with the SemConv spec.
Originally, when we dealt with those container.image.*
attributes it was a bit of pain :) so now we hit that again from a different angle though. However I believe (and hope :)) that SemConv is quite accurate on that part so we can rely on this to clarify any remaining questions.
cc-ing @lmolkova and @marcsanmi who were involved in these conversations back then.
ref:
@@ -94,6 +94,10 @@ resource_attributes: | |||
description: Container ID. Usually a UUID, as for example used to identify Docker containers. The UUID might be abbreviated. Requires k8s.container.restart_count. | |||
type: string | |||
enabled: false | |||
container.image.id: | |||
description: Container image digest. Requires container.id or k8s.container.name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we align this description with the SemConv spec https://github.com/open-telemetry/semantic-conventions/blob/768a23c3627e4bd96146d9f713db214bee7aecf3/model/registry/container.yaml#L47?
wantAttrs: map[string]string{ | ||
kube.K8sIPLabelName: "1.1.1.1", | ||
conventions.AttributeK8SContainerName: "app", | ||
containerImageID: "test/app@sha256:deadbeef02", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that based on the SemConv spec we have agreed on at https://github.com/open-telemetry/semantic-conventions/blob/main/model/registry/container.yaml, the container.image.id
should only be the sha256:deadbeef02
part, which is the container ID that the runtime gives.
If you want to use the full repository digest then you would need to use the container.image.repo_digests
attribute.
The CRI protocol definition distinguish these 2 at https://github.com/kubernetes/cri-api/blob/c75ef5b473bbe2d0a4fc92f82235efd665ea8e9f/pkg/apis/runtime/v1/api.proto#L1237-L1238 but as far as I can see on k8s we can see both:
$ kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .status.containerStatuses[*]}{.imageID}{", "}{end}{end}' |\
sort
coredns-5d78c9869d-29w9c: sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc,
coredns-5d78c9869d-xkcll: sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc,
etcd-kind-control-plane: sha256:86b6af7dd652c1b38118be1c338e9354b33469e69a218f7e290a0ca5304ad681,
kindnet-885s7: sha256:b0b1fa0f58c6e932b7f20bf208b2841317a1e8c88cc51b18358310bbd8ec95da,
kube-apiserver-kind-control-plane: docker.io/library/import-2023-06-15@sha256:0202953c0b15043ca535e81d97f7062240ae66ea044b24378370d6e577782762,
kube-controller-manager-kind-control-plane: docker.io/library/import-2023-06-15@sha256:bdbeb95d8a0820cbc385e44f75ed25799ac8961e952ded26aa2a09b3377dfee7,
kube-proxy-gr6w4: docker.io/library/import-2023-06-15@sha256:ce2145a147b3f1fc440ba15eaa91b879ba9cbf929c8dd8f3190868f4373f2183,
kube-scheduler-kind-control-plane: docker.io/library/import-2023-06-15@sha256:9d6f903c0d4bf3b145c7bbc68727251ca1abf98aed7f8d2acb9f6a10ac81e8c2,
It seems that Kubernetes will either use the repoDigests
if exists otherwise it will use the id
to populate the .status.containerStatuses[*]}{.imageID}
.
On the other hand, from inside a k8s node we can interact directly with the CRI api and inspect the container images directly and verify the differences:
$ crictl inspecti sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62
{
"status": {
"id": "sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62",
"repoTags": [
"registry.k8s.io/metrics-server/metrics-server:v0.7.1"
],
"repoDigests": [
"registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a"
],
"size": "19478031",
...
}
So I think it's more accurate if we either populate both container.image.id
and container.image.repo_digests
conditionally, or we would need to introduce k8s.container.image.id
which would map exactly whatever the k8s API returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally when I first coded this, I did strip off any prefix and just leave the sha256:<digest>
portion, and I think this is what most people do want since they can get the image name by using container.image.name
.
But I read the container.image.id
"notes" in the semantic conventions which I interpreted to indicate it's acceptable that the value may vary based on the runtime, and it made the code simpler to just propagate the value from the kube api's PodStatus.ContainerStatuses[].ImageID
.
K8s defines a link to the container registry repository with digest
"imageID": "registry.azurecr.io /namespace/service/dockerfile@sha256:bdeabd40c3a8a492eaf9e8e44d0ebbb84bac7ee25ac0cf8a7159d25f62555625"
.
The ID is assigned by the container runtime and can vary in different environments. Consider usingoci.manifest.digest
if it is important to identify the same image in different environments/runtimes.
I wasn't involved in any of the semantic convention discussions when these attributes were included though, so I'm happy to try and implement whichever way the semantic convention team thinks is best in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K8s defines a link to the container registry repository with digest "imageID": "registry.azurecr.io /namespace/service/dockerfile@sha256:bdeabd40c3a8a492eaf9e8e44d0ebbb84bac7ee25ac0cf8a7159d25f62555625".
I think this is not 100% accurate since it's not always the case. It seems that when the repoDigests
of the container image is empty then k8s will use the id
. So we might need to "fix" the wording there in SemConv.
If we trim the full repoDigest
to only keep the sha256:<digest>
part then the remaining is nothing meaningful and definitely not the image's id that the runtime provides.
For example I see the following:
$ kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{":\t"}{range .status.containerStatuses[*]}{.imageID}{", "}{end}{end}' |\
sort
metrics-server-6b4d6cb49f-pfzzs: registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a
This means that the imageID
is the repoDigest
.
Indeed:
$ crictl inspecti -q registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a
{
"status": {
"id": "sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62",
"repoTags": [
"registry.k8s.io/metrics-server/metrics-server:v0.7.1"
],
"repoDigests": [
"registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a"
],
"size": "19478031",
"uid": {
"value": "65534"
},
"username": "",
"spec": null,
"pinned": false
}
}
So if I trim the registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a
:
$ crictl inspecti -q sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a
FATA[0000] no such image "sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a" present
But if I use the actual image id:
$ crictl inspecti -q sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62
{
"status": {
"id": "sha256:a24c7c057ec8730aaa152f77366454835a46dc699fcf243698a622788fd48d62",
"repoTags": [
"registry.k8s.io/metrics-server/metrics-server:v0.7.1"
],
"repoDigests": [
"registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a"
],
"size": "19478031",
"uid": {
"value": "65534"
},
"username": "",
"spec": null,
"pinned": false
}
}
This tells me that Kubernetes either uses the image.id
or the image.repoDigests[0]
of the runtime. I haven't checked its implementation though to verify this, maybe it worth checking that.
I believe we should be consistent with the runtime and populate the container.image.repo_digests
if the imageID
is a repo digest or use the container.image.id
if the imageID
is not repo digest.
Otherwise we need to use a k8s specific attribute called k8s.container.image.id
for the imageID
because container.image.*
is tied to the container runtime and we should preserve this 🤔 ?
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
I want to reopen this as this is still something we'd like to be able to obtain from the k8sattributes processor in the OpenTelemetry Collector. What we want to do is to be able to uniquely identify the images (that in our case, will always be sourced from an OCI or docker registry) that make up a telemetry emitting pod - i.e. we can't rely just on the image tags, since tags are mutable. Based on what @ChrsMark said:
I have looked into the k8s code, CRI spec and a couple of container runtimes and believe the following description is accurate for k8s 1.29, although it would be good to have this confirmed by someone more familiar with the k8s API.
Note: In k8s-1.30 and later, there is an additional
As @ChrsMark said:
I think one way to proceed is to also use the github.com/distribution/reference package to attempt to parse the output of |
Thank's @evantorrie for looking into this! Should we first clarify and fix anything missing (i.e. the inconsistency mentioned at #32314 (comment)) from the Sem Conv side? Then if the implementation is aligned with Sem Conv we should be fine. |
container.image.id
for k8sattributes processorcontainer.image.repo_digests
for k8sattributes processor
container.image.repo_digests
for k8sattributes processorcontainer.image.id
for k8sattributes processor
@ChrsMark After thinking about this more (and also writing an implementation), I think I'm going to close this PR and start a new one, leaving this one here for reference. Secondly, yes, I do think it would be good to clarify the comment around
As evidenced by the exploration above, the containerd implementation of the k8s CRI interface populates this If it can't find a repo digest entry (e.g. the image was loaded directly into the image store without being pulled from a remote registry), then the imageID field populated in the k8s response will just be the containerd's local Container.Metadata.ImageRef. My plan is to create a new feature request to populate |
Makes sense! Thank's @evantorrie for looking into this thoroughly! I filed open-telemetry/semantic-conventions#1236 to clarify the conventions part. |
Description:
Implements #32152
Adds
container.image.id
to list of Kubernetes attributes detected byk8sattributes
processor.Populates this attribute with the value returned from k8s pod status ImageID field.
Link to tracking Issue: #32152
Testing:
ImageID
field from container status.container.image.id
in rules for extraction.container.image.id
in the e2e tests for k8sattributes processor.Documentation: Added
container.image.id
as an example of attributes that can be extracted from pod/container