
Consider collecting (by-default) all underlying host's processes in K8s #5256

Open
ChrsMark opened this issue Aug 5, 2024 · 12 comments
Labels
Team:Obs-DC Label for the Data Collection team

Comments

@ChrsMark (Member) commented Aug 5, 2024

In standalone Agent on K8s the process datastream is enabled by default.

However, it does not collect the underlying host's processes.

Would it make sense to collect the underlying system's processes (and possibly metrics) instead of only those within the Agent container's scope?

I tried the following:

- id: system/metrics-system.process-52c2cd5b-0cff-4060-b0ad-a2f533124165
  data_stream:
    type: metrics
    dataset: system.process
  metricsets:
    - process
  period: 10s
  hostfs: "/hostfs"
  process.include_top_n.by_cpu: 5
  process.include_top_n.by_memory: 5
  process.cmdline.cache.enabled: true
  process.cgroups.enabled: false
  process.include_cpu_ticks: false
  processes:
    - .*

(Note the hostfs: "/hostfs" part.)
This gives the desired result:

[screenshot: system.process documents showing the underlying host's processes]

After adding the hostfs: "/hostfs" setting I could see the processes of the underlying host, like kubelet etc.
We can consider whether this should be the default, or at least make the switch easier for users, e.g. with commented-out sections.

/cc @flash1293 @gizas

ref: https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-docker.html#monitoring-host
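
For context, a minimal sketch of what the standalone Agent DaemonSet needs so that /hostfs exists inside the container (the volume and mount names here are illustrative, not taken from the actual manifest; the mount path must match the hostfs setting above):

# Illustrative excerpt of the Agent DaemonSet pod spec
spec:
  containers:
    - name: elastic-agent
      volumeMounts:
        # Mount the node's root filesystem read-only so the system module
        # reads /hostfs/proc instead of the container's own /proc
        - name: hostfs
          mountPath: /hostfs
          readOnly: true
  volumes:
    - name: hostfs
      hostPath:
        path: /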

@flash1293 commented Aug 5, 2024

Thanks for checking, @ChrsMark - this looks helpful. Another bit to consider here - this won't allow you to somehow tie the process back to Kubernetes concepts, right? E.g. telling which container the process is about or something like this.

@christos68k (Member) commented:

this won't allow you to somehow tie the process back to Kubernetes concepts, right? E.g. telling which container the process is about or something like this.

Maybe it's useful to note that we have the plumbing in place for this in ebpf-k8s-agent except we're not collecting/enriching every process on the target host, but only those associated with network flows.

@ycombinator added the Team:Obs-DC label on Aug 5, 2024
@ChrsMark (Member, Author) commented Aug 5, 2024

@flash1293 I don't think this is supported today by Metricbeat, but I could potentially see it handled by https://www.elastic.co/guide/en/beats/metricbeat/current/add-kubernetes-metadata.html. This would require some research though, to check if it's doable.
The idea here is that we want the process-related metrics to be associated with containers/Pods. Maybe that's possible by leveraging the cgroup information, but I'm only guessing here :).

If what @christos68k suggests (or something similar) can cover the case, then that would be also great.
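
To illustrate the cgroup idea (a hypothetical sketch, not something verified to work today): if the process metricset could attach a container.id derived from the process's cgroup path, an add_kubernetes_metadata configuration along these lines might be able to look up the Pod/container metadata. Populating container.id for processes is exactly the part that needs research:

# Hypothetical sketch; assumes events carry a container.id derived from cgroups
processors:
  - add_kubernetes_metadata:
      # Skip the log-path oriented defaults and match purely on container ID
      default_indexers.enabled: false
      default_matchers.enabled: false
      indexers:
        - container:
      matchers:
        - fields:
            lookup_fields: ["container.id"]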

@flash1293 commented:

Thanks @christos68k and @ChrsMark - this seems like somewhat high-hanging fruit for now. I think without this capability we can't produce good suggestions, as we can't tell the user which containers to annotate and also (probably even more importantly) won't be able to tell whether they have already been instrumented.

@cmacknz (Member) commented Aug 6, 2024

+1 to this being the default; simply seeing the processes running inside the Metricbeat or Elastic Agent container is not useful at all. Almost everyone turning this metricset on will want to see the set of processes on the node.

Additionally, the processes should be correlated with their relevant Kubernetes resource types. There is some additional context on the state of this in an internal issue from our cloud SRE team. That issue shows that this correlation does not work when the cluster uses the containerd runtime, which is increasingly the default. It might work when the runtime is Docker.

@flash1293 commented:

Thanks for this link @cmacknz - am I understanding correctly that there are two parts missing here to enable this:

If this is the case, I think we should go for it, as it will be a very nice feature in general and will also help the auto-detection part of onboarding a lot, since processes are very good signals for telling what kind of workload is running.

FYI @thomheymann @akhileshpok

@flash1293 commented:

This is also important for the OTel collector; we should do it for both.

@gizas (Contributor) commented Aug 12, 2024

Hello, summarising the issue:

cc @thomheymann

@flash1293 commented:

Do we have an issue that tracks any work for the comment #5256 (comment)?

@gizas I don't think so, could you create that one?

@gizas (Contributor) commented Aug 13, 2024

@flash1293 elastic/beats#40495 is the issue for the processor enhancement.
As already said, #4670 is a prerequisite.

@gizas (Contributor) commented Aug 13, 2024

@ChrsMark the above issue will track the work on the Agent side for the integrations.

For OTel we will now need to track the same effort and analysis with the host metrics receiver and the enrichment there (with k8s attributes). Do we have something relevant for the OTel-based Elastic Agents? I think we need a new issue in opentelemetry-dev.
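
As a rough sketch of the OTel side (the component names are real, but treat the deployment details as assumptions): the hostmetrics receiver's process scraper can see the node's processes when the host root filesystem is mounted into the collector container and root_path points at it; correlating those processes with k8s metadata would still need its own analysis, similar to the Beats case above.

# Minimal collector config sketch; assumes the node's root filesystem is mounted
# read-only at /hostfs in the collector container (e.g. via a hostPath volume)
receivers:
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      process:

exporters:
  debug:

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [debug]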

@gizas (Contributor) commented Aug 26, 2024

@graphaelli FYI we have added this story to the backlog.

#4670 is a prerequisite for this story to happen, which is why we have not prioritised it in this iteration.

Mainly we will need a) to collect the host processes and b) to enhance them with k8s metadata.

So for the a) collection side, the standalone Agent templates will need to include the fixes (we have this story and #5289 to track it and not miss it), and on the managed Agent side the system integration will need to be updated (see comment).
For the b) metadata enhancement, elastic/beats#40495 is the issue to track the work.
