Allow disabling and configuring the add_x_metadata default global processors #4670
I'm still unfamiliar with how enrichers work, so bear with me here. As far as I understand, they only apply to the metricsets that use them directly and are not configurable. I think we still need an option for users to be able to easily add the same metadata as before with any input type. The problem we're trying to solve here is especially problematic on Elastic Agent, where we start a new Beat instance for each unique input type. Each of these Beats will run their own instance of `add_kubernetes_metadata`.

I was thinking that for Agent users we could add the exact same properties that the processor gives, but instead through the Kubernetes provider, by updating integrations and standalone config examples to populate inputs:
```yaml
- id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
  type: filestream
  streams:
    - id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
      data_stream:
        dataset: kubernetes.container_logs
        type: logs
      prospector.scanner.symlinks: true
      parsers:
        - container: ~
      paths:
        - /var/log/containers/*${kubernetes.container.id}.log
      # 👇 Here is what is new
      processors:
        - add_fields:
            target: kubernetes
            fields:
              pod.name: ${kubernetes.pod.name}
              pod.id: ${kubernetes.pod.id}
              namespace: ${kubernetes.namespace}
              labels: ${kubernetes.labels}
```

Ideally this is actually something that Agent could pull off automatically without having to change integrations and configurations. This would help avoid a breaking change. I see a couple of options to solve this:
|
Small correction on this: yes, they start by default, but they are configurable based on the `add_resource_metadata` block (see the example below from a managed agent; the same holds for standalone). So at the moment you can only disable namespace enrichment (this is the main one that gives us labels and annotations), node enrichment (which adds node and hostname), cronjob (which adds the cronjob name on pods) and deployment (which adds the deployment name on pods). So what you describe above for the processor can still happen in the enricher and be configured with the options you specify. We may need to specify extra user scenarios, but that is ok. The processor still needs to be disabled, since both start by default and this is not correct.
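The original inline example was not preserved; here is a minimal sketch of the Kubernetes provider's `add_resource_metadata` block, assuming the standard provider options (exact keys may vary by version):

```yaml
# Sketch of the Kubernetes provider's add_resource_metadata block.
# Each toggle controls one of the enrichments described above.
providers:
  kubernetes:
    add_resource_metadata:
      namespace:
        enabled: false   # disables namespace enrichment (labels/annotations)
      node:
        enabled: false   # disables node enrichment (node name, hostname)
      cronjob: false     # disables the cronjob name on pods
      deployment: false  # disables the deployment name on pods
```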
I understand your point: check whether the agent already has this info, so we don't ask for it again. We will need to look at the code and see whether the agent's cache can be utilised from the integration side. |
Hey folks, I'm not sure the components are completely distinguished here, so here is a quick list of the related components: 1) the k8s provider of Elastic Agent
2) the k8s enrichers of Metricbeat
3) the `add_kubernetes_metadata` processor of Beats
|
Thanks @ChrsMark, so trying to summarise your info and Josh's comments:
|
@gizas 👍🏽 I had filed an issue regarding the memory utilization improvement from using common shared caches etc.: elastic/beats#35591. But as I said, this is out of scope for the current problem. So yes, let's try to see if we can completely disable the `add_kubernetes_metadata` processor. |
+1 to what @ChrsMark mentioned. |
More on the ENV var approach: a variable worth investigating is |
@gizas This is not entirely true. Filebeat behaves differently. The code at https://github.com/elastic/beats/blob/157b31a79e3ad52578f30399049cac030d688934/x-pack/filebeat/cmd/root.go#L51 shows that for Filebeat, when SHIPPER_MODE=true, all the default processors are disabled. This is only valid for Filebeat and not for Metricbeat. |
@MichaelKatsoulis personally I wouldn't override the behaviour of BEAT_SETUID_AS to be used as a feature flag for enabling/disabling processors. |
@gsantoro we had done something similar with an environment variable here to alter the configuration of processors. Maybe we can think of another variable to do the disablement of the kubernetes processor so it is not a breaking change? Besides the x-pack configuration in this story (the x-pack folder, I guess, affects only Elastic Agent when it starts Beats underneath), we need to verify whether configuration files like https://github.com/elastic/beats/blob/main/filebeat/filebeat.yml#L176 also need to be changed (for when we run each Beat on its own). Additionally, we would need diagnostics before and after the changes to figure out the improvement in memory and to be sure the processor is not run otherwise. Any other ideas to measure the possible improvement?
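For reference, the default processors block that ships in filebeat.yml looks roughly like the following (paraphrased from the linked file; exact contents may vary by version):

```yaml
# Default global processors in the stock filebeat.yml.
# add_kubernetes_metadata is the one under discussion here.
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
```
|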
One thing we could explore here is using the recently added feature flags section of the agent policy (see elastic-agent/elastic-agent.reference.yml, lines 192 to 197 at e3c0695):
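The embedded snippet did not carry over; a reconstruction of what that section of elastic-agent.reference.yml defines, assuming the v8.7-era layout:

```yaml
# agent.features holds feature flags. As noted below, fqdn is
# currently the only one: it controls whether the agent reports
# its fully qualified domain name as the hostname.
agent.features:
  fqdn:
    enabled: false
```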
Right now there is only a single feature flag, which controls whether the agent hostname is the fully qualified domain name or not. While it would be better if we explicitly supported global processors as a first class concept in the agent policy, there is still a lot of debate about how to do this properly, since doing it for every process in agent, regardless of whether it is implemented as a Beat or not, is complex.

So rather than waiting for the global processors problem to be solved, we could allow configuring the set of Beats global processors in the features section of the policy. If we expose these as feature flags we can conceivably obsolete them later once global processor support exists. For example we could expose them as something a bit abstracted from the implementation like (final syntax TBD):

```yaml
agent.features:
  event_metadata:
    # The default value would be true to avoid a breaking change when they aren't specified.
    host: true        # controls add_host_metadata
    kubernetes: false # controls add_kubernetes_metadata
    cloud: true       # controls add_cloud_metadata
    docker: false     # controls add_docker_metadata
```

This would give us a nice way to control this in the agent policy. This wouldn't be available in the Fleet UI without a change there to expose it, but it could immediately be controlled using the Fleet override API.

The agent passes the feature flags to sub-processes like Beats using the control protocol: https://github.com/elastic/elastic-agent-client/blob/21e4fd899bbcbe622248407e90744217da77b2b4/elastic-agent-client.proto#L268-L271

```proto
// Features are the expected feature flags configurations. Can apply to either components or
// individual units depending on the flag and its implementation. Omitted if the client reports
// it has applied the current configuration. Added in Elastic Agent v8.7.1.
Features features = 3;
```

This eventually ends up being handled in the x-pack/libbeat/management code in Beats. Most of the work with this approach would be allowing changes of the feature flag to remove processors from the list of default global processors. These are currently initialized once at startup; we'd need a way to manipulate the processors list in the pipeline based on feature flag changes. |
Nice @cmacknz!

```yaml
agent.features:
  event_metadata:
    kubernetes: false # controls add_kubernetes_metadata
```

Then we actually pass the correct config here: https://github.com/elastic/beats/blob/06a8c09655b0d1d5d4ec4bd77bc930d18975155e/x-pack/libbeat/management/managerV2.go#L572-L584

Just a reminder that regardless of the way we do it, we must verify that if we disable add_kubernetes_metadata by default we don't break anything and don't miss metadata in our collection. We should also measure the improvement in memory and in the number of API calls, if any. |
hey @gizas and @cmacknz, I would introduce a variable for this. No Fleet code change necessary. Just set the right values in the manifests and some docs. What do you guys think?
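For illustration only, a sketch of the manifest change this would imply; the variable name below is hypothetical, since the proposed name was not settled in this thread:

```yaml
# Hypothetical fragment of the elastic-agent DaemonSet container spec.
# ELASTIC_AGENT_DISABLE_K8S_METADATA_PROCESSOR is an illustrative
# name, not a real variable shipped with the agent.
containers:
  - name: elastic-agent
    image: docker.elastic.co/beats/elastic-agent:8.9.0
    env:
      - name: ELASTIC_AGENT_DISABLE_K8S_METADATA_PROCESSOR
        value: "true"
```
|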
That is what I would call the minimum viable change. That only fixes the k8s metadata processor for users running the agent as a container (see below), and requires them to update their k8s YAML to include it. You would think the k8s configuration stays up to date, but I have seen it lag the agent version (edit: this is probably true of the features based approach as well since it defaults to on).

What is proposed in #4670 solves this problem for every global processor in every deployment model. Adding environment variables to agents deployed on individual endpoints (laptops, desktops, etc.) requires mass reconfiguration of the agent outside of Fleet, since the agent installation itself must be modified. Using an environment variable is really only viable in container/IaC use cases.

I'd prefer this get implemented as described in #4670; being able to disable global processors is a pretty common request. This is the platform level solution, which I am obviously biased towards as a person who maintains agent as a platform. That said, doing it that way will take significantly longer, probably 1 month for someone who has never touched the inside of agent. The agent team itself is unlikely to be able to take that on in the near future either. So if you can't commit to that, then the simpler solution is the way to go.

Ideally we aren't building up layers of band-aids like this all over the place, but we already have a precedent for this processor with the NETINFO variable at least, so it isn't new to this area. |
@cmacknz I understand that this feels like a band-aid, but I don't think any of us has 1 month to be working on this. I have been thinking that probably we don't even need the environment variable. If elastic-agent is already doing the work to add k8s metadata, can't we just remove `add_kubernetes_metadata`? AFAIK Beats doesn't add this processor unless it is instrumented to do so with a config. So we just don't want to add that processor by default when running in elastic-agent. |
Sorry, late to the party. I would only add that this story will initially try to verify whether the complete removal of the add_kubernetes_metadata processor breaks something. So indeed I would try just removing it and see if everything is ok. Of course there might be cases we cannot think of right now that we might break, but (and correct me on this) if the user then manually adds the processor, or globally enables it (whenever that is ready), won't this be sufficient? |
The only way to get the k8s metadata on events with the agent kubernetes provider is to have an integration template it in the processors (or add them manually) using the data the k8s provider collected. I believe you do some of this in the k8s integration already, but removing `add_kubernetes_metadata` would make this required for every input.

I think Josh proposed this earlier, but we could have agent push the k8s metadata it collected down to each Beat using the control protocol, where they could automatically generate the equivalent static processor. Then you could remove the `add_kubernetes_metadata` processor entirely.

The extra work here is a product of the agent sub-process architecture, which we are stuck with until we can shift more of this functionality into the OTel collector embedded in the agent. That is not going to happen in the short term.
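A sketch of the equivalent static processor a Beat could generate from the provider-supplied metadata; the concrete pod values are illustrative:

```yaml
# Generated per input from metadata the agent's k8s provider
# already holds, replacing the add_kubernetes_metadata API lookup.
processors:
  - add_fields:
      target: kubernetes
      fields:
        pod.name: nginx-6d4cf56db6-x8k2p   # illustrative value
        pod.id: 2f9c1e7a-0000-illustrative # illustrative value
        namespace: default
        labels:
          app: nginx
```
|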
This conversation is getting farther into the internals of the agent, happy to jump on a call to help close this out if that would help. |
How to run audit-logs tests locally (discussion on GitHub). You must check out the repo; before starting, you need to change the params. I have an elastic-agent repo with optimizations that make the compilation faster. You need to start with a clean version of the Beats repo. Run the following commands to set up Elastic Agent:

```sh
# activate env variables to configure audit logs in Kind
cd tools/envs/local-auditlogs
# start k8s on Kind, Elastic via elastic-package, configure Fleet to use elastic-agent built from source
task local-managed-source:up
```

Keep it running for 30 minutes to get some audit logs. You can visualize the audit logs using a dashboard. Now you want to change the code in Beats to disable the processor. You will need to comment out the line and rebuild/rerun elastic-agent with those changes:

```sh
# activate env variables to configure audit logs in Kind
cd tools/envs/local-auditlogs
# start k8s on Kind, Elastic via elastic-package, configure Fleet to use elastic-agent built from source
task local-managed-source:reload-from-source
```

Keep it running for another 30 minutes and compare the results. |
Summary of a zoom call with @cmacknz and @MichaelKatsoulis
|
I'm closing this issue for now since it's probably going to be offloaded to @cmacknz's team |
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
It may actually be better to put this in the providers section of the policy, and kill two birds with one stone by exposing agent.providers in Fleet, which aren't supported yet. Something like:

```yaml
providers:
  event_metadata:
    # The default value would be true to avoid a breaking change when they aren't specified.
    host:         # controls add_host_metadata
      enabled: true
    kubernetes:   # controls add_kubernetes_metadata
      enabled: false
    cloud:        # controls add_cloud_metadata
      enabled: true
    docker:       # controls add_docker_metadata
      enabled: false
```
|
@kpollich I wanted to bring this to your attention after speaking to Craig. I think perhaps we should finally introduce the "advanced" section in the Agent Policy and add this "providers" section. These booleans can be generated from config also. |
Sounds good to me. I think this should be achievable with the advanced settings framework. |
Updated the description and transferring to the agent repository since this is really an agent feature. |
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
Initial design/implementation ideas:
Any thoughts/feedback/observations @cmacknz, @ycombinator? |
Why do the Beats need to re-exec themselves? These processors map to global Beat processors (processors at the top level of the Beat configuration as defined in https://www.elastic.co/guide/en/beats/filebeat/current/defining-processors.html#where-valid) and should be reloadable like they are if you were writing the Beat configuration by hand. |
I didn't find a place other than the very early stages, when building out the pipelines happens using the global processors... At that point, if we want to configure such processors, we need to have the configuration available, but I don't think the client is up yet (hence the idea of writing out a partial config file and re-exec)... I will have another look once I have the new client in and see what happens. If we can reconfigure the processors after we've assembled the pipeline, I would be happy to avoid writing out a file and re-exec. So if, by the time the Beat starts receiving CheckinExpected messages, we can still configure global processors on the fly, I will be more than happy to save myself the extra complication and hassle. |
You should be able to test whether hot reloading the global processors at any time works by trying it with a standalone Beat configuration.
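A minimal standalone configuration for such a test might look like the following sketch; the input paths and processor choices are illustrative:

```yaml
# filebeat.yml — global processors live at the top level,
# outside any input, which is what agent-managed Beats generate.
filebeat.inputs:
  - type: filestream
    id: test-input
    paths:
      - /var/log/*.log

# Global processors: edit this list while the Beat runs to see
# whether changes are picked up without a restart.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

output.console:
  pretty: true
```
|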
Quick sum-up of the solution and progress so far. The solution requires modifications across 3 repositories:

To go a bit deeper on the Beats initialization problem:

To try and load the global processor config I opened 2 draft PRs in the beats repository.

Neither of these Beats PRs is ultimately ready:
|
Edited by @cmacknz to generalize to all add_x_metadata processors and include the suggested approach from #4670 (comment)
Describe the enhancement:
The default set of processors for each Beat, when they are started by agent, is defined in the code and is not configurable. Here is the definition for Metricbeat; the same default config stands for the rest of the Beats.
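The linked definition amounts to the following default set, expressed here as the equivalent YAML (a paraphrase of the Go code, assuming the same four add_x_metadata processors discussed in this issue):

```yaml
# Default global processors added to every agent-managed Beat.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
```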
For example, the `add_kubernetes_metadata` processor is used to enrich Kubernetes events with extra metadata fields. This results in the number of k8s API calls being proportional to the number of inputs in the agent policy. The same functionality is also present through `enrichers`, which start by default in the Beats code as described here (see enrichers).

This enhancement will allow disabling or configuring any of the default Beat global processors. This will be done by defining a new type of provider, an `event_data` provider, which controls the behavior of the event metadata processors added by each Beat.
We will additionally expose the entire contents of the providers section of the agent policy in Fleet to allow configuring this. Here is a sample of the intended configuration:
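Reconstructed from the providers sketch proposed earlier in this thread (final syntax TBD):

```yaml
providers:
  event_metadata:
    # Defaults to true to avoid a breaking change when unspecified.
    host:         # controls add_host_metadata
      enabled: true
    kubernetes:   # controls add_kubernetes_metadata
      enabled: false
    cloud:        # controls add_cloud_metadata
      enabled: true
    docker:       # controls add_docker_metadata
      enabled: false
```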
Describe a specific use case for the enhancement or feature:
All elastic-agent Kubernetes deployments experience the default enablement of the processor in the background, which can be "expensive" (in terms of extra memory and the number of API calls the agent has to make), especially in big Kubernetes deployments.
Related Issues:
Definition of Done: