
[META][Kubernetes provider/Kubernetes module] Set watcher options namespace based on configuration #38978

Closed · 5 of 6 tasks
constanca-m opened this issue Apr 16, 2024 · 20 comments
Labels: Team:Cloudnative-Monitoring (Label for the Cloud Native Monitoring team)

@constanca-m (Contributor) commented Apr 16, 2024:

This is a meta issue about the effect of the namespace option in the Kubernetes provider and the Kubernetes module.
In more detail:

  • Kubernetes autodiscover provider

The namespace option should affect the watcher options created by the Kubernetes provider. This means that every watcher created should include the namespace in its WatchOptions, thus limiting the namespaces to watch.
In the libbeat autodiscover provider (used by Beats) this currently happens only for the pod and node (not really needed) watchers. It is missing from the namespace, deployment and cronjob watchers and should be added.

if metaConf.Namespace.Enabled() || config.Hints.Enabled() {

The same applies to elastic-agent. There, the namespace option seems to be missing not only from some watchers created for pod autodiscovery but also from the watchers created for service autodiscovery.

https://github.com/elastic/elastic-agent/blob/c45842a9d36b92e428f39857aea3334c6a99a082/internal/pkg/composable/providers/kubernetes/pod.go#L92

https://github.com/elastic/elastic-agent/blob/c45842a9d36b92e428f39857aea3334c6a99a082/internal/pkg/composable/providers/kubernetes/service.go#L49

  • Metricsets enrichers

The namespace option should also affect the watch options of the shared watchers created for metadata enrichment. As a result, events collected from other namespaces would not be enriched with Kubernetes metadata. This currently happens, with the only exception of the namespace watcher:

func isNamespaced(resourceName string) bool {

  • add_resource_metadata
    No change needed

Additionally:

  • The documentation of both the kubernetes module and the kubernetes provider should be updated to explain, with examples, what the namespace option does
  • In the Kubernetes integration we should expose the namespace option for the relevant metrics, possibly at the group level.
  • We should investigate letting the user provide a list of namespaces to watch, instead of either all namespaces or just one (a rough sketch of this idea follows this list).
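A possible shape of the multi-namespace idea, as a hedged sketch only (it assumes a hypothetical Namespaces []string option; since WatchOptions carries a single Namespace, a list would most likely translate into one watcher per entry; the call shape mirrors the snippets quoted later in this issue):

// Sketch only, not existing code: start one pod watcher per configured namespace.
for _, ns := range cfg.Namespaces { // cfg.Namespaces is an assumed []string option
	w, err := kubernetes.NewNamedWatcher("pod-"+ns, client, &kubernetes.Pod{}, kubernetes.WatchOptions{
		SyncTimeout: cfg.SyncPeriod,
		Namespace:   ns, // limit this watcher to one of the configured namespaces
	}, nil)
	if err != nil {
		return fmt.Errorf("couldn't create pod watcher for namespace %s: %w", ns, err)
	}
	watchers = append(watchers, w)
}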

Out of the above the following tasks can be created:

@constanca-m (Contributor, Author) commented:

@MichaelKatsoulis could you give your point of view on this issue?

I thought that setting options.namespace to a specific namespace would make state_namespace not receive metrics from any namespace other than the one specified. So, following that, and testing a namespace: default filter in the kubernetes provider, I checked which kubernetes.namespace values I get:

[Screenshot from 2024-05-17 16-47-14]

There are multiple.

Now I am having trouble understanding whether I wrote the implementation here wrong or whether I am misunderstanding what the namespace filter does.

@MichaelKatsoulis (Contributor) commented:

@constanca-m The namespace option set in the Kubernetes provider configuration only affects the provider and the provider's watchers. It is not connected in any way to the state_namespace metricset or to the namespace watcher that the metricsets share.
The namespace watcher that the kubernetes provider starts doesn't seem to take the namespace option into account when it is created. See https://github.com/elastic/elastic-agent/blob/e00d26da1eea0100730bba7e9ebb68cc539b9320/internal/pkg/composable/providers/kubernetes/pod.go#L92

The result is that we watch for resources in all namespaces while the user only selected one.
That requires a fix in the kubernetes provider code in both the elastic-agent and beats repos (a sketch of the fixed call follows the snippet below):

namespaceWatcher, err = kubernetes.NewNamedWatcher("namespace", client, &kubernetes.Namespace{}, kubernetes.WatchOptions{
)
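For reference, a sketch of what the fixed call could look like. The Namespace field on WatchOptions is the relevant piece; the other fields and trailing arguments are assumptions about the surrounding code:

namespaceWatcher, err = kubernetes.NewNamedWatcher("namespace", client, &kubernetes.Namespace{}, kubernetes.WatchOptions{
	SyncTimeout: config.SyncPeriod, // assumed existing option
	Namespace:   config.Namespace,  // the missing piece: honour the provider's namespace option
}, nil) // remaining arguments as in the existing call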

When it comes to the metricsets and the watchers created for the enrichers, there is also a namespace option (https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-kubernetes.html):

# Set the namespace to watch for resources
  #namespace: staging

This should affect the watchers of the enrichers. In the code we take care of that in

if isNamespaced(resourceName) {
options.Namespace = namespace
}

But the namespace resource is not included in the namespaced ones, which is wrong. With a fix, the namespace watcher will only watch for resources in the selected namespace.
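A minimal sketch of that fix, with illustrative resource names (the actual list and constants in the code may differ):

// Sketch only: treat "namespace" as one of the namespaced resources so that
// options.Namespace is also applied to the namespace watcher.
func isNamespaced(resourceName string) bool {
	switch resourceName {
	case "pod", "deployment", "replicaset", "statefulset", "daemonset", "job", "cronjob", "service",
		"namespace": // previously missing, which left the namespace watcher unrestricted
		return true
	default:
		return false
	}
}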

Regarding state_namespace and the namespaces it finds metrics for, this is not related to the namespace option. All the state metricsets collect their metrics from KSM. KSM, unless configured otherwise, includes metrics from all resources in the cluster and all namespaces, so the state_namespace metricset gets metrics for all namespaces.
The only difference the namespace option would make, if set, is that the metrics of namespaces other than the selected one would not include metadata.

This is OK for now; we don't need to change that behaviour. If we really wanted to avoid collecting metrics from the other namespaces, we would need to add a filter in

func (m *MetricSet) Fetch(reporter mb.ReporterV2) {
to exclude collected metrics whose namespace differs from the one in the options. But that is a different discussion.
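For completeness, a sketch of what such a filter could look like (purely hypothetical, not proposed here; where exactly the namespace lives on the event, module vs. metricset fields, would need to be checked):

// Sketch only (not current behaviour): drop events whose namespace differs from
// the configured one before they are reported.
func filterByNamespace(events []mb.Event, namespace string) []mb.Event {
	if namespace == "" {
		return events // no filter configured, keep everything
	}
	kept := events[:0]
	for _, e := range events {
		ns, err := e.ModuleFields.GetValue("namespace") // assumption: this becomes kubernetes.namespace
		if err == nil && ns != namespace {
			continue // skip metrics collected from other namespaces
		}
		kept = append(kept, e)
	}
	return kept
}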

@constanca-m (Contributor, Author) commented:

Thank you for such a detailed explanation @MichaelKatsoulis

I noticed our provider behaves differently depending on whether unique is set to false or true:

if p.config.Unique {
p.eventManager, err = NewLeaderElectionManager(uuid, config, client, p.startLeading, p.stopLeading, logger)
} else {
p.eventManager, err = NewEventerManager(uuid, c, config, client, p.publish)
}

If using NewLeaderElectionManager (that is, unique: true), then we get data for all namespaces. I need to research whether it is possible to filter these results based on namespace, as I suspect this change would have to be made at a higher level (directly in the autodiscover file).

If using the other option, in this case PodEventer since it starts a namespace watcher (unique: false), we already have the namespace filter working, even without setting the options for the namespace watcher. Proof (this is the default setting, filtered by namespace: default):

[image]

@constanca-m (Contributor, Author) commented:

It seems that the way the namespace filter behaves depends heavily on where we set it: at the provider level or at the metricset level. So before opening a PR, it would be important to actually define the expected behaviour for all possible situations @MichaelKatsoulis @gizas

As mentioned in today's meeting, the namespace should only affect the metadata. I am keeping this in mind while defining the expected behaviour.

The possible configurations are:

Option 1: unique: false and namespace set at the provider level.

Example.
    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          unique: false
          namespace: default
          templates:
            - config:
                - module: kubernetes
                  metricsets:
                    - pod
                  period: 10s
                  host: ${NODE_NAME}
                  hosts: ["https://${NODE_NAME}:10250"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.verification_mode: "none"

This means that we are using an eventer with watchers.

Expected behaviour: it should get data from all pods, but only the pods from the selected namespace should have metadata.

Current behaviour: it only gets data from the pods in the selected namespace. To match the expected behaviour, we need to remove the namespace option from the watchers and find a way to pass it down to the metricset level. Example: we need to remove all lines like this:

Namespace: config.Namespace,

Option 2: unique: false and namespace set at the metricset level.

Example.
    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          unique: false
          templates:
            - config:
                - module: kubernetes
                  namespace: default
                  metricsets:
                    - pod
                  period: 10s
                  host: ${NODE_NAME}
                  hosts: ["https://${NODE_NAME}:10250"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.verification_mode: "none"

This means that we are using an eventer with watchers.

Expected behaviour: it should get data from all pods, but only the pods from the selected namespace should have metadata.

Current behaviour: it seems to get stuck on the kube-system namespace. I updated the example with a new namespace value, namespace: local-path-storage, and the behaviour was the same: only data from kube-system. I still don't know what changes are needed here to match the expected behaviour.

Option 3: unique: true and namespace set at the provider level.

Example.
    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          unique: true
          namespace: default
          templates:
            - config:
                - module: kubernetes
                  metricsets:
                    - pod
                  period: 10s
                  host: ${NODE_NAME}
                  hosts: ["https://${NODE_NAME}:10250"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.verification_mode: "none"

We use leader election, which does not use watchers.

Expected behaviour: it should get data from all pods, but only the pods from the selected namespace should have metadata.

Current behaviour: setting the namespace at the provider level is ignored. This happens because we only use the configuration defined in templates:

if config := p.templates.GetConfig(event); config != nil {
event["config"] = config
}

To match the expected behaviour, we would have to agree that setting namespace at the provider level overrides the metricset level, and then document this somewhere. This change in code should be easy, and it would go somewhere around this (a rough sketch follows the snippet below):

mapper, err := template.NewConfigMapper(config.Templates, keystore, k8sKeystoreProvider)
if err != nil {
return nil, errWrap(err)
}
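The sketch (a hypothetical helper, not the actual Beats API; it only illustrates the provider-level value taking precedence, as proposed above):

// Sketch only: after the templates produce the metricset configs, force the
// provider-level namespace onto each config so the provider setting wins.
func applyProviderNamespace(configs []map[string]interface{}, providerNamespace string) {
	if providerNamespace == "" {
		return // nothing set at provider level, keep the metricset-level values
	}
	for _, c := range configs {
		c["namespace"] = providerNamespace
	}
}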

Option 4: unique: true and namespace set at the metricset level.

Example.
    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          unique: true
          templates:
            - config:
                - module: kubernetes
                  namespace: default
                  metricsets:
                    - pod
                  period: 10s
                  host: ${NODE_NAME}
                  hosts: ["https://${NODE_NAME}:10250"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.verification_mode: "none"

Expected behaviour: it should get data from all pods, but only the pods from the selected namespace should have metadata.

Current behaviour: it gets data from all pods, and metadata for all pods. This change should be easy, and it is the one described in the issue description.

What are your thoughts? Should namespace behave differently depending on whether it is set at the provider level or the metricset level?

@gizas (Contributor) commented May 24, 2024:

Thanks for the examples @constanca-m!

the namespace should only affect the metadata.

So if we agree on this, filtering metrics collection per namespace will only happen with conditions.

For me, as long as the namespace variable filters metadata collection only, it should be moved under the add_resource_metadata block.
This way we can make use of https://github.com/elastic/integrations/blob/main/packages/kubernetes/data_stream/pod/agent/stream/stream.yml.hbs#L4

I agree we should set it in both the provider and the metricset/data stream.

I don't currently see any option, either in managed agents or in standalone, to pass the namespace.

So, going through the examples:

  • Example 1:
    We just need to verify here whether this returns fewer metrics. If it does, it can be handy in cases where the k8s cluster is large. It would be good to be able to pass an array of namespaces.

The metadata collection should happen with add_resource_metadata.namespace

  • Example 2:
    Should be aligned with what we decide on Example 1.

  • Example 3:

setting at provider level, overrides the metricset level,

It should happen the other way around: the provider should be the default and the metricsets should override the provider.

  • Example 4:

Should be aligned with what we decide on Example 1.

@MichaelKatsoulis (Contributor) commented:

@constanca-m and @gizas here is my thought:

We need to treat the kubernetes provider, the metricsets and add_resource_metadata separately.

  • Kubernetes provider:
    If the user sets the namespace to a specific value (regex is out of this conversation for now), this should affect what the provider watches for. This only affects the unique: false case, as it is the only one that creates watchers for autodiscovery.
    The provider creates 3 watchers: node, pod and namespace. If the namespace option is set, it should affect the watch options of the pod and namespace watchers. This should be made clear in the documentation. The provider's main purpose is autodiscovery of pods, not metadata addition (although it does that too). So the namespace option at the provider level should affect autodiscovery.
    As far as I understand, this option can be set in managed elastic-agent by using https://www.elastic.co/guide/en/fleet/current/advanced-kubernetes-managed-by-fleet.html.

  • Metricset level:
    If the namespace option is set at the metricset or module level, it should only affect the watchers used by that metricset/module, which in turn affects the metadata enrichment of the collected metrics. For example, for the pod metricset, if namespace is set to default, then only pods in the default namespace should have metadata, but the metricset would still collect metrics from all pods in the cluster (that collection is not done by watchers). So if the namespace option is set, the enrichment process of the metricsets should set the right options on all the relevant watchers. But because the watchers are shared between the metricsets, if any other metricset has not set the namespace option, the watchers should be replaced with new watchers with the right options (a simplified sketch follows at the end of this comment). We already do that as far as I remember, except for the namespace watcher, which is excluded by mistake. This namespace option can also be set in the elastic-agent data streams. It would require an option in each data stream, or better at the data stream group level (if possible). I believe this feature is very important and we should allow an array of namespaces to be passed. If a user does not have access to some namespaces, they can exclude them from the list, so no watcher will watch those namespaces. They would also have to set the option on the kubernetes provider.

  • add_resource_metadata:
    This block only affects the additional metadata added to the event, related to the namespace (or node) the resource runs on.
    That means namespace metadata. If the user does not want namespace metadata, or only wants some of it, they set these options. It does not affect any watcher. It is just a filter in the add_metadata process in the elastic-agent-autodiscover lib.
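A simplified sketch of the shared-watcher behaviour described in the metricset bullet above (hypothetical registry type and names, not the actual util code; the k8sclient import alias is assumed):

// Sketch only: a metricset asking for different watch options (for example a
// different namespace) replaces the shared watcher, so the metricset configured
// last effectively decides the options for everyone sharing it.
type watcherRegistry struct {
	watchers map[string]kubernetes.Watcher
	options  map[string]kubernetes.WatchOptions
}

func (r *watcherRegistry) getOrReplace(name string, client k8sclient.Interface,
	res kubernetes.Resource, opts kubernetes.WatchOptions) (kubernetes.Watcher, error) {
	if w, ok := r.watchers[name]; ok {
		if r.options[name].Namespace == opts.Namespace {
			return w, nil // same options: reuse the existing shared watcher
		}
		w.Stop() // options differ: stop and replace the shared watcher
	}
	w, err := kubernetes.NewNamedWatcher(name, client, res, opts, nil)
	if err != nil {
		return nil, err
	}
	r.watchers[name] = w
	r.options[name] = opts
	return w, nil
}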

@constanca-m (Contributor, Author) commented:

@MichaelKatsoulis

For the Kubernetes provider and unique: false:

The provider creates 3 watchers: node, pod and namespace. If namespace option is set, then this should affect the watch options of pod and namespace watcher.

Are we sure about this? The problem is that if we set the watcher options based on the namespace filter, we stop seeing pods in the other namespaces. And according to the namespace description:

(Optional) Select the namespace from which to collect the metadata. If it is not set, the processor collects metadata from all namespaces. It is unset by default.

In this case, it would stop collecting all data from other namespaces, not just metadata.

I am not against doing this, but we would have to update documentation to reflect this.

And at the metricbeat level:

But still the metricset would collect metrics from all pods in the cluster (not done by watchers).

This behaviour would then be different from the kubernetes provider's namespace field, so the documentation should be clear about that too.

But because the watchers are shared between the metricsets, if there is any other metricset that has not set the namespace option, then the watchers should be overridden with new watchers with right options.

I am a bit confused by this. Say I set namespace: a for state_pod and namespace: b for state_namespace. Should my namespace watcher collect metadata from both a and b for all state_* metricsets, even the ones that did not filter by namespace? Or, as I think we do now, do the ones that did not filter by namespace overwrite the shared watcher, so we collect metadata from all namespaces for all metricsets?

@MichaelKatsoulis (Contributor) commented:

@constanca-m

In this case, it would stop collecting all data from other namespaces, not just metadata.
I am not against doing this, but we would have to update documentation to reflect this.

To me, that is what makes the most sense. I believe the documentation meant to state that, but it was badly written. The provider is about autodiscovery, not metadata. The provider achieves that using watchers. And honestly, it is the only way a user can choose which namespaces to discover resources in. Also, the provider is mainly used for log collection in the default behaviour. The templates are mainly used to enable different metricsets (for example, a redis metricset when a specific pod is discovered). Templates are not used to enable kubernetes-related metricsets, except in the case of scheduler or controller-manager using a condition.

This behavior would then be different from the kubernetes provider namespace field, so documentation should be clear about it too.

It is different because the process is different. At the metricset/module level we do not collect metrics using watchers; we just fetch them from endpoints. The namespace option can only affect the metadata enrichment process.

I am a bit confused by this. Say I set namespace: a for state_pod and namespace: b for state_namespace. Should my namespace watcher collect metadata from both a and b for all state_* metricsets, even the ones that did not filter by namespace? Or, as I think we do now, do the ones that did not filter by namespace overwrite the shared watcher, so we collect metadata from all namespaces for all metricsets?

As we only have shared watchers, it should be made clear that setting different namespace options in different metricsets can lead to unexpected behaviour. Why would a user need metadata for resource A (pod, by state_pod) in one specific namespace, but for resource B (namespace, by state_namespace) need metadata for different namespaces?
The metricset that is enabled last is the one that sets the final options. That is why I believe that in agent the option should be at the group level and affect all data streams under it.

@constanca-m (Contributor, Author) commented:

I am trying to study the case of filtering by namespace at the provider level for agents. However, there is data coming in and I don't know where it is coming from. @tetianakravchenko @MichaelKatsoulis @gizas any idea about the issue I will explain now?

I am deploying EA standalone with kubernetes.enabled: false and kubernetes_leaderelection: false. I am doing this to try to disable all metrics, since filtering the provider's watchers by namespace was not working. So I want to understand what is sending metrics to my ES instance.

The configmap of my standalone looks like this.
apiVersion: v1
kind: ConfigMap
metadata:
  name: agent-node-datastreams
  namespace: kube-system
  labels:
    k8s-app: elastic-agent
data:
  agent.yml: |-
    id: 5d567105-e1d0-424f-a8a6-df8ed0332a0e
    outputs:
      default:
        type: elasticsearch
        hosts:
          - 'https://elasticsearch:9200'
        ssl.ca_trusted_fingerprint: A8CB71FBFD8EEF7FA0CF4CCA23FBF0F1E1726C272DC5214D9B1CC698A7BB1923
        username: '${ES_USERNAME}'
        password: '${ES_PASSWORD}'
        preset: balanced
    #providers:
    #  kubernetes:
    #    namespace: default
    #    resources:
    #      service:
    #        enabled: false
    #      node:
    #        enabled: false
    #  kubernetes_leaderelection:
    #    enabled: false
    #  processors:
    #    - add_kubernetes_metadata: nil
    providers.kubernetes.enabled: false
    providers.kubernetes_leaderelection.enabled: false
    inputs:
      - id: kubernetes/metrics-kubelet-870c8570-dc94-45cd-94ae-a6ae6331f6e7
        revision: 3
        name: kubernetes-1
        type: kubernetes/metrics
        data_stream:
          namespace: default
        use_output: default
        package_policy_id: 870c8570-dc94-45cd-94ae-a6ae6331f6e7
        streams:
          - id: >-
              kubernetes/metrics-kubernetes.pod-870c8570-dc94-45cd-94ae-a6ae6331f6e7
            data_stream:
              type: metrics
              dataset: kubernetes.pod
            metricsets:
              - pod
            namespace: default
            add_metadata: false
            hosts:
              - 'https://${env.NODE_NAME}:10250'
            period: 10s
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            ssl.verification_mode: none
        meta:
          package:
            name: kubernetes
            version: 1.61.1
    secret_references: []
    revision: 4
    agent:
      download:
        sourceURI: 'https://artifacts.elastic.co/downloads/'
      monitoring:
        namespace: default
        use_output: default
        enabled: true
        logs: true
        metrics: true
      features: {}
      protection:
        enabled: false
        uninstall_token_hash: mTGBR6j9s1rA/2snmComqANoVk6pXJGlb3eb3VLyxtE=
        signing_key: >-
          MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAESHhqe/kCtV+0H9SZTS9XHHCrTQvqQ9cG2M81lw7ovqrE/roehN4dUdGsr0Q6pXsza9QK+h7XGtGp906QeGGlCw==
    signed:
      data: >-
        eyJpZCI6IjVkNTY3MTA1LWUxZDAtNDI0Zi1hOGE2LWRmOGVkMDMzMmEwZSIsImFnZW50Ijp7ImZlYXR1cmVzIjp7fSwicHJvdGVjdGlvbiI6eyJlbmFibGVkIjpmYWxzZSwidW5pbnN0YWxsX3Rva2VuX2hhc2giOiJtVEdCUjZqOXMxckEvMnNubUNvbXFBTm9WazZwWEpHbGIzZWIzVkx5eHRFPSIsInNpZ25pbmdfa2V5IjoiTUZrd0V3WUhLb1pJemowQ0FRWUlLb1pJemowREFRY0RRZ0FFU0hocWUva0N0ViswSDlTWlRTOVhISENyVFF2cVE5Y0cyTTgxbHc3b3ZxckUvcm9laE40ZFVkR3NyMFE2cFhzemE5UUsraDdYR3RHcDkwNlFlR0dsQ3c9PSJ9fSwiaW5wdXRzIjpbeyJpZCI6ImxvZ2ZpbGUtc3lzdGVtLWM5ZDU0NWM4LWQ5MDYtNGZlYi1iODQ3LWNiMmVjMmFhYmQ5ZCIsIm5hbWUiOiJzeXN0ZW0tMiIsInJldmlzaW9uIjoxLCJ0eXBlIjoibG9nZmlsZSJ9LHsiaWQiOiJ3aW5sb2ctc3lzdGVtLWM5ZDU0NWM4LWQ5MDYtNGZlYi1iODQ3LWNiMmVjMmFhYmQ5ZCIsIm5hbWUiOiJzeXN0ZW0tMiIsInJldmlzaW9uIjoxLCJ0eXBlIjoid2lubG9nIn0seyJpZCI6InN5c3RlbS9tZXRyaWNzLXN5c3RlbS1jOWQ1NDVjOC1kOTA2LTRmZWItYjg0Ny1jYjJlYzJhYWJkOWQiLCJuYW1lIjoic3lzdGVtLTIiLCJyZXZpc2lvbiI6MSwidHlwZSI6InN5c3RlbS9tZXRyaWNzIn0seyJpZCI6Imt1YmVybmV0ZXMvbWV0cmljcy1rdWJlbGV0LTg3MGM4NTcwLWRjOTQtNDVjZC05NGFlLWE2YWU2MzMxZjZlNyIsIm5hbWUiOiJrdWJlcm5ldGVzLTEiLCJyZXZpc2lvbiI6MywidHlwZSI6Imt1YmVybmV0ZXMvbWV0cmljcyJ9XX0=
      signature: >-
        MEQCIG5D/5SuvcsdVBQ6bsuJRkjuy2hVqaeomZoym2ASHxB/AiAUC2JeUsWo9NcyFo10FjydfEQcRjtEAhhlLyQv19Ix1w==
    output_permissions:
      default:
        _elastic_agent_monitoring:
          indices:
            - names:
                - logs-elastic_agent.apm_server-default
              privileges: &ref_0
                - auto_configure
                - create_doc
            - names:
                - metrics-elastic_agent.apm_server-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.auditbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.auditbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.cloud_defend-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.cloudbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.cloudbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.elastic_agent-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.endpoint_security-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.endpoint_security-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.filebeat_input-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.filebeat_input-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.filebeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.filebeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.fleet_server-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.fleet_server-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.heartbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.heartbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.metricbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.metricbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.osquerybeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.osquerybeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.packetbeat-default
              privileges: *ref_0
            - names:
                - metrics-elastic_agent.packetbeat-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.pf_elastic_collector-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.pf_elastic_symbolizer-default
              privileges: *ref_0
            - names:
                - logs-elastic_agent.pf_host_agent-default
              privileges: *ref_0
        _elastic_agent_checks:
          cluster:
            - monitor
        c9d545c8-d906-4feb-b847-cb2ec2aabd9d:
          indices:
            - names:
                - logs-system.auth-default
              privileges: *ref_0
            - names:
                - logs-system.syslog-default
              privileges: *ref_0
            - names:
                - logs-system.application-default
              privileges: *ref_0
            - names:
                - logs-system.security-default
              privileges: *ref_0
            - names:
                - logs-system.system-default
              privileges: *ref_0
            - names:
                - metrics-system.cpu-default
              privileges: *ref_0
            - names:
                - metrics-system.diskio-default
              privileges: *ref_0
            - names:
                - metrics-system.filesystem-default
              privileges: *ref_0
            - names:
                - metrics-system.fsstat-default
              privileges: *ref_0
            - names:
                - metrics-system.load-default
              privileges: *ref_0
            - names:
                - metrics-system.memory-default
              privileges: *ref_0
            - names:
                - metrics-system.network-default
              privileges: *ref_0
            - names:
                - metrics-system.process-default
              privileges: *ref_0
            - names:
                - metrics-system.process.summary-default
              privileges: *ref_0
            - names:
                - metrics-system.socket_summary-default
              privileges: *ref_0
            - names:
                - metrics-system.uptime-default
              privileges: *ref_0
        870c8570-dc94-45cd-94ae-a6ae6331f6e7:
          indices:
            - names:
                - metrics-kubernetes.pod-default
              privileges: *ref_0

However, even after this I can still see metrics coming in:
[image]

I can see from the logs that both providers are not initiated! However, add_kubernetes_metadata is always enabled, and there is no way to disable it...

@gizas (Contributor) commented Jun 20, 2024:

As long as the input is there for kubelet, the metrics should still come, but I guess your question has to do with the metadata enrichment? So disabling the provider won't affect the metadata enrichment, which will still happen based on the enrichers, right?

Indeed, the add_kubernetes_metadata processor is enabled by default and you cannot disable it.

FYI: elastic/elastic-agent#4670 (comment) and #35244

Maybe for your tests you can build a metricbeat that removes the processor here: https://github.com/elastic/beats/blob/main/x-pack/metricbeat/cmd/root.go#L72. But I don't think it is worth doing.

Does it make sense to apply the namespace config in the processor as well?

@MichaelKatsoulis (Contributor) commented Jun 20, 2024:

@constanca-m which metricset are the data in the screenshot coming from?
You have the kubernetes.pod metricset enabled. This will collect all the pods reported by kubelet, including the pods in the kube-system namespace (and all others). The namespace filter you have set won't have any effect since you have disabled add_metadata.
kubernetes.namespace is a label added by the kubernetes.pod dataset here:

"namespace": pod.PodRef.Namespace,

It is not coming from any metadata enrichment process. The add_kubernetes_metadata processor runs by default but is skipped because the event already includes the kubernetes key.

@constanca-m (Contributor, Author) commented:

Does it make sense to apply namespace config also in the processor?

Yes, I have created a PR for that here: #39934 @gizas

from which metricset are those data in the screenshot coming?

They are coming from metrics-kubernetes.pod-default @MichaelKatsoulis

The namespace filter you have set won't have any effect as you have disabled the add_metadata.

Yes, I am aware.

But I am filtering all the watchers of the pod eventer in my custom agent. I filter the provider by namespace (it is the commented-out part of my configmap), but I still see pods from kube-system. Is this expected then @MichaelKatsoulis?

@MichaelKatsoulis (Contributor) commented:

But I am filtering all the watchers of the pod eventer in my custom agent. I filter the provider by namespace (it is the commented-out part of my configmap), but I still see pods from kube-system. Is this expected then @MichaelKatsoulis?

Do you filter the results returned by the /stats/summary endpoint of kubelet anywhere? If not, then it is expected.

@constanca-m (Contributor, Author) commented:

Do you filter anywhere the results returned by /stats/summary endpoint of kubelet?

Could you point me to the code where that happens? I can't seem to find it @MichaelKatsoulis

@MichaelKatsoulis (Contributor) commented:

Here

body, err := m.mod.GetKubeletStats(m.http)

@constanca-m (Contributor, Author) commented:

Thanks @MichaelKatsoulis !!

But if that is the case and these metrics really do come from the kubelet, then why did the test at the provider level that I posted in the description of #39881 work? Because from my understanding, namespace: ... at the provider level should only make us see events from pods in the filtered namespace. 🤔

@constanca-m (Contributor, Author) commented:

I also want to add that you were right: the events are coming from stats/summary.

This is one of the documents; we can see that in the service.address field:
{
  "_index": ".ds-metrics-kubernetes.pod-default-2024.06.18-000001",
  "_id": "RpwGgjgzgn3hWJ7AAAABkDUbfBQ",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2024-06-20T10:05:12.084Z",
    "agent": {
      "ephemeral_id": "02d5a3d9-e531-4a30-8942-69e6e5d747f7",
      "id": "b145ee38-8f4c-4b6f-977b-88f7a3ab7c3d",
      "name": "kind-control-plane",
      "type": "metricbeat",
      "version": "8.15.0"
    },
    "container": {
      "network": {
        "egress": {
          "bytes": 10703476
        },
        "ingress": {
          "bytes": 1287061
        }
      }
    },
    "data_stream": {
      "dataset": "kubernetes.pod",
      "namespace": "default",
      "type": "metrics"
    },
    "ecs": {
      "version": "8.0.0"
    },
    "elastic_agent": {
      "id": "b145ee38-8f4c-4b6f-977b-88f7a3ab7c3d",
      "snapshot": true,
      "version": "8.15.0"
    },
    "event": {
      "agent_id_status": "auth_metadata_missing",
      "dataset": "kubernetes.pod",
      "duration": 1308684,
      "ingested": "2024-06-20T10:05:12Z",
      "module": "kubernetes"
    },
    "host": {
      "architecture": "x86_64",
      "containerized": false,
      "hostname": "kind-control-plane",
      "name": "kind-control-plane",
      "os": {
        "codename": "focal",
        "family": "debian",
        "kernel": "6.6.31-linuxkit",
        "name": "Ubuntu",
        "platform": "ubuntu",
        "type": "linux",
        "version": "20.04.6 LTS (Focal Fossa)"
      }
    },
    "kubernetes": {
      "namespace": "kube-system",
      "node": {
        "name": "kind-control-plane"
      },
      "pod": {
        "cpu": {
          "usage": {
            "nanocores": 164833
          }
        },
        "memory": {
          "available": {
            "bytes": 33042432
          },
          "major_page_faults": 163,
          "page_faults": 166526,
          "rss": {
            "bytes": 8732672
          },
          "usage": {
            "bytes": 22245376
          },
          "working_set": {
            "bytes": 19386368
          }
        },
        "name": "kindnet-smg7t",
        "network": {
          "rx": {
            "bytes": 1287061,
            "errors": 0
          },
          "tx": {
            "bytes": 10703476,
            "errors": 0
          }
        },
        "start_time": "2024-06-19T13:13:32.000Z",
        "uid": "cf7160ec-f376-4447-8d3e-e771bdbde729"
      }
    },
    "metricset": {
      "name": "pod",
      "period": 10000
    },
    "service": {
      "address": "https://kind-control-plane:10250/stats/summary",
      "type": "kubernetes"
    }
  },
  "fields": {
    "container.network.ingress.bytes": [
      1287061
    ],
    "elastic_agent.version": [
      "8.15.0"
    ],
    "host.os.name.text": [
      "Ubuntu"
    ],
    "host.hostname": [
      "kind-control-plane"
    ],
    "service.type": [
      "kubernetes"
    ],
    "agent.name.text": [
      "kind-control-plane"
    ],
    "host.os.version": [
      "20.04.6 LTS (Focal Fossa)"
    ],
    "kubernetes.namespace": [
      "kube-system"
    ],
    "kubernetes.pod.network.rx.bytes": [
      1287061
    ],
    "kubernetes.pod.network.tx.bytes": [
      10703476
    ],
    "host.os.name": [
      "Ubuntu"
    ],
    "agent.name": [
      "kind-control-plane"
    ],
    "host.name": [
      "kind-control-plane"
    ],
    "event.agent_id_status": [
      "auth_metadata_missing"
    ],
    "kubernetes.pod.memory.rss.bytes": [
      8732672
    ],
    "metricset.name.text": [
      "pod"
    ],
    "host.os.type": [
      "linux"
    ],
    "kubernetes.pod.memory.page_faults": [
      166526
    ],
    "data_stream.type": [
      "metrics"
    ],
    "host.architecture": [
      "x86_64"
    ],
    "agent.id": [
      "b145ee38-8f4c-4b6f-977b-88f7a3ab7c3d"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "host.containerized": [
      false
    ],
    "service.address": [
      "https://kind-control-plane:10250/stats/summary"
    ],
    "agent.version": [
      "8.15.0"
    ],
    "host.os.family": [
      "debian"
    ],
    "kubernetes.pod.network.rx.errors": [
      0
    ],
    "kubernetes.node.name": [
      "kind-control-plane"
    ],
    "kubernetes.pod.network.tx.errors": [
      0
    ],
    "kubernetes.pod.uid": [
      "cf7160ec-f376-4447-8d3e-e771bdbde729"
    ],
    "kubernetes.pod.cpu.usage.nanocores": [
      164833
    ],
    "agent.type": [
      "metricbeat"
    ],
    "kubernetes.pod.start_time": [
      "2024-06-19T13:13:32.000Z"
    ],
    "kubernetes.pod.memory.major_page_faults": [
      163
    ],
    "event.module": [
      "kubernetes"
    ],
    "container.network.egress.bytes": [
      10703476
    ],
    "host.os.kernel": [
      "6.6.31-linuxkit"
    ],
    "kubernetes.pod.name": [
      "kindnet-smg7t"
    ],
    "elastic_agent.snapshot": [
      true
    ],
    "kubernetes.pod.memory.available.bytes": [
      33042432
    ],
    "kubernetes.pod.memory.working_set.bytes": [
      19386368
    ],
    "elastic_agent.id": [
      "b145ee38-8f4c-4b6f-977b-88f7a3ab7c3d"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "metricset.period": [
      10000
    ],
    "host.os.codename": [
      "focal"
    ],
    "event.duration": [
      1308684
    ],
    "metricset.name": [
      "pod"
    ],
    "event.ingested": [
      "2024-06-20T10:05:12.000Z"
    ],
    "@timestamp": [
      "2024-06-20T10:05:12.084Z"
    ],
    "host.os.platform": [
      "ubuntu"
    ],
    "data_stream.dataset": [
      "kubernetes.pod"
    ],
    "agent.ephemeral_id": [
      "02d5a3d9-e531-4a30-8942-69e6e5d747f7"
    ],
    "kubernetes.pod.memory.usage.bytes": [
      22245376
    ],
    "event.dataset": [
      "kubernetes.pod"
    ]
  }
}

@MichaelKatsoulis (Contributor) commented:

I just saw the configuration you had in #39881.

The following config does not make sense in any real-world use case:

 metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          unique: false
          namespace: default
          add_resource_metadata:
            deployment: true
            cronjob: true
          templates:
            - config:
                - module: kubernetes
                  metricsets:
                    - pod
                  period: 10s
                  host: ${NODE_NAME}
                  hosts: ["https://${NODE_NAME}:10250"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.verification_mode: "none"

What the above does is:

  1. The provider starts and watches for pods only in the default namespace
  2. For every pod that it finds, it triggers the template set
  3. So one config block with the pod metricset is started for every pod in the default namespace
  4. This happens because there is no condition set in the template

I can only suspect what happens next:

  1. The events collected by the multiple kubernetes.pod metricsets from stats/summary include all pods of all namespaces
  2. The events are also enriched with the right metadata because, at the metricset level, there is no namespace filter
  3. The fact that the metricset was started under a specific config block belonging to a certain discovered pod makes the provider add all of that pod's kubernetes metadata to the event
  4. But the metadata each block has is that of the pod discovered by the provider, i.e. only the default namespace's
  5. So the provider then overrides the events' correct metadata with the metadata of the block created by the provider
  6. In the end you have the same kubernetes.pod.name and namespace multiple times, but the actual metrics are different

@constanca-m (Contributor, Author) commented:

The provider starts and watchers for pods only in default namespace

Isn't this what we want when we filter by namespace? I have tested the changes I put in PR elastic/elastic-agent#4975. Filtering by namespace or not does not bring any different results... Especially if add_metadata is enabled, then we know the enrichers are starting @MichaelKatsoulis

@MichaelKatsoulis (Contributor) commented:

Isn't this what we want when we filter by namespace?

Depending on where the filter is set, we want different things.

I have tested the changes I put in PR elastic/elastic-agent#4975. Filtering by namespace or not does not bring any different results... Especially if add_metadata is enabled, then we know the enrichers are starting

The Kubernetes provider does not affect the kubernetes metrics data streams in any way when agent is used, unless dynamic variables are used, which they are not by default.
