
Incorrect Container Memory Consumption Graph Behavior When Pod is Restarted #2522

vladmalynych opened this issue on Sep 19, 2024

Problem:

The Grafana dashboards defined in grafana-dashboardDefinitions.yaml include per-pod memory consumption graphs. The query currently used for memory consumption is:

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/grafana-dashboardDefinitions.yaml#L8300

                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container)",
                          "legendFormat": "__auto"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(\n    kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
                          "legendFormat": "requests"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(\n    kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
                          "legendFormat": "limits"
                      }
                  ],
                  "title": "Memory Usage (WSS)",
                  "type": "timeseries"
              },

When a pod is restarted, cAdvisor briefly exposes container_memory_working_set_bytes series for both the old and the new container (each with a distinct id label), and sum(...) by (container) adds them together. This causes temporary spikes in the displayed memory consumption: the dashboard can show usage that exceeds the container's memory limit even though actual consumption stayed within the limit.

(Screenshots from 2024-09-17: the Memory Usage (WSS) panel briefly reports usage above the pod's memory limit immediately after a restart.)
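One possible mitigation (not something the shipped dashboard does; just a sketch reusing the same label selectors as the query above) is to take max instead of sum across the per-container series, so the brief window in which both the old and the new container report values is not double-counted:

    # Sketch of an alternative expression: within a single pod each container
    # name normally has exactly one series, so max matches sum in steady state
    # but picks the larger value during the restart overlap instead of adding both.
    max by (container) (
      container_memory_working_set_bytes{
        job="kubelet", metrics_path="/metrics/cadvisor",
        cluster="$cluster", namespace="$namespace", pod="$pod",
        container!="", image!=""
      }
    )

The trade-off: max hides the old container's tail entirely, whereas grouping by (container, id), as in the reproduction step below, keeps the old and new containers visible as separate lines.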

Steps to Reproduce:

  • Trigger a pod restart (e.g., via an OOM kill or an eviction).
  • Compare the graph produced by the current expression, which groups only by container, with a graph whose expression groups by both container and id (see the diagnostic sketch after this list):

    "expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container, id)"