---------------------------------------- when memory request is 64Mi and limit is removed
sh-5.1$ echo =====; echo start at $(date --iso-8601=seconds), usage is [$(awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.usage_in_bytes)], limit is [$(awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.limit_in_bytes)] && (/manager > /tmp/manager.log 2>&1 &) && prev=$(date +%s); while true; do now=$(date +%s); echo -n $((now-prev))s:; awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.usage_in_bytes; sleep 2; done;
=====
start at 2024-12-04T05:09:47+00:00, usage is [0.00228882 Gi], limit is [8.58993e+09 Gi]
0s:0.0105362 Gi
2s:0.0860825 Gi
4s:0.149323 Gi
6s:0.241344 Gi
8s:0.272717 Gi
10s:0.363647 Gi
12s:0.587654 Gi
15s:0.665798 Gi
17s:0.714954 Gi
19s:0.813492 Gi
21s:0.812382 Gi
23s:1.23141 Gi
25s:1.46246 Gi <<< process stabilized at a peak usage of ~1.5Gi
27s:1.46161 Gi
29s:1.46153 Gi
31s:1.46188 Gi
33s:1.46174 Gi
35s:1.46168 Gi
37s:1.46179 Gi
39s:1.46175 Gi
41s:1.46176 Gi
^C
---------------------------------------- when memory request is 64Mi and limit is 128Mi (set during build time)
sh-5.1$ echo =====; echo start at $(date --iso-8601=seconds), usage is [$(awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.usage_in_bytes)], limit is [$(awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.limit_in_bytes)] && (/manager > /tmp/manager.log 2>&1 &) && prev=$(date +%s); while true; do now=$(date +%s); echo -n $((now-prev))s:; awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.usage_in_bytes; sleep 2; done;
=====
start at 2024-12-04T05:12:47+00:00, usage is [0.00228882 Gi], limit is [0.125 Gi]
0s:0.00852203 Gi
2s:0.0965805 Gi
4s:0.00250244 Gi <<< process was killed here; overall usage drops from this point (current bug)
6s:0.00239563 Gi
8s:0.00235748 Gi
^C
---------------------------------------- when memory request is 64Mi and limit is set to 1.5Gi (based on first observation)
sh-5.1$ echo =====; echo start at $(date --iso-8601=seconds), usage is [$(awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.usage_in_bytes)], limit is [$(awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.limit_in_bytes)] && (/manager > /tmp/manager.log 2>&1 &) && prev=$(date +%s); while true; do now=$(date +%s); echo -n $((now-prev))s:; awk '{print $1/1024/1024/1024 " Gi"}' /sys/fs/cgroup/memory/memory.usage_in_bytes; sleep 2; done;
=====
start at 2024-12-04T05:17:32+00:00, usage is [0.00230789 Gi], limit is [1.5 Gi]
0s:0.0109749 Gi
2s:0.0796547 Gi
4s:0.134762 Gi
6s:0.239876 Gi
8s:0.313656 Gi
10s:0.388733 Gi
13s:0.397102 Gi
15s:0.42952 Gi
17s:0.571835 Gi
19s:0.574135 Gi
21s:0.602509 Gi
23s:0.675396 Gi
25s:0.991096 Gi <<< process stabilized at ~1Gi
27s:0.988972 Gi
29s:0.989239 Gi
31s:0.989227 Gi
33s:0.989208 Gi
35s:0.989239 Gi
37s:0.989223 Gi
39s:0.989208 Gi
41s:0.98914 Gi
^C
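The sampling one-liner used in the three runs above can be split into small functions for reuse. This is a minimal sketch, not the exact command that was run: it assumes the same cgroup v1 paths as the transcripts, and the names `CGROUP_MEM_DIR`, `to_gi`, and `sample_usage` are made up for illustration.

```shell
#!/bin/sh
# Sketch of the sampling loop from the transcripts, assuming cgroup v1 paths.
# CGROUP_MEM_DIR, to_gi and sample_usage are illustrative names.

CGROUP_MEM_DIR="${CGROUP_MEM_DIR:-/sys/fs/cgroup/memory}"

# Print the first field of a file (a byte count) in Gi,
# using the same awk expression as the original one-liner.
to_gi() {
    awk '{print $1/1024/1024/1024 " Gi"}' "$1"
}

# sample_usage INTERVAL COUNT
# Print "<elapsed>s:<usage> Gi" every INTERVAL seconds, COUNT times.
# COUNT=0 (the default) samples until interrupted with ^C.
sample_usage() {
    interval="${1:-2}"
    count="${2:-0}"
    prev=$(date +%s)
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        now=$(date +%s)
        printf '%ss:' "$((now - prev))"
        to_gi "$CGROUP_MEM_DIR/memory.usage_in_bytes"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

To reproduce a run, start the manager in the background as in the transcripts, `(/manager > /tmp/manager.log 2>&1 &)`, and then call `sample_usage 2`.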
Root cause:
The cephcsi-operator manager is created without any namespace restriction, so all ConfigMaps, Deployments, DaemonSets, and Services (the kinds watched by its controllers) are listed cluster-wide and stored in the manager cache, which balloons memory usage.
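A rough way to see how much a cluster-wide cache has to hold is to count the watched kinds across all namespaces and compare that with a single namespace. The sketch below is only an approximation of what the cache stores (it counts objects and does not measure their size), and `count_watched` and `OPERATOR_NAMESPACE` are hypothetical names used for illustration.

```shell
#!/bin/sh
# count_watched [extra "kubectl get" arguments...]
# Count the object kinds named in the root cause; pass -A for all
# namespaces or -n <namespace> for one. Illustrative helper only.
count_watched() {
    kubectl get configmaps,deployments,daemonsets,services "$@" -o name \
        | wc -l | tr -d ' '
}
```

For example, `count_watched -A` gives the cluster-wide count, while `count_watched -n "$OPERATOR_NAMESPACE"` gives the count for one namespace, with `OPERATOR_NAMESPACE` set to wherever the operator actually runs.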
Workaround:
Remove the memory limit, or raise it to 1.5Gi, if this is blocking deliverables.
Fix:
Cache only objects that carry a specific label across multiple namespaces, or restrict the operator to a single namespace.
leelavg changed the title from "High cpu usage due to watching all namespaces" to "High memory usage due to watching all namespaces" on Dec 4, 2024