Background:

The kustomize-controller pod is getting OOMKilled every hour or so. It reaches around ~7.65G and gets OOM killed as the memory limit is 8G.

Image: ghcr.artifactory.gcp.anz/fluxcd/kustomize-controller:v1.2.2
184 kustomizations in total

These are the flags enabled:
Requests & Limits:
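(The actual values weren't captured in this report. As a point of reference, a resources stanza consistent with the 8G limit described above would look roughly like the following; the request figures are placeholders, not the real configuration.)

```yaml
# Placeholder sketch - only the 8Gi limit comes from the report above;
# the requests are illustrative values, not this cluster's actual config.
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    memory: 8Gi
```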
What's been tried so far:
- Added the flag --feature-gates=DisableStatusPollerCache=true to the kustomize-controller deployment, as mentioned in this issue - but this didn't make a difference; it still gets OOM killed within an hour. (See the deployment sketch after this list.)
- Reduced the concurrency to 5 - at this point, the pod seems stable and memory consumption is around ~2.5G.
- Did a heap dump; the inuse_space is around ~22.64MB, which is very low. Couldn't find anything useful there, but here's the link to the flamegraph. Also, here's the heap dump - heap.out.zip
- Checked whether we have a large repository that's loading unnecessary files, as mentioned in this issue.
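For reference, here is a minimal sketch of what the kustomize-controller Deployment args look like with both changes from the list above applied. The container name and the --concurrent flag spelling follow Flux's usual conventions rather than this cluster's actual manifest, so treat them as assumptions; only the two flag values come from the steps above.

```yaml
# Sketch only - layout follows Flux conventions, not this cluster's manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kustomize-controller
  namespace: flux-system
spec:
  template:
    spec:
      containers:
        - name: manager   # assumed container name (Flux default)
          image: ghcr.artifactory.gcp.anz/fluxcd/kustomize-controller:v1.2.2
          args:
            - --feature-gates=DisableStatusPollerCache=true
            - --concurrent=5   # assumed flag name for the reduced concurrency
            # ...remaining default args omitted...
```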
This is from the source-controller:
I want to understand what is causing the memory spike and the OOM kills.
Update from the reporter:

We used an in-memory volume for /tmp, but it was a problem: it kept exceeding the memory limits of the nodes. We also tried ephemeral SSDs, but they got corrupted when the kustomize-controller restarted. So currently /tmp is backed by a disk.

Reply:

OK, so it looks like all these problems are due to FS operations. /tmp should be empty almost all the time. Is there anything inside the repo that could cause this, such as recursive symlinks? Looking at the memory profile, the issue seems related to Go untar and file-read operations, which are all from the Go stdlib.
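For context on the /tmp setups described above, here is a minimal sketch of the in-memory variant versus the current disk-backed one. The volume name and container name are illustrative assumptions; the key point is that an emptyDir with medium: Memory is backed by RAM and counts against the node's memory, while omitting it makes the volume disk-backed.

```yaml
# Illustrative sketch - names are assumptions, not the actual manifest.
spec:
  containers:
    - name: manager
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    # In-memory /tmp (the earlier setup): medium: Memory is RAM-backed,
    # so large working trees count against the node's memory.
    - name: tmp
      emptyDir:
        medium: Memory
    # Current setup: omit `medium: Memory` so the emptyDir is backed
    # by node disk instead, e.g.:
    #   emptyDir: {}
```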