Pluto service takes 5 minutes to commit settings #267
Comments
I expect what's happening is that [...]

You can provision the node with some of these settings already defined in order to avoid the dependency:
Note that using the instance ID as the hostname means that you will want to enable resource-based naming, so that the hostname kubelet registers with is actually present in the VPC's DNS zone. Alternatively, if the VPC has a proxy server set up, you can set the https-proxy setting.
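For reference, a minimal user-data sketch along those lines might look like the following. All values are placeholders, and which keys pluto would otherwise generate depends on the variant, so treat this as an illustration to check against the Bottlerocket settings documentation rather than a drop-in config:

```toml
# Sketch of Bottlerocket user data that pre-defines settings the node would
# otherwise have to discover at boot. All values below are placeholders.
[settings.kubernetes]
cluster-dns-ip = "10.100.0.10"   # DNS service IP of the cluster
max-pods = 58                    # depends on instance type / CNI configuration

[settings.network]
https-proxy = "http://proxy.internal.example:3128"
no-proxy = ["localhost", "127.0.0.1", "169.254.169.254"]
```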
Thank you for the tips; we are investigating the configuration with the customer to see whether those things match. Is there a way to enable more verbose logging for pluto, perhaps something that shows failed retries, etc.?
Update on this from our side: I built an AMI with an increased logging level for pluto. Sample log from that AMI:
Is the only step forward to add manual logging statements to pluto to see what is happening? It doesn't seem to have any logs at the moment (but I haven't used Rust, so I could be wrong). I have output of logdog + pluto + sundog + settings-applier available (and can get others if needed) - what could be useful? EKS setup:
Does it make sense to increase the log level on other components as well?
Making this bug report because we observe slow init times for Bottlerocket EKS nodes. Nodes get stuck for ~5 minutes before starting kubelet and joining the cluster. So far we have pinpointed the source to a slow pluto.service commit stage (which seems to come from this repo, correct?).
The clusters are using the latest EKS-optimized Bottlerocket image. It reproduces consistently on every new node, but not in every cluster.
The question is how to investigate and fix the cause. We are not sure whether this is a package issue or a configuration issue in the clusters. The clusters have IMDS enabled; we are not sure what else is required for this process.
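Since the settings pluto generates appear to depend on the instance metadata service, one quick check on an affected node is to time an IMDSv2 round trip from the admin container. This is only a sketch using the standard IMDSv2 token flow; a slow or hanging response here would point at the metadata path rather than at pluto itself:

```sh
# Rough check of IMDSv2 latency, run from the admin container (host network namespace).
TOKEN=$(curl -sS -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
time curl -sS -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/instance-id"
```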
Package I'm using:
pluto.service
What I expected to happen:
Startup to take 1-2 minutes and not 5+ minutes.
What actually happened:
Looking at systemd logs, `pluto.service` took 5 minutes to complete. We extracted logs from it and observe the `Committing settings` step taking 5 minutes.

Logs from pluto:
How to reproduce the problem:
Unclear; we only see this issue in some customer clusters, but not on a fresh cluster.
**Extra information**
```
bash-5.0# apiclient get os
{
  "os": {
    "arch": "x86_64",
    "build_id": "360b7a38",
    "pretty_name": "Bottlerocket OS 1.26.2 (aws-k8s-1.30)",
    "variant_id": "aws-k8s-1.30",
    "version_id": "1.26.2"
  }
}
```
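For anyone looking at the same symptom, the commands we have been using to pull unit timing and pluto's journal are roughly the following (assuming the admin container is enabled so that sheltie works):

```sh
# From the admin container, enter the host's root filesystem.
sudo sheltie

# Units that dominate boot time; pluto.service shows up near the top when it is slow.
systemd-analyze blame | head -n 20

# pluto's journal with precise timestamps, to see where the ~5 minutes go.
journalctl -u pluto.service -o short-precise --no-pager
```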