PUT /omicron-physical-disks timeout on rack3 #6904
Comments
Here's what I can see from the log:
Which roughly corresponds with this sequence of code: omicron/sled-agent/src/sled_agent.rs Lines 894 to 915 in b034821
Note, however, that the last omicron/sled-agent/src/sled_agent.rs Lines 912 to 915 in b034821
The instance manager This sends a message to the InstanceManager task: omicron/sled-agent/src/instance_manager.rs Lines 314 to 325 in b034821
Which should be caught here: omicron/sled-agent/src/instance_manager.rs Lines 496 to 499 in b034821
and processed here: omicron/sled-agent/src/instance_manager.rs Lines 785 to 799 in b034821
However, I see no indication that we actually processed this message in the log. We sent similar requests to the zone_bundler, probe_manager, etc, and I can see all of those in the log. Weirdly, I actually don't see any messages from the |
If we have more logs from this sled agent (were they rotated?) that would be extremely useful. I'm particularly interested in any messages from |
Is it possible we're creating a zone bundle for an instance here, and the instance-manager is stuck on that for whatever reason? It seems like the |
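To make the hypothesis above concrete, here is a minimal sketch of how a single task that handles its messages strictly in order can stall a later request while it is busy with a long-running one. This is not the sled-agent code: the `Request` enum, `spawn_manager`, and `disks_request_completes` are hypothetical names, and it uses standard-library threads and channels rather than the async runtime that sled-agent uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical request type; not the real sled-agent message enum.
enum Request {
    /// Long-running work, standing in for something like creating a zone bundle.
    SlowWork { duration: Duration },
    /// Quick request that expects a reply, standing in for the message sent
    /// while handling PUT /omicron-physical-disks.
    DisksUpdate { reply: Sender<()> },
}

/// Spawns a single-threaded manager that handles requests strictly in order.
fn spawn_manager() -> Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for req in rx {
            match req {
                // While this sleeps, nothing else in the queue is looked at.
                Request::SlowWork { duration } => thread::sleep(duration),
                Request::DisksUpdate { reply } => {
                    // The caller may have given up already; ignore send errors.
                    let _ = reply.send(());
                }
            }
        }
    });
    tx
}

/// Sends a disks update and waits up to `timeout` for the reply.
/// Returns true if the manager answered in time.
fn disks_request_completes(manager: &Sender<Request>, timeout: Duration) -> bool {
    let (reply_tx, reply_rx) = mpsc::channel();
    manager
        .send(Request::DisksUpdate { reply: reply_tx })
        .expect("manager task exited");
    match reply_rx.recv_timeout(timeout) {
        Ok(()) => true,
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => false,
    }
}
```

With an idle manager, `disks_request_completes` returns true almost immediately. If a `SlowWork` request is queued first and takes longer than the caller's timeout, the same call returns false without the manager ever logging anything about the disks update, which would match a log that shows no sign of the message being processed.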
Today while investigating an issue with a different sled, we briefly enabled the blueprint_executor task. PUT /omicron-physical-disks requests to sled 13 (BRM42220064) were consistently timing out after 60 seconds. The sled-agent log is at /staff/rack3/BRM42220064/2024-10-18/oxide-sled-agent-default.log. We see the dropshot "client disconnected" logs from client timeouts that are consistent with the 60-second timeout error we were seeing from Nexus, like: