Supervisor PID client is None after outage #785

Open
jlashner opened this issue Oct 25, 2024 · 0 comments

Comments

@jlashner
Collaborator

We saw some very strange HWP behavior today, which is likely related to an outage.

Grafana Logs

The supervisor seems unable to connect to the PID client, with logs like:

Completed with state: ControlState.Error(traceback='Traceback (most recent call last):\n  File "/opt/venv/lib/python3.10/site-packages/socs/agents/hwp_supervisor/agent.py", line 1087, in update\n    self.run_and_validate(clients.pid.declare_freq,\nAttributeError: \'NoneType\' object has no attribute \'declare_freq\'\n', start_time=1729869143.9402528)
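The error itself is the generic Python failure for calling a method on a client that was never (re)created. A minimal reproduction, assuming a hypothetical Clients container whose pid field stays None (this is not the actual socs code, only a sketch of the failing pattern at agent.py line 1087):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Clients:
    """Hypothetical stand-in for the supervisor's collection of agent clients."""
    pid: Optional[Any] = None  # stays None if the PID client was never (re)created


def update(clients: Clients) -> None:
    # Mirrors the failing call: nothing checks that clients.pid exists first.
    clients.pid.declare_freq(2.0)


try:
    update(Clients())
except AttributeError as exc:
    message = str(exc)
    print(message)
```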

After the outage, Bryce says they manually restarted the main process but not the entire supervisor, which left it in an inconsistent state (its PID client was None, per the traceback above). The supervisor should be robust to this: it should not attempt to run any operation unless it can communicate with all of the agents that operation requires.
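One possible shape for that guard, sketched under assumptions: before each operation, the supervisor tries to recreate any required client that is None, and skips the operation (returning an error instead of raising) if any client is still unavailable. The names Clients, ensure_clients, run_guarded, the factory callable, and the encoder field are all hypothetical and are not part of the socs API.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

log = logging.getLogger("hwp_supervisor_sketch")


@dataclass
class Clients:
    """Hypothetical container for the agent clients the supervisor relies on."""
    pid: Optional[Any] = None
    encoder: Optional[Any] = None  # hypothetical second dependency


def ensure_clients(clients: Clients, required: Tuple[str, ...],
                   factory: Callable[[str], Optional[Any]]) -> List[str]:
    """Try to (re)create each required client that is None.

    Returns the names of clients that are still unavailable afterwards.
    """
    missing = []
    for name in required:
        if getattr(clients, name) is None:
            try:
                setattr(clients, name, factory(name))
            except Exception as exc:  # a failed connection must not crash update()
                log.warning("Could not create %s client: %s", name, exc)
        if getattr(clients, name) is None:
            missing.append(name)
    return missing


def run_guarded(clients: Clients, required: Tuple[str, ...],
                factory: Callable[[str], Optional[Any]],
                op: Callable[[Clients], Any]) -> Tuple[bool, Any]:
    """Run op only if every required client is available; otherwise skip it."""
    missing = ensure_clients(clients, required, factory)
    if missing:
        msg = "Skipping operation; unavailable clients: " + ", ".join(missing)
        log.error(msg)
        return False, msg
    return True, op(clients)


# Usage: a fake PID client and two factories (one unreachable, one working).
class FakePID:
    def declare_freq(self, freq: float) -> str:
        return f"declared {freq} Hz"


clients = Clients()
declare = lambda c: c.pid.declare_freq(2.0)

ok, result = run_guarded(clients, ("pid",), lambda name: None, declare)
ok2, result2 = run_guarded(clients, ("pid",), lambda name: FakePID(), declare)
```

Checking at the top of every operation, rather than only at startup, means a partial restart (such as restarting only the main process) is recovered on the next update instead of leaving a None client in place.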
