[question] how to manage frequent restart in bluechi-controller and bluechi-agent? #427

dougsland · 2023-08-02T06:22:40Z (edited by mkemel)

Describe the bug

Let's imagine we had a crash in bluechi-controller or bluechi-agent. For example: https://github.com/containers/eclipse-bluechi/issues/425

Should the bluechi service (the same applies to the agent) stay down because of "bluechi.service: Start request repeated too quickly."?
Or should it keep trying until it is able to recover (i.e. a new config was sent to the network)? If so, how long should it wait before trying to restart? What is the minimum possible wait before restarting the node or redeploying? (Agents depend on the manager node to report.)

There are systemd unit keys that might help control this behavior: StartLimitInterval (StartLimitIntervalSec in newer systemd versions) and StartLimitBurst.
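The start limit behind "Start request repeated too quickly." can be thought of as a counter over a time window. The following is a simplified model, not systemd's code: it assumes a fixed window that opens at the first start and allows StartLimitBurst= starts within StartLimitIntervalSec= (systemd's defaults are 10s and 5). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimit:
    """Hypothetical, simplified model of systemd start rate limiting."""
    interval_sec: float = 10.0   # StartLimitIntervalSec= (default 10s)
    burst: int = 5               # StartLimitBurst= (default 5)
    window_begin: Optional[float] = None
    starts_in_window: int = 0

    def allow_start(self, now: float) -> bool:
        """Return True if a start at time `now` (in seconds) is permitted."""
        # Open a new window on the first start or once the old one has expired.
        if self.window_begin is None or now - self.window_begin > self.interval_sec:
            self.window_begin = now
            self.starts_in_window = 1
            return True
        # Still inside the window: allow up to `burst` starts in total.
        if self.starts_in_window < self.burst:
            self.starts_in_window += 1
            return True
        # Limit reached: this is where systemd refuses the start.
        return False


if __name__ == "__main__":
    limit = StartLimit()
    # Initial start plus five restarts, 100ms apart (the default RestartSec=).
    print([limit.allow_start(0.1 * i) for i in range(6)])
    # -> [True, True, True, True, True, False]
```

In systemd itself, a refused start leaves the unit in the failed state and no further automatic restart is scheduled; a manual start after the window, or `systemctl reset-failed`, is needed to try again.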

Output of systemctl status bluechi-controller:

× hirte.service - Hirte systemd service controller manager daemon
     Loaded: loaded (/usr/local/lib/systemd/system/hirte.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Wed 2023-08-02 05:45:53 UTC; 1s ago
   Duration: 3ms
       Docs: man:hirte(1)
             man:hirte.conf(5)
    Process: 214542 ExecStart=/usr/bin/hirte -c /etc/hirte/hirte.conf (code=exited, status=1/FAILURE)
   Main PID: 214542 (code=exited, status=1/FAILURE)
        CPU: 3ms

Aug 02 05:45:53 control systemd[1]: hirte.service: Scheduled restart job, restart counter is at 5.
Aug 02 05:45:53 control systemd[1]: Stopped Hirte systemd service controller manager daemon.
Aug 02 05:45:53 control systemd[1]: hirte.service: Start request repeated too quickly.
Aug 02 05:45:53 control systemd[1]: hirte.service: Failed with result 'exit-code'.
Aug 02 05:45:53 control systemd[1]: Failed to start Hirte systemd service controller manager daemon.
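The log shows the daemon running for about 3ms and being restarted until the restart counter reached 5, all within the same second. Under the same simplified fixed-window assumption, a constant crash-and-restart cycle stays below the limit only if StartLimitBurst= cycles take longer than StartLimitIntervalSec=, which gives a rough lower bound for the "how long to wait" question. The function names are hypothetical, and the timing ignores systemd's own scheduling overhead.

```python
def min_cycle_sec(interval_sec: float = 10.0, burst: int = 5) -> float:
    """Crash-and-restart cycles must be longer than this to never hit the limit."""
    return interval_sec / burst


def hits_start_limit(restart_sec: float, run_sec: float = 0.0,
                     interval_sec: float = 10.0, burst: int = 5) -> bool:
    """True if a service that crashes after `run_sec` and is restarted after
    `restart_sec` eventually fails with "Start request repeated too quickly."

    Assumes a simplified fixed-window model of StartLimitIntervalSec=/StartLimitBurst=.
    """
    cycle = run_sec + restart_sec
    # Start number burst+1 happens burst*cycle after the first start of the
    # window; it is refused unless the window has already expired by then.
    return burst * cycle <= interval_sec


if __name__ == "__main__":
    print(min_cycle_sec())                                   # 2.0 with the defaults
    print(hits_start_limit(restart_sec=0.1, run_sec=0.003))  # True, as in the log
    print(hits_start_limit(restart_sec=2.5))                 # False
```

In this model, a RestartSec= above that bound means the service keeps retrying indefinitely (the "keep trying" option), while anything below it means systemd gives up after StartLimitBurst= starts (the "stay down" option).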


rhatdan · 2023-08-02T11:35:56Z

I would allow only a couple of restarts. My understanding of FuSa (functional safety) is that once it fails, the car needs to go into safety mode.
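One way to express "a couple of restarts, then safety mode" is a systemd drop-in that caps the start rate and starts another unit once the service has failed. The sketch below renders such a drop-in as text. It assumes the unit is named bluechi-controller.service; `safe-mode.target`, the file name `restart.conf`, and the helper function are hypothetical placeholders. StartLimitIntervalSec=, StartLimitBurst=, OnFailure=, Restart= and RestartSec= are standard systemd settings.

```python
def render_restart_dropin(burst: int = 3, interval_sec: int = 60,
                          restart_sec: str = "1s",
                          on_failure: str = "safe-mode.target") -> str:
    """Render a hypothetical drop-in limiting restarts, then failing over."""
    lines = [
        "[Unit]",
        # Start rate limiting lives in [Unit]. burst=3 means the initial
        # start plus two restarts within interval_sec.
        f"StartLimitIntervalSec={interval_sec}s",
        f"StartLimitBurst={burst}",
        # Units listed here are started when this unit enters the failed
        # state, which includes hitting the start limit.
        f"OnFailure={on_failure}",
        "",
        "[Service]",
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # e.g. save as /etc/systemd/system/bluechi-controller.service.d/restart.conf
    # (hypothetical file name), then run: systemctl daemon-reload
    print(render_restart_dropin(), end="")
```

StartLimitAction= is an alternative to OnFailure= if the desired reaction is a reboot or similar system-wide action rather than starting a dedicated unit.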

engelmi · 2023-08-03T13:50:59Z

I thought we had already added those, but it seems that in #231 we only considered them.
Maybe we can also set RestartSec; the default of 100ms seems pretty fast. Using RestartSteps and RestartMaxDelaySec for a kind of exponential backoff could also be interesting.
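RestartSteps= and RestartMaxDelaySec= let the delay between automatic restarts grow from RestartSec= up to a maximum. The sketch below is an approximation under an assumed model (the delay grows geometrically from RestartSec= to RestartMaxDelaySec= over RestartSteps= steps, then stays at the maximum); the exact rule should be checked against systemd.service(5) for the systemd version in use. The function name is hypothetical.

```python
import math


def restart_delay(n: int, restart_sec: float, max_delay_sec: float, steps: int) -> float:
    """Approximate wait before the n-th automatic restart (n >= 1).

    Assumed model: the delay grows geometrically from RestartSec= to
    RestartMaxDelaySec= over RestartSteps= steps and then stays at the maximum.
    """
    if n < 1:
        raise ValueError("n counts restarts starting at 1")
    # Backoff disabled (or not meaningful): always wait RestartSec=.
    if (steps == 0 or restart_sec <= 0 or math.isinf(max_delay_sec)
            or restart_sec >= max_delay_sec):
        return restart_sec
    if n > steps:
        return max_delay_sec
    ratio = max_delay_sec / restart_sec
    return restart_sec * ratio ** ((n - 1) / steps)


if __name__ == "__main__":
    # RestartSec=100ms (the default), RestartMaxDelaySec=10s, RestartSteps=2
    print([round(restart_delay(n, 0.1, 10.0, 2), 3) for n in range(1, 5)])
    # -> [0.1, 1.0, 10.0, 10.0]
```

Because the growing delay also spreads starts out in time, a backoff like this interacts with StartLimitIntervalSec=/StartLimitBurst=: once the delay is long enough, the start limit may no longer be reached at all.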

dougsland added the bug label (Something isn't working) Aug 2, 2023

mkemel added this to the v0.7 milestone Nov 21, 2023

mkemel added the jira (Issues that are synced to Jira) and backlog (This is next up in priority) labels and removed the jira label Nov 21, 2023

mkemel removed this from the v0.7 milestone Nov 21, 2023

mkemel added the good first issue label (Good for newcomers) Nov 21, 2023

mkemel changed the title from "[question] how to manage frequent restart in hirte and hirte-agent?" to "[question] how to manage frequent restart in bluechi-controller and bluechi-agent?" Nov 3, 2024
