Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ncm-metaconfig 23.6.0: all the scheduled actions should be attempted rather than stopping after the first one in error #1681

Open
jouvin opened this issue Apr 30, 2024 · 6 comments
Assignees
Labels

Comments

@jouvin
Copy link
Contributor

jouvin commented Apr 30, 2024

When ncm-mdetaconfig has actions associated with file changes (like a daemon restart but probably also true for other actions) and several files have been modified, it runs the actions one by one but stops immediately if the execution of one action returns an error. For exemple, if the file modifications require the restart of several services, and the first one fails to restart successfully, the other services are not restarted. I have not checked if the problem applies to other kind of actions or if it is just for the daemon one but I guess it applies to others...

IMO, this is an important flaw because once you fixed the problem with the service in error and rerun ncm-metaconfig, the fixed service may be restarted if the fix implies a change in the associated file but the others will not because their files were already updated and has not been changed in the last run. Thus the service has a chance not to work properly until a manual restart is done.

I think that the correct behaviour is to attempt to execute all the scheduled actions, no matter whether one is producing an error...

@ned21
Copy link
Contributor

ned21 commented Apr 30, 2024

Hi @jouvin

It's not covered in the component man page but the schema describes how you can prefix the command with a - sign to ignore any failures and keep going with other commands so the mechanism is really flexible.

HTH
Nathan

@ned21 ned21 added the question label Apr 30, 2024
@jouvin
Copy link
Contributor Author

jouvin commented May 1, 2024

Hi @ned21 thanks for reminding me this.... It should be at least better documented... Basically the whole template library is wrong! May be the default should have been the opposite but it's probably too late to change...

@stdweird
Copy link
Member

stdweird commented May 2, 2024

this is an issue with more then only this component. same applies to component dependencies. proper state mgmt between runs is not available.

wrt the docs, we used to generate documentation based on the schema. @jrha we still have that around?

@ned21
Copy link
Contributor

ned21 commented May 2, 2024

wrt the docs, we used to generate documentation based on the schema. @jrha we still have that around?

My bad, the schema page is generated but metaconfig has so many sub-man pages, it's hard to find it as it appears near the end after ::rsyslog and before ::singularity. It would be nice if we could find a way to place such an important companion doc at the top, directly after the top level manpage.

@jouvin
Copy link
Contributor Author

jouvin commented May 7, 2024

@stdweird you are partially right. We know that we have some flaws on the state management between runs but at least you can force a component to run and get the things fixed (at least for the idempotent configuration modules, i.e. most of them).

In this case the problem is more serious because you have no way to fix the problem by rerunning the configuration module. You need to restart the services manually if you know the exact list or easier/better reboot the server which is not optimal... At least if you don't add the magic character...

@jouvin
Copy link
Contributor Author

jouvin commented May 27, 2024

@stdweird @ned21 I had a look to the - trick but from what I see in the schema comments and in the code, it is not really doing what I want:

  • It seems to apply to command definitions, not to daemon actions. I have not checked carefully but I don't have the feeling that it would work to say -restart. And replacing daemon actions by explicit commands is making the config more complex.
  • When - is added is treats an error as a success for continuation (executing other actions) but also for reporting and I certainly don't want to see the error reported as a success (we could discuss warning against error but something must make explicit that the restart didn't work).

I don't know what can be done without breaking backward compatibility but the current behaviour IMO is wrong as it let a system misconfigured after the configuration issue causing the problem is fixed, without any way to fix it other than manually restarting the impacted service.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants