Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Remove redfish credential validation in install event handler #192

Conversation

dashmage
Copy link
Contributor

@dashmage dashmage commented Mar 19, 2024

Fixes #190

The exporter.check_active method being called in the restart_exporter method (from the update status event handler) will fail in case the exporter installation has not been done. This installation process takes place as part of the install event handler.

The error happens because the check_installed function in service.py (called as a decorator for the various exporter service helper methods) would return False when the exporter isn't installed. This would cause all the Exporter class' methods in service.py (restart, check_active, check_health etc.) to return False. The issue occurs when the on_update_status() --> restart_exporter() raises an exception when exporter.check_active consistently returns False.

This PR removes the redfish credential validation step during the install event since it's likely to not have the right configuration during the installation. Moreover, this can leave the charm in an inconsistent state where the exporter is not properly installed, leading to an error during the execution of other lifecycle events.

@dashmage dashmage requested a review from a team as a code owner March 19, 2024 14:56
@dashmage
Copy link
Contributor Author

Will fix unit tests after rebasing changes from #191 once it's merged.

chanchiwai-ray
chanchiwai-ray previously approved these changes Mar 20, 2024
Copy link
Contributor

@chanchiwai-ray chanchiwai-ray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the workaround

src/charm.py Outdated Show resolved Hide resolved
@dashmage dashmage self-assigned this Mar 21, 2024
src/charm.py Outdated Show resolved Hide resolved
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch 3 times, most recently from 5897906 to ba9d413 Compare March 21, 2024 11:04
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from ba9d413 to 47a403a Compare March 21, 2024 11:14
src/charm.py Outdated Show resolved Hide resolved
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from f1a869b to 1df3586 Compare March 21, 2024 13:37
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from 1df3586 to 7351e70 Compare March 21, 2024 13:46
Copy link
Contributor

@chanchiwai-ray chanchiwai-ray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to me that I will fix the issue.. Have you test if it can fix the issue ?

Since the validation of redfish config did not remove from install or upgrade event, there's still possibility that operator deploys the charm with incorrect credential and causing the exporter not being installed.

@dashmage
Copy link
Contributor Author

dashmage commented Mar 21, 2024

It doesn't seem to me that I will fix the issue.. Have you test if it can fix the issue ?
Since the validation of redfish config did not remove from install or upgrade event, there's still possibility that operator deploys the charm with incorrect credential and causing the exporter not being installed.

Good thing you brought this up. So I did test this initially and the hook didn't fail for the charm and no exceptions were raised. So I thought the code was good. But upon checking once more carefully, another issue comes up. With the latest change, even after I configure the redfish creds, the status doesn't change from BlockedStatus with invalid redfish creds message. This might be because the ActiveStatus is not being reached at the end of the update status hook and it keeps looping when exporter_installed is not set. So I might have to revert that change.

^ @Pjack

src/charm.py Outdated Show resolved Hide resolved
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from e75cbaf to e9e45aa Compare March 23, 2024 09:52
@dashmage
Copy link
Contributor Author

This commit just cleans up the unit test names and docstrings since many of them were incorrect.

@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from e9e45aa to 6432d07 Compare March 23, 2024 09:55
@dashmage
Copy link
Contributor Author

dashmage commented Mar 23, 2024

I've also added an extra check in test_install_redfish_enabled_with_incorrect_credential to make sure that the exporter is installed when when there's wrong redfish credentials. This ensures that the issue being fixed will be caught by the modified unit test.

@dashmage dashmage changed the title fix: Check exporter installed before healthcheck fix: Remove redfish credential validation in install event handler Mar 23, 2024
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from 6432d07 to 39da143 Compare March 23, 2024 10:04
@dashmage dashmage force-pushed the bseng-2197-check-exporter-installed-before-healthcheck branch from 39da143 to fd7a71f Compare March 24, 2024 17:27
Copy link

@Pjack Pjack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code and logic looks good to me .

I would like to confirm where should we validate this behavior?

  1. config or credential is invalid
  2. User change the configuration or provide correct credential.
  3. Verify if the charm will become active state again.

Maybe in hardware-indendent functional test?

@dashmage
Copy link
Contributor Author

The code and logic looks good to me .

I would like to confirm where should we validate this behavior?

1. config or credential is invalid

2. User change the configuration or provide correct credential.

3. Verify if the charm will become active state again.

Maybe in hardware-indendent functional test?

It would probably be in the hardware dependent functional tests if the redfish creds are invalid and we want to test that after the correct config is provided, the charm is in active status. Currently we only have the test to check if we're able to query the redfish metrics.

If you're talking about any hardware independent config (exporter log level, port etc.), we already have functional tests for the behavior you've mentioned. But we do need to make some changes to verify that the charm is active afterwards. Currently it just verifies if the config is correctly changed in the underlying file.

It would be good to have another refactoring PR for the functional tests as well so we can follow through on these requirements.

Copy link
Member

@gabrielcocenza gabrielcocenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit lost with the PR title, description and the bug that points the fix.

The bug description says:

This is because the blocked status message for invalid redfish creds has been overwritten by the missing relation message while executing the update status hook.

I'm not seeing any changes in the update status hook.

What are we fixing by removing the validation of the exporter configs on the install hook?

@dashmage
Copy link
Contributor Author

dashmage commented Mar 25, 2024

I'm a little bit lost with the PR title, description and the bug that points the fix.

The bug description says:

This is because the blocked status message for invalid redfish creds has been overwritten by the missing relation message while executing the update status hook.

I'm not seeing any changes in the update status hook.

What are we fixing by removing the validation of the exporter configs on the install hook?

Sorry about the confusion. We made some findings over the course the PR due to which we had to change the originally intended fix. I've made some changes to the PR description that hopefully clarifies your doubts. If not, feel free to respond with any questions and I'll try my best to address them.

Copy link
Contributor

@chanchiwai-ray chanchiwai-ray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me for addressing the root cause

@dashmage dashmage merged commit 06b6815 into canonical:master Mar 26, 2024
5 checks passed
@dashmage dashmage deleted the bseng-2197-check-exporter-installed-before-healthcheck branch March 26, 2024 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Charm in error state after configuring redfish creds
4 participants