Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fsFreeze field to VolumeSnapshot #477

Merged
merged 1 commit into from
May 23, 2024
Merged

Conversation

ejweber
Copy link
Contributor

@ejweber ejweber commented Apr 24, 2024

Which issue(s) this PR fixes:

longhorn/longhorn#2187

What this PR does / why we need it:

  • Proxy the freezeFS field for SnapshotVolume through to the engine.
  • Attempt to unfreeze all file systems on the root partitions of Longhorn volumes on startup.

Additional documentation or context

Depends on:

Need to fix imports before merging.

Copy link

mergify bot commented Apr 24, 2024

This pull request is now in conflict. Could you fix it @ejweber? 🙏

Copy link

mergify bot commented Apr 27, 2024

This pull request is now in conflict. Could you fix it @ejweber? 🙏

@PhanLe1010
Copy link
Contributor

  • Proxy the freezeFS field for SnapshotVolume through to the engine.
  • Attempt to unfreeze all file systems on the root partitions of Longhorn volumes on startup.

Is it correct that we are not going to periodically check and unfreeze the filesystem of a crashed engine like:

  • instance-manager pod is still running
  • engine crashed
  • engine move to new node
  • instance-manager of old node is not going to unfreeze the filesystem on old node

@ejweber
Copy link
Contributor Author

ejweber commented Apr 29, 2024

Is it correct that we are not going to periodically check and unfreeze the filesystem of a crashed engine like:

It is correct (unless there is pushback of course). This is my opinion on why:

  • The engine doesn't actually usually "crash". Usually it transitions to an error state because (for example) it lost all connections with a replica. Then, it is shut down. We do implement shutdown logic that will ensure it is unfrozen in this case.
  • If the engine literally does go away with any shut down logic (e.g. with a kill -9), it is usually automatically reattached by Longhorn. We do implement logic that will ensure it is unfrozen on startup.

In my opinion, the complexity of periodically scanning for and attempting to unfreeze file systems for engines that happen to somehow crash without going through any shutdown at the exact moment they are frozen introduces more risk than the potential for this situation to leave the file system frozen does. (Especially in the case where we have multiple instance managers running simultaneously, as still often happens for v1.)

Keep in mind that if it does happen, it is not nearly as serious as the other cases we investigated. With a look at the support bundle, we can determine the file system was frozen, but never unfrozen, and suggest an unfreeze command.

@PhanLe1010
Copy link
Contributor

Agree! Multiple instance-manager case introduces significant complications if we introduce the periodically check and unfreeze in IM

Copy link

mergify bot commented May 4, 2024

This pull request is now in conflict. Could you fix it @ejweber? 🙏

james-munson
james-munson previously approved these changes May 10, 2024
Copy link
Contributor

@james-munson james-munson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ejweber
Copy link
Contributor Author

ejweber commented May 15, 2024

I rebased this one and it is now ready (pending a successful CI build, of course). Please check it again when you have a moment @PhanLe1010 and @james-munson.

james-munson
james-munson previously approved these changes May 16, 2024
Copy link
Contributor

@james-munson james-munson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues.

PhanLe1010
PhanLe1010 previously approved these changes May 17, 2024
Copy link
Contributor

@PhanLe1010 PhanLe1010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

mergify bot commented May 17, 2024

This pull request is now in conflict. Could you fix it @ejweber? 🙏

shuo-wu
shuo-wu previously approved these changes May 21, 2024
Longhorn 2187

Signed-off-by: Eric Weber <[email protected]>
@ejweber
Copy link
Contributor Author

ejweber commented May 22, 2024

Hello @PhanLe1010, @shuo-wu, and @james-munson. I have resolved the conflict. Can I have an approve again from one of you so I can merge? Thanks!

@shuo-wu shuo-wu merged commit a4b542f into longhorn:master May 23, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants