Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the bug where VirtualMachine::shutdown function throw affects multipass delete #3625

Merged
merged 13 commits into from
Sep 18, 2024

Conversation

georgeliao
Copy link
Contributor

@georgeliao georgeliao commented Aug 5, 2024

close #3624
close #3606

A few things were done based on our team discussion

  1. skip_states check in daemon.cpp was removed because it is the responsibility of VirtualMachine::shutdown function to check the state.
  2. delegate the cmd suggestion part of the error message to CLI client. The daemon maps VMStateInvalidException exception to grpc::StatusCode::FAILED_PRECONDITION, and the CLI side interprets this code respectively.
  3. The force boolean input parameter of function VirtualMachine::shutdown is replaced by a 3 values enum ShutdownPolicy, corresponding refactor was performed as well.

Functional testing:
Please at least cover the following scenarios:

  1. multipass stop on a deleted or non-existing VM does not trigger the --force suggestion
  2. multipass stop on a VM with invalid states does trigger the --force suggestion. Invalid states include suspended, suspending, starting and restarting, the transtioning states require running multipass stop in 2nd cmd panel.
  3. multipass delete on a deleted or non-existing VM does not trigger the --purge suggestion
  4. multipass delete on a suspended and stopped vm can retain the state meaning the recovered vm should have the same state.
  5. multipass delete on arunning vm will shutdown the vm gracefully and recover it to the stopped state.
  6. multipass delete on a VM with invalid states does trigger the --purge suggestion. Invalid states are only transtioning states like suspending, starting and restarting, they all require the 2nd cmd panel to run.
  7. Run these two commands in bulk with mixed state of the vms and see if they behave right.
  8. Check whether GUI encounters these exceptions.

Note:

  1. The private side is not done yet, but it will be a simple adaptation due to the shutdown and check_state_for_shutdown interface change. I would rather waiting the public PR review finish and change that later.
  2. The unit tests on daemon::stop daemon::delete are lacking due to lacking frameworks which are similar to test_daemon_snapshot_restore.cpp. It is not a big problem to create the stop and delete version of this, but it might take more time.

@georgeliao georgeliao marked this pull request as draft August 5, 2024 09:18
Copy link

codecov bot commented Aug 5, 2024

Codecov Report

Attention: Patch coverage is 62.79070% with 16 lines in your changes missing coverage. Please review.

Project coverage is 88.92%. Comparing base (5f4dfe2) to head (7d01280).
Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
src/daemon/daemon.cpp 6.25% 15 Missing ⚠️
.../platform/backends/shared/base_virtual_machine.cpp 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3625      +/-   ##
==========================================
+ Coverage   88.85%   88.92%   +0.06%     
==========================================
  Files         254      254              
  Lines       14269    14271       +2     
==========================================
+ Hits        12679    12690      +11     
+ Misses       1590     1581       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@georgeliao georgeliao marked this pull request as ready for review August 15, 2024 15:12
Copy link
Contributor

@sharder996 sharder996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, I don't think this is the proper way of fixing this. It's more of a hack than an actually fix. I think we need to go back to the question of "What is non-purge delete supposed to do?". Moving check_state_for_shutdown() to the daemon isn't the correct proper fix either in my opinion. Instead we should be evaluating what stop is supposed to do in the context of a delete of a suspended vm.

Secondly, it would be nice if the commit history got cleaned up a bit. Currently, only f34db7f is relevant to the final diff, so the rest of the commits can be dropped.

@georgeliao
Copy link
Contributor Author

Firstly, I don't think this is the proper way of fixing this. It's more of a hack than an actually fix. I think we need to go back to the question of "What is non-purge delete supposed to do?". Moving check_state_for_shutdown() to the daemon isn't the correct proper fix either in my opinion. Instead we should be evaluating what stop is supposed to do in the context of a delete of a suspended VM.

To answer the question "What is non-purge delete supposed to do?", let's delve into the expected behavior of each state.

Major states:

  1. Suspended: Shutdown should be skipped, and the VM is moved into deleted_instance. When the VM is recovered, the VM state is still suspended. That is what the current code does.
  2. Running: non-force shutdown is applied, the state changes to stopped and the VM is moved into deleted_instance. When the VM is recovered, the VM state is stopped. That is what the current code does.
  3. Stopped: Shutdown should be skipped, and the VM is moved into deleted_instance. When the VM is recovered, the VM state is still stopped. That is what the current code does.

So in summary, the catch exception and do nothing did the skip shutdown thing. In my opinion, this is a reasonable thing to do while the check_state_for_shutdown function is still in the void mp::VirtualMachine::shutdown(bool force) and it throws exception for both multipass stop and multipass delete. Feel free to share any other code design idea you might have.

There are other minor states like starting or restarting, these states on cmd blocks the daemon. The user will have to open another terminal to delete the starting VM. The ideal behavior would be waiting for the starting process finish and shut it down gracefully from the running state. However, I do not think we have good thread safety for this in the first place, so tackling these states would be beyond the scope of this PR.

@georgeliao
Copy link
Contributor Author

Secondly, it would be nice if the commit history got cleaned up a bit. Currently, only f34db7f is relevant to the final diff, so the rest of the commits can be dropped.

This can be done, sorry about that. Initially I was thinking about put two fixes into one PR.

@ricab
Copy link
Collaborator

ricab commented Aug 28, 2024

There are other minor states like starting or restarting, these states on cmd blocks the daemon. The user will have to open another terminal to delete the starting VM. The ideal behavior would be waiting for the starting process finish and shut it down gracefully from the running state. However, I do not think we have good thread safety for this in the first place, so tackling these states would be beyond the scope of this PR.

Hmm, I would not say that those states are out of scope, but I think we could simplify and not wait for a graceful shutdown either. Instead, multipass delete could still refuse to delete instances in those states (although with a different error message, not mentioning --force). The user could still use multipass delete --purge to force-shutdown and then delete. So, I think "suspended" is the only state where, to maintain behavior, we'd need to skip shutdown.

Here is an option to achieve that: mp::VirtualMachine::shutdown could receive an enum instead of a bool, to accept a third possible value: accepted_suspended (or a better name). So, on Daemon::delete_vm, we'd now call shutdown(purge ? OurEnum::force : OurEnum::accept_suspended). On stop, we'd call instead shutdown(force ? OurEnum::force : OurEnum::plain) or something of this sort.

@georgeliao
Copy link
Contributor Author

georgeliao commented Aug 28, 2024

@sharder996
There are two items in email message that were edited later maybe. Regardless, just to clarify them

  1. Do nothing: check_state_for_shutdown() needs to change to not throw an exception.
  2. Put the machine in a stopped state: do the same thing as force stop.
  1. The throw is intended for multipass stop -> shutdown(force) and now it affects multipass delete -> shutdown(purge) we can not just simply change that throw.
  2. I think doing nothing and recovering it back to suspended is more reasonable, force stop is the one can change from suspended to stop state.

@georgeliao georgeliao changed the title Fix bugs of force stop Fix the bug where VirtualMachine::shutdown function throws affect multipass delete Aug 29, 2024
@georgeliao georgeliao changed the title Fix the bug where VirtualMachine::shutdown function throws affect multipass delete Fix the bug where VirtualMachine::shutdown function throw affect multipass delete Aug 29, 2024
@georgeliao georgeliao changed the title Fix the bug where VirtualMachine::shutdown function throw affect multipass delete Fix the bug where VirtualMachine::shutdown function throw affects multipass delete Aug 29, 2024
@ricab
Copy link
Collaborator

ricab commented Aug 30, 2024

This just occurred to me and I thought I'd note it down. We need to remember to test what happens when we pass multiple instance arguments or use --all. What happens when a VM in the wrong state and we do stuff like:

  • multipass delete a b c
  • multipass delete --all
  • multipass delete --all --purge
  • multipass stop --force a b c
  • multipass stop --all
  • multipass stop --all --force

Do we do the right thing in all those cases?

@georgeliao georgeliao force-pushed the fix_bugs_of_force_stop branch 2 times, most recently from 67da547 to 1addda1 Compare September 2, 2024 10:51
@georgeliao
Copy link
Contributor Author

This just occurred to me and I thought I'd note it down. We need to remember to test what happens when we pass multiple instance arguments or use --all. What happens when a VM in the wrong state and we do stuff like:

multipass delete vm1 vm2 vm3
multipass delete --all
multipass delete --all --purge
multipass stop --force vm1 vm2 vm3
multipass stop --all
multipass stop --all --force
Do we do the right thing in all those cases?

I changed the name of the VM to vm1, vm2 and vm3 for eaiser illustration.

It is thoughtful that you mentioned the bulk operations. This boils down to how the command behaves when one of the VM throws during this operation. Based on this line of code, it should exit on the first throwing VM and report the error to the cli and to the user. For example, if vm1 is running, vm2 is suspended and v,3 is stopped, then multipass stop vm1 vm2 vm3 should stop vm1 and throw on vm2 and leave vm3 intact. See the below command line output.

georgel@georgel-XPS-15:~/projects/canonical_projects/multipass_build/bin$
./multipass list
Name                    State             IPv4             Image
vm1                     Running           10.74.98.248     Ubuntu 24.04 LTS
vm2                     Suspended         --               Ubuntu 24.04 LTS
vm3                     Stopped           --               Ubuntu 24.04 LTS
georgel@georgel-XPS-15:~/projects/canonical_projects/multipass_build/bin$
./multipass stop vm1 vm2 vm3

stop failed: Cannot shut down suspended instance vm2.
Use --force to power it off.
georgel@georgel-XPS-15:~/projects/canonical_projects/multipass_build/bin$
./multipass list
Name                    State             IPv4             Image
vm1                     Stopped           --               Ubuntu 24.04 LTS
vm2                     Suspended         --               Ubuntu 24.04 LTS
vm3                     Stopped           --               Ubuntu 24.04 LTS

Regarding these three lines,

multipass delete --all --purge
multipass stop --force vm1 vm2 vm3
multipass stop --all --force

They do not throw, so they can apply the operation to every VMs.

@ricab
Copy link
Collaborator

ricab commented Sep 2, 2024

Hi @georgeliao, thanks for clarifying. I did not quite understand your last comment though. Do you mean that those commands do not go through the line of code you linked to earlier?

In any case, I still think this should be manually tested, just to verify expectations.

@georgeliao
Copy link
Contributor Author

Hi @georgeliao, thanks for clarifying. I did not quite understand your last comment though. Do you mean that those commands do not go through the line of code you linked to earlier?

What I was trying to say is that things are simpler without throwing. The loop will not exit early. Every VM in that selection will get the operation applied or skipped. It is not that different from applying the operation to a single VM.

@ricab
Copy link
Collaborator

ricab commented Sep 2, 2024

In some cases it will still need to throw, right? Anyway, just trying to make sure the behavior is verified experimentally.

{
std::unique_lock<std::mutex> lock{state_mutex};

force_shutdown = force;
force_shutdown = (shutdown_policy == ShutdownPolicy::Poweroff);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharder996 I just noticed that force_shutdown value should be set to true later in order to truely pair up with this line, check_state_for_shutdown has a chance to throw and leaves force_shutdown to true without actually executing forceful shutdown. Something we overlooked in the silencing error PR. I think the right place to set it might be after line 370, mpl::log(mpl::Level::info, vm_name, "Killing process");. What do you think? That makes

            mpl::log(mpl::Level::info, vm_name, "Killing process");
            force_shutdown = true;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

@georgeliao
Copy link
Contributor Author

georgeliao commented Sep 3, 2024

@ricab @sharder996 , the follow-up codework is done based on our team discussion, so it is ready for another round of review. Please read the intro description first to get an overview of the steps and the functional testing cases.

@ricab ricab added this to the 1.14.1 milestone Sep 6, 2024
sharder996
sharder996 previously approved these changes Sep 13, 2024
Copy link
Contributor

@sharder996 sharder996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the work in to make this better, Jia!

{
std::unique_lock<std::mutex> lock{state_mutex};

force_shutdown = force;
force_shutdown = (shutdown_policy == ShutdownPolicy::Poweroff);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

ricab
ricab previously approved these changes Sep 16, 2024
Copy link
Collaborator

@ricab ricab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review and LGTM, thanks Jia!

Leaving it with @sharder996.

@sharder996
Copy link
Contributor

@georgeliao Just needs the private side now 😄

sharder996
sharder996 previously approved these changes Sep 17, 2024
@sharder996 sharder996 added this pull request to the merge queue Sep 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 17, 2024
@georgeliao georgeliao dismissed stale reviews from sharder996 and ricab via dcf6b7b September 18, 2024 12:35
Copy link
Collaborator

@ricab ricab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Jia!

There is a commit message that I find confusing, but I am not going to block on that. Merge away!

@ricab ricab merged commit 0189211 into main Sep 18, 2024
12 of 14 checks passed
@ricab ricab deleted the fix_bugs_of_force_stop branch September 18, 2024 19:10
ricab added a commit that referenced this pull request Sep 27, 2024
Fix the bug where VirtualMachine::shutdown function throw affects multipass delete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants