Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

many: add state unlocker to runinhibit helpers #14671

Merged

Conversation

ZeyadYasser
Copy link
Contributor

This avoids the deadlock scenario:

  • snap run holds inhibition lock.
  • snapd holds state lock (link-snap).
  • snapd tries to hold inhibition lock but waits for snap run to release lock.
  • snap run calls the REST API (i.e. for notices, connections) which tries to hold state lock.
  • snap run is now waiting for snapd to release state lock (to send the notice) and snapd is waiting for snap run to release the inibition lock.
  • we have a deadlock and snapd state is locked.

Unlocking snapd state using the new explicitly passed unlocker to runinhibit helpers avoids this deadlock.

Fixes: https://bugs.launchpad.net/snapd/+bug/2084730

This is an alternative approach to #14669

@github-actions github-actions bot added the Needs Documentation -auto- Label automatically added which indicates the change needs documentation label Oct 24, 2024
@ZeyadYasser ZeyadYasser added the Needs Samuele review Needs a review from Samuele before it can land label Oct 24, 2024
Copy link

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 86.58537% with 11 lines in your changes missing coverage. Please review.

Project coverage is 78.98%. Comparing base (96ea7b0) to head (f162216).
Report is 48 commits behind head on master.

Files with missing lines Patch % Lines
overlord/hookstate/hooks.go 50.00% 0 Missing and 5 partials ⚠️
overlord/snapstate/handlers.go 81.81% 2 Missing and 2 partials ⚠️
overlord/snapstate/backend/link.go 91.66% 0 Missing and 1 partial ⚠️
overlord/snapstate/backend/locking.go 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #14671      +/-   ##
==========================================
+ Coverage   78.95%   78.98%   +0.03%     
==========================================
  Files        1084     1085       +1     
  Lines      146638   147222     +584     
==========================================
+ Hits       115773   116280     +507     
- Misses      23667    23717      +50     
- Partials     7198     7225      +27     
Flag Coverage Δ
unittests 78.98% <86.58%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite simple.

Can we add a spread test where the hint file is held locked by the test?

@@ -132,7 +138,13 @@ func removeInhibitInfoFiles(snapName string) error {
// start and will block, presenting a user interface if possible. Also
// info.Previous corresponding to the snap revision that was installed must be
// provided and cannot be unset.
func LockWithHint(snapName string, hint Hint, info InhibitInfo) error {
func LockWithHint(snapName string, hint Hint, info InhibitInfo, unlocker Unlocker) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc comments need to be updated to mention unlocker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@ZeyadYasser ZeyadYasser added this to the 2.67 milestone Oct 28, 2024
@ZeyadYasser
Copy link
Contributor Author

Can we add a spread test where the hint file is held locked by the test?

Added, Thanks for pointing it out!

Copy link
Contributor

@MiguelPires MiguelPires left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. It's a bit of a leaky abstraction but for now it does seem better than doing it manually in snapstate/ which is likely more error-prone

# This refresh will block because it tries to hold the inhibition file lock for test-snapd-sh
snap refresh --no-wait --edge test-snapd-sh > locked-change-id

# Check that snapd state is not locked when trying to hold the inhibition file lock above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob this should do

# wait until snapd is blocked in link-snap
# avoid using the API directly to not take the state lock
locked_id="$(cat locked-change-id)"
retry -n 50 --wait 1 sh -c "snap debug state /var/lib/snapd/state.json $locked_id | MATCH 'Doing .* Make snap .* available to the system'"

# and still waiting
snap debug state /var/lib/snapd/state.json $locked_id | MATCH 'Doing .* Make snap .* available to the system'

not sure, but maybe it would be worth to confirm with lsof that snapd is holding the lock file open

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, also added the lsof check


# Wait for refresh to finish to be able to remove the snap
snap debug ensure-state-soon
retry -n 10 sh -c "snap changes | NOMATCH Doing"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work:

if [ -s locked-change-id ]; then
    snap wait $(cat locked-change-id)
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

snap debug ensure-state-soon
retry -n 10 sh -c "snap changes | NOMATCH Doing"

snap remove --purge test-snapd-sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup does it already

# This refresh will block because it tries to hold the inhibition file lock for test-snapd-sh
snap refresh --no-wait --edge test-snapd-sh > locked-change-id

# Check that snapd state is not locked when trying to hold the inhibition file lock above
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check that when a snapd is stuck on flock, operations changing the state of the snap (remove, or another refresh) fail the conflict check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, added those checks. thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, added checks installing a parallel instance

Copy link
Member

@Meulengracht Meulengracht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, FWIW I think the fix is a bit too elaborate but I think given how Inhibitors are done today its fine. The test comments from maciej would be great to have resolved

Copy link
Contributor

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It'd be good to ask @pedronis to have a look as well.

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some questions

overlord/hookstate/ctlcmd/refresh.go Show resolved Hide resolved
@@ -177,7 +177,7 @@ func (h *gateAutoRefreshHookHandler) Before() error {
defer lock.Unlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question about the snap lock

Comment on lines 249 to 253
_, inhibitInfo, err := runinhibit.IsLocked(snapName, st.Unlocker())
if err != nil {
return err
}
if err := runinhibit.LockWithHint(snapName, runinhibit.HintInhibitedForRefresh, inhibitInfo); err != nil {
if err := runinhibit.LockWithHint(snapName, runinhibit.HintInhibitedForRefresh, inhibitInfo, st.Unlocker()); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to unlock/lock the state twice quickly, it feels like it would be better later to add a single operation that does this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks for catching this

overlord/snapstate/handlers.go Show resolved Hide resolved
The snap lock is only accessible to root and is only intended to synchronize
operations between snapd and snap-confine (and snap-update-ns in some cases).

Any process holding the snap lock must not do any interactions with snapd to
avoid deadlocks due to locked snap state.

Signed-off-by: Zeyad Gouda <[email protected]>
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

@ZeyadYasser
Copy link
Contributor Author

Following failing tests are known to fail:

  • google-distro-2:opensuse-tumbleweed-64:tests/main/document-portal-activation
  • google-distro-2:opensuse-tumbleweed-64:tests/main/prepare-image-validation-sets
  • google-distro-2:opensuse-tumbleweed-64:tests/main/sudo-env
  • google:ubuntu-18.04-64:tests/main/snap-seccomp-syscalls
  • google:ubuntu-24.10-64:tests/main/apparmor-prompting-integration-tests:write_single_deny
  • google-pro:ubuntu-fips-22.04-64:tests/fips/

@Meulengracht Meulengracht merged commit 2905448 into canonical:master Nov 12, 2024
54 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Documentation -auto- Label automatically added which indicates the change needs documentation Needs Samuele review Needs a review from Samuele before it can land
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants