many: add state unlocker to runinhibit helpers #14671
Signed-off-by: Zeyad Gouda <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #14671      +/-   ##
==========================================
+ Coverage   78.95%   78.98%   +0.03%
==========================================
  Files        1084     1085       +1
  Lines      146638   147222     +584
==========================================
+ Hits       115773   116280     +507
- Misses      23667    23717      +50
- Partials     7198     7225      +27
Looks quite simple.
Can we add a spread test where the hint file is held locked by the test?
@@ -132,7 +138,13 @@ func removeInhibitInfoFiles(snapName string) error {
 // start and will block, presenting a user interface if possible. Also
 // info.Previous corresponding to the snap revision that was installed must be
 // provided and cannot be unset.
-func LockWithHint(snapName string, hint Hint, info InhibitInfo) error {
+func LockWithHint(snapName string, hint Hint, info InhibitInfo, unlocker Unlocker) error {
doc comments need to be updated to mention unlocker
updated
Signed-off-by: Zeyad Gouda <[email protected]>
Added, thanks for pointing it out!
looks good. It's a bit of a leaky abstraction but for now it does seem better than doing it manually in snapstate/ which is likely more error-prone
# This refresh will block because it tries to hold the inhibition file lock for test-snapd-sh
snap refresh --no-wait --edge test-snapd-sh > locked-change-id

# Check that snapd state is not locked when trying to hold the inhibition file lock above
This should probably do:
# wait until snapd is blocked in link-snap
# avoid using the API directly to not take the state lock
locked_id="$(cat locked-change-id)"
retry -n 50 --wait 1 sh -c "snap debug state /var/lib/snapd/state.json $locked_id | MATCH 'Doing .* Make snap .* available to the system'"
# and still waiting
snap debug state /var/lib/snapd/state.json "$locked_id" | MATCH 'Doing .* Make snap .* available to the system'
Not sure, but it may be worth confirming with lsof that snapd is holding the lock file open.
thanks, also added the lsof check
# Wait for refresh to finish to be able to remove the snap
snap debug ensure-state-soon
retry -n 10 sh -c "snap changes | NOMATCH Doing"
I think this should work:
if [ -s locked-change-id ]; then
snap wait $(cat locked-change-id)
fi
updated!
snap debug ensure-state-soon
retry -n 10 sh -c "snap changes | NOMATCH Doing"

snap remove --purge test-snapd-sh
cleanup does it already
# This refresh will block because it tries to hold the inhibition file lock for test-snapd-sh
snap refresh --no-wait --edge test-snapd-sh > locked-change-id

# Check that snapd state is not locked when trying to hold the inhibition file lock above
Can you check that when snapd is stuck on flock, operations changing the state of the snap (remove, or another refresh) fail the conflict check?
good point, added those checks. thank you!
Also added checks for installing a parallel instance.
Thank you. FWIW I think the fix is a bit too elaborate, but given how Inhibitors are done today it's fine. The test comments from maciej would be great to have resolved.
Signed-off-by: Zeyad Gouda <[email protected]>
Thanks. It'd be good to ask @pedronis to have a look as well.
some questions
@@ -177,7 +177,7 @@ func (h *gateAutoRefreshHookHandler) Before() error {
 defer lock.Unlock()
same question about the snap lock
overlord/hookstate/hooks.go
Outdated
 _, inhibitInfo, err := runinhibit.IsLocked(snapName, st.Unlocker())
 if err != nil {
 	return err
 }
-if err := runinhibit.LockWithHint(snapName, runinhibit.HintInhibitedForRefresh, inhibitInfo); err != nil {
+if err := runinhibit.LockWithHint(snapName, runinhibit.HintInhibitedForRefresh, inhibitInfo, st.Unlocker()); err != nil {
This is going to unlock/lock the state twice in quick succession; it feels like it would be better to later add a single operation that does this.
updated, thanks for catching this
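To illustrate the point about unlocking the state twice, here is a minimal Go sketch. It is not the code from this PR: `countingState`, `isLocked`, `lockWithHint`, and `lockWithHintKeepingInfo` are hypothetical stand-ins, and the `Unlocker` type only assumes the shape suggested by the diff (a function that unlocks and returns a relock function). The sketch just counts how many unlock/relock cycles each approach triggers.

```go
package main

import "fmt"

// Unlocker releases the state lock and returns a function that re-acquires it.
// The shape follows the unlocker argument in the diff; treat it as an assumption.
type Unlocker func() (relock func())

// countingState is a hypothetical stand-in for the snapd state that only
// counts how many unlock/relock cycles were requested.
type countingState struct{ cycles int }

func (s *countingState) Unlocker() Unlocker {
	return func() (relock func()) {
		s.cycles++
		return func() {}
	}
}

// isLocked and lockWithHint are simplified, hypothetical versions of the
// helpers: each one runs its own unlock/relock cycle.
func isLocked(unlocker Unlocker) (info string) {
	relock := unlocker()
	defer relock()
	return "previous-revision"
}

func lockWithHint(info string, unlocker Unlocker) {
	relock := unlocker()
	defer relock()
	_ = info // a real helper would write the hint and info here
}

// lockWithHintKeepingInfo is a hypothetical combined operation: it reads the
// existing info and sets the hint inside a single unlock/relock cycle.
func lockWithHintKeepingInfo(unlocker Unlocker) {
	relock := unlocker()
	defer relock()
	info := "previous-revision" // read while the same hint-file lock is held
	_ = info
}

// countCycles reports how many unlock/relock cycles f triggers.
func countCycles(f func(Unlocker)) int {
	st := &countingState{}
	f(st.Unlocker())
	return st.cycles
}

// twoCalls chains the two separate helpers, as in the outdated hunk.
func twoCalls(u Unlocker) { lockWithHint(isLocked(u), u) }

func main() {
	fmt.Println("separate helpers:", countCycles(twoCalls))
	fmt.Println("combined helper: ", countCycles(lockWithHintKeepingInfo))
}
```

Chaining the two helpers releases and re-takes the state lock once per call, while a combined operation does so once in total, which is the motivation behind the suggestion.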
The snap lock is only accessible to root and is only intended to synchronize operations between snapd and snap-confine (and snap-update-ns in some cases). Any process holding the snap lock must not do any interactions with snapd to avoid deadlocks due to locked snap state.
Signed-off-by: Zeyad Gouda <[email protected]>
Signed-off-by: Zeyad Gouda <[email protected]>
thank you
The following failing tests are known to fail:
This avoids the deadlock scenario:
Unlocking the snapd state via the new unlocker, passed explicitly to the runinhibit helpers, avoids this deadlock.
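The unlocker pattern can be sketched in Go as follows. This is a simplified illustration under stated assumptions, not the snapd implementation: `stateLock` and `lockHintFile` are hypothetical names, a `sync.Mutex` stands in for the flock on the hint file, and a goroutine plays the role of another process that holds the hint lock and needs the state lock before it can release it.

```go
package main

import (
	"fmt"
	"sync"
)

// Unlocker releases a lock and returns a function that re-acquires it.
// The shape is assumed from the unlocker argument shown in the diff.
type Unlocker func() (relock func())

// stateLock is a hypothetical stand-in for the snapd state lock.
type stateLock struct{ mu sync.Mutex }

func (s *stateLock) Unlocker() Unlocker {
	return func() (relock func()) {
		s.mu.Unlock()
		return s.mu.Lock
	}
}

// lockHintFile is a hypothetical helper: it drops the state lock (when an
// unlocker is given) while waiting for the hint lock, then re-takes it.
// A sync.Mutex stands in for the flock on the hint file.
func lockHintFile(hint *sync.Mutex, unlocker Unlocker) {
	if unlocker != nil {
		relock := unlocker()
		defer relock()
	}
	hint.Lock() // may block while another holder has the hint lock
}

// demo simulates a holder of the hint lock that needs the state lock
// before it can release the hint lock.
func demo() string {
	var state stateLock
	var hint sync.Mutex

	hint.Lock()     // the other party already holds the hint lock
	state.mu.Lock() // the task handler runs with the state locked

	done := make(chan struct{})
	go func() {
		defer close(done)
		state.mu.Lock() // needs the state before it can finish
		state.mu.Unlock()
		hint.Unlock()
	}()

	// Passing nil here instead of state.Unlocker() would deadlock:
	// this goroutine would wait on hint while still holding state.
	lockHintFile(&hint, state.Unlocker())

	<-done
	hint.Unlock()
	state.mu.Unlock()
	return "no deadlock"
}

func main() {
	fmt.Println(demo())
}
```

Because the state lock is released only for the duration of the blocking acquisition and re-taken before the helper returns, callers keep the invariant that the state is locked around their own logic.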
Fixes: https://bugs.launchpad.net/snapd/+bug/2084730
This is an alternative approach to #14669