Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cancel in exception handling code #96

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion unliftio/src/UnliftIO/Internals/Async.hs
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,13 @@ asyncOnWithUnmask i m =
--
-- @since 0.1.0.0
withAsync :: MonadUnliftIO m => m a -> (Async a -> m b) -> m b
withAsync a b = withRunInIO $ \run -> A.withAsync (run a) (run . b)
withAsync a b = withRunInIO $ \run -> do
maskingState <- E.getMaskingState
case maskingState of
E.MaskedUninterruptible ->
A.withAsyncWithUnmask (\unmask -> unmask (run a)) (run . b)
_ ->
A.withAsync (run a) (run . b)
Comment on lines +78 to +84
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an alternative fix to using restore in the handler of withException and bracket.

The Async API is totally dependent on being able to throw async exceptions to the other thread. I don't think Async makes any sense if it cannot do that.

While this single fix works here, it won't work for eg race or any of the other code, so we may want to patch those as well.

I'm almost curious if this is a thing that needs to get raised with upstream Async.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented here simonmar/async#67

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very worried about two things here:

  1. A very subtle change to behavior that can take production code down
  2. Diverging from async's handling of these cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream hasn't responded, so, we may want to handle this here?

I have a hard time imagining any code that depends on withAsync creating an unkillable zombie and being unable to exit.

This is a decently large footgun that is only really likely to happen if you are using unliftio or safe-exceptions. While I do think that async should really be launching threads in at most a MaskedInterruptible state, it's not super likely to be triggered - uninterruptibleMask is pretty rare to see, and has lots of warning labels.

Since unliftio uses it a bunch in cleanup functions, though, I think it does make sense to make the combination of UnliftIO.Exception and UnliftIO.Async not quite so dangerous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following why this situation is worse with unliftio or safe-exceptions, it seems like other code would have the same behavior.

I'm still very much on the fence here. I think the change is a good one, but the idiosyncrasies it may cause may not be worth it.

One specific question about the implementation here: IIUC, this is changing uninterruptible masking to unmasked, but leaving interruptible masking as it is. That seems surprising. Should uninterruptible become interruptible, or both become unmasked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following why this situation is worse with unliftio or safe-exceptions, it seems like other code would have the same behavior.

It's unlikely because most people read the docs for uninterruptibleMask and decide they probably don't need it. So the main way it becomes a part of people's programs is when safe-exceptions or unliftio style cleanups enter the picture and puts it in with bracket and friends.

But, yeah, anyone that runs uninterruptibleMask_ $ async $ do ... is going to create an unkillable Async.

I'm still very much on the fence here. I think the change is a good one, but the idiosyncrasies it may cause may not be worth it.

I'm coming around that it should probably be unmasked in the async and withAsync functions, and definitely unmasked in race and race-like functions. If the intent is "one of these threads should die," then unmasking should happen. But concurrently poses a slightly different question: Simon Marlow points out that finally foo (a >> b) and finally foo (concurrently a b should have the same semantics, but if concurrently unmasks, then the action is no longer protected. So the appropriate default for concurrently may be different than for race, since the expectations/assumptions are different.

Likewise, withAsync seems to demand the unmasking much more - uninterruptibleMask_ $ withAsync foo bar cannot terminate if foo is a forever loop - you're deadlocked. But uninterruptibleMask_ $ async foo is less likely to deadlock, since it's entirely possible that you never cancel it, and thus never have the problem.

I'm not really sure the best way to approach it. Maybe separate modules with different behavior is the right answer? A separate opt-in package?

My chaos brain says something like UnliftIO.Async needs to export these functions with a MaskingBehavior argument, forcing all users to upgrade, and maybe give a warning/deprecation message that points to alternative modules. But that's a big breaking change for probably little benefit for those that aren't already experiencing the pain.

One specific question about the implementation here: IIUC, this is changing uninterruptible masking to unmasked, but leaving interruptible masking as it is. That seems surprising. Should uninterruptible become interruptible, or both become unmasked?

Yeah, that's a good pont - unfortunately, we're missing a primop.

I have opened an upstream issue: ideally, we'd have a primop unsafeUnUninterruptibleMask# that would downgrade a UninterruptibleMask to a InterruptibleMask. But we don't have that, so the alternative of asyncWithUnmask $ \unmask -> unmask $ mask_ action actually is vulnerable to an async exception at the unmask point, since unsafeUnmask (the function provided as unmask) triggers any queued async exceptions in the C--. In the case of async, this is really unlikely - I think you'd need to cancel =<< async action to observe the async exception slipping between the unmask $ mask_.

Switching from mask_ (which checks masking state; should be unnecessary after unmask) to block would also accomplish this.


-- | Unlifted 'A.withAsyncBound'.
--
Expand Down
23 changes: 22 additions & 1 deletion unliftio/test/UnliftIO/ExceptionSpec.hs
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
module UnliftIO.ExceptionSpec (spec) where

import qualified Control.Exception
import Control.Monad (void, (<=<))
import Control.Monad (void, (<=<), when)
import Data.Bifunctor (first)
import Test.Hspec
import UnliftIO
Expand Down Expand Up @@ -79,6 +79,27 @@ spec = do
result <- withWrappedAsyncExceptionThrown $ \m -> trySyncOrAsync (void m)
first fromExceptionUnwrap result `shouldBe` Left (Just Exception1)

describe "withException" $ do
it "should work when withAsync is in the handler" $ do
let
action =
error "oops"
`onException` do
let
timerAction n = do
threadDelay 1000000
when (n < 10) $ do
timerAction (n + 1)
withAsync (timerAction 0) $ \a -> do
cancel a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can be replaced with pure (). withAsync is also unable to cancel the thread.

eresult <-
race
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This race call also is unable to cancel one of the threads - so, with the original code, this test hangs for 10 seconds waiting for timerAction to complete before returning Right 10.

(action `shouldThrow` errorCall "oops")
(do
threadDelay 1000000
pure 10)
eresult `shouldBe` Left ()

describe "fromExceptionUnwrap" $ do
it "should be the inverse of toAsyncException" $ do
fromExceptionUnwrap (toAsyncException Exception1) `shouldBe` Just Exception1
Expand Down