
Supply blocks may reorder flow of execution #364

Open
patrickbkr opened this issue Jan 27, 2023 · 18 comments
Labels
language Changes to the Raku Programming Language

Comments

@patrickbkr
Member

This is a follow-up to
rakudo/rakudo#5141
and
rakudo/rakudo#5158

Consider the following piece of code:

my $p = Promise.new;
my $s1 = supply {
    await $p;
    say "processing whenever"
};
react {
    whenever $s1 { }
    $p.keep;
    say "end of react block";
}

That piece of code prints

end of react block
processing whenever

This is unexpected. The order seems reversed. Why is this so?

By default, whenevers tap their supply instantly and as a result run the supply block they are connected to. This can deadlock when that supply block (or one nested via further whenevers) contains an emit, because the emit will call the whenever block of the original supply, but by definition only a single execution branch is allowed in a supply block at any time.
Possibly (I'm not sure about that) there are other scenarios posing similar deadlock issues without recursion being involved.
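
A minimal sketch of the recursive shape in question (hypothetical code, not taken from the linked issues):

my $inner = supply { emit "ping" };
my $outer = supply {
    # Tapping $inner runs its body immediately; the emit in that body
    # must then re-enter $outer's whenever block while $outer's
    # single-branch lock is still held by this very setup code.
    whenever $inner -> $msg {
        say "got $msg";
    }
};
$outer.tap;

Without special handling, the emit that happens during tapping would need a lock that the setup code already holds.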

The currently implemented solution for this deadlock problem in rakudo/rakudo@26a9c31 and rakudo/rakudo@5478392 works as follows. When an await happens in the setup phase, during the processing of a whenever tapping (be it a protect, acquire, lock or plain await), a continuation starting at the root of the whenever tapping is taken. Only after the supply block itself has finished running is the continuation resumed. So the order in which the code is executed depends on whether there is any locking going on in code tapped by the whenever.
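
For contrast, a sketch with no locking in the tapped code, where (per the tap-instantly behavior described above) everything runs inline in source order:

my $s2 = supply { say "processing whenever" };
react {
    whenever $s2 { }          # taps instantly, running $s2's body
    say "end of react block";
}
# Output:
# processing whenever
# end of react block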

This behavior is difficult for an unsuspecting programmer to follow. Even a programmer who knows about this behavior will potentially have a hard time telling whether there is a lock somewhere in the tapped code that could cause this reordering. This is a case of action at a distance.

There is also still a possibility of the code deadlocking; that's detailed in rakudo/rakudo#5141. I consider that a problem separate from this issue.

patrickbkr added the language (Changes to the Raku Programming Language) label Jan 27, 2023
@patrickbkr
Member Author

In rakudo/rakudo#5158 I tried replacing the continuation-based solution strategy with a much simpler approach of always executing whenever tapping after the supply block itself has run. That behavior is a lot easier to understand.

It does contradict the syntax though. Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed. Only executing them after having processed the supply block is thus highly counterintuitive.

Additionally it makes code like the following impossible to write:

# Simplified from t/spec/S17-promise/nonblocking-await.t
my $started = Promise.new;
my $ok = start react {
    whenever IO::Socket::Async.listen($s-address, $port) -> $conn {
        whenever $conn.Supply(:bin) -> $buf {
            await $conn.write: $buf;
            $conn.close;
        }
    }
    $started.keep;
}

In that piece of code a Promise is used to track the setup of the react block. When whenever tapping is delayed until after the react block has been processed, that $started.keep happens too early.

I thus consider rakudo/rakudo#5158 a non-solution. At the moment I have run out of ideas for how to solve this.

@jnthn
Contributor

jnthn commented Jan 30, 2023

This is unexpected. The order seems reversed. Why is this so?

A supply block (and thus react) provides concurrency control by enforcing one-message-at-a-time semantics: that is, we must completely process a message (reach the end of the whenever block) before we process the next one. An await inside of a whenever block thus prevents execution of any other whenever block starting until the awaited event completes. (If one wants more concurrency, the answer is to instead use a nested whenever; were await to have different semantics then there'd not be a means to achieve back pressure).
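
A sketch of the two styles (with a hypothetical $incoming and a placeholder do-work returning a Promise):

my $incoming = Supply.from-list(<a b c>);
sub do-work($msg) { start $msg.uc }   # placeholder async work

my $serial = supply {
    whenever $incoming -> $msg {
        # One message at a time: no further $incoming message is
        # processed until the awaited work completes (back pressure).
        emit await do-work($msg);
    }
};

my $concurrent = supply {
    whenever $incoming -> $msg {
        # Nested whenever: the outer block finishes right away, so the
        # next message can be taken while the work is still running.
        whenever do-work($msg) -> $result {
            emit $result;
        }
    }
};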

The body of a supply or react block itself is much like an OO constructor: just as we expect the constructor to complete before we have a complete object ready to receive method calls, we expect the setup phase of the supply or react block to run to completion before it processes any other messages.

Thus, the behavior in that example seems as intended.

Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed.

Yes, it's important for correctness that the subscription is established right away.

The currently implemented solution for this deadlock problem in rakudo/rakudo@26a9c31 and rakudo/rakudo@5478392 works as follows. When an await happens in the setup phase, during the processing of a whenever tapping (be it a protect, acquire, lock or plain await), a continuation starting at the root of the whenever tapping is taken. Only after the supply block itself has finished running is the continuation resumed. So the order in which the code is executed depends on whether there is any locking going on in code tapped by the whenever.

This behavior is difficult for an unsuspecting programmer to follow. Even a programmer who knows about this behavior will potentially have a hard time telling whether there is a lock somewhere in the tapped code that could cause this reordering. This is a case of action at a distance.

The core of the issue really is that supply and react are primarily a tool for managing existing concurrency, rather than a means to introduce concurrency. However, in an attempt to DWIM, we try to permit a supply to immediately emit in its body, rather than just subscribe to other underlying sources and emit based upon events from those. The BlockAddWheneverAwaiter approach, as noted in rakudo/rakudo#5141, seems to have shortcomings when one mixes supply blocks with use of a Supplier, for example.

I picked the continuation juggling approach to resolve the conflicts over the alternative of scheduling the continuations using the normal thread pool scheduler and letting the standard await mechanism take over, primarily on the basis that if supply blocks really were going to have to get into the business of introducing concurrency, then at least they should avoid introducing threading too. Perhaps, however, the latter would have been the lesser evil; I'm curious if that would solve the problem at hand here and be more robust generally.

@patrickbkr
Member Author

Thanks for your input!

This is unexpected. The order seems reversed. Why is this so?

A supply block (and thus react) provides concurrency control by enforcing one-message-at-a-time semantics: that is, we must completely process a message (reach the end of the whenever block) before we process the next one. An await inside of a whenever block thus prevents execution of any other whenever block starting until the awaited event completes. (If one wants more concurrency, the answer is to instead use a nested whenever; were await to have different semantics then there'd not be a means to achieve back pressure).

That question was meant as the heading for the following paragraph. But thanks for explaining anyway! ;-)

Thus, the behavior in that example seems as intended.

Which example are you referring to? The one in the first post does trigger the await magic and manages to change the order in which the supply block and the tapping are executed.

Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed.

Yes, it's important for correctness that the subscription is established right away.

I agree.

The core of the issue really is that supply and react are primarily a tool for managing existing concurrency, rather than a means to introduce concurrency. However, in an attempt to DWIM, we try to permit a supply to immediately emit in its body, rather than just subscribe to other underlying sources and emit based upon events from those. The BlockAddWheneverAwaiter approach, as noted in rakudo/rakudo#5141, seems to have shortcomings when one mixes supply blocks with use of a Supplier, for example.

I picked the continuation juggling approach to resolve the conflicts over the alternative of scheduling the continuations using the normal thread pool scheduler and letting the standard await mechanism take over, primarily on the basis that if supply blocks really were going to have to get into the business of introducing concurrency, then at least they should avoid introducing threading too. Perhaps, however, the latter would have been the lesser evil

I can try implementing that. Just so I get it right: I keep the BAWA (BlockAddWheneverAwaiter), but instead of putting the continuations in a list for later processing, I'd instantly schedule the continuations on the ThreadPoolScheduler (not on the outer $*AWAITER, which would be a BlockingAwaiter and thus block the BAWA itself).

I'm curious if that would solve the problem at hand here and be more robust generally.

I guess that depends on whether we want to be pedantic. If the aspiration is to never have surprising order reversals, then queuing things on the thread pool isn't a step forward. I am confident, though, that having things on the thread pool will solve rakudo/rakudo#5141 (the issue that made me explore the supply / whenever implementation in the first place). In that deadlock situation a continuation earlier in the list blocks on a lock that a later continuation holds. But there is no "A waits on B, B waits on A" situation. So if the other continuations were allowed to run, things would work out.

It still feels wrong to give locks that are not involved in the supply setup the same treatment. I think I'll start another experiment: I'll remove the entirety of the BlockAddWheneverAwaiter and see if the queue-on-recursion logic implemented in rakudo/rakudo@5478392 suffices to prevent deadlocks during construction. I think I have not yet seen a situation where a lock we need to resolve during construction has not been a recursion lock.

@patrickbkr
Member Author

Just removing the BlockAddWheneverAwaiter logic basically means that recursing emits during supply construction instantly return, thus killing back pressure. All tests and spectests pass except for a single test:

tap-ok Supply.zip(
    Supply.from-list("a".."e"),
    Supply.from-list("f".."k"),
    Supply.from-list("l".."p")
  ),
  [<a f l>,<b g m>,<c h n>,<d i o>,<e j p>],
  "zipping with 3 supplies works";

The supply is done before a single value is emitted. That happens because the first supply gets to emit all its values and its done (each being put into the protect-or-queue-on-recursion queue) before the others. That queue is then processed in order, causing the zip to be done as soon as all the messages of the first supply have been processed.
Technically it still works as documented (the docs state: "The resulting supply is done as soon as any of the given supplies are done."), but I'm unsure whether that behavior is acceptable.
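
For reference, the documented truncate-to-shortest semantics in a self-contained sketch; under the current implementation this emits three tuples, while with the change described above it could be done before emitting anything:

react whenever Supply.zip(Supply.from-list(1..3), Supply.from-list(1..5)) {
    .say;   # currently: (1 1), (2 2), (3 3)
}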

@patrickbkr
Member Author

I have now also tried making the BlockAddWheneverAwaiter queue its continuations on the thread pool. With that, make test passes, and make spectest hangs in S17-supply/[flat|batch|unique|merge|zip].t but succeeds otherwise.

I have now pushed two branches to GitHub:

@patrickbkr
Member Author

Also of note: both branches solve the deadlock problem in rakudo/rakudo#5141.

@patrickbkr
Member Author

I'll see if I can get the put-continuations-on-thread-pool solution working (i.e. get the hanging tests to pass).

In general I like the solution that removes the BlockAddWheneverAwaiter better. I believe it's easier to understand what's going on, as only locks of the supplies themselves are affected. Also, there are no continuations involved.

@jnthn What do you think?

@patrickbkr
Member Author

I found the reason for the hangs with the schedule-continuations-on-the-thread-pool approach. The hanging tests replace $*SCHEDULER with CurrentThreadScheduler, which purposely blocks on calls to cue. Since we rely on the scheduler to execute our continuations concurrently to prevent deadlocks, the code can't resolve locks with a CurrentThreadScheduler.

The docs say the CurrentThreadScheduler is useful mostly for testing. So what's the point of having code run with the CurrentThreadScheduler? Is it forbidden to rely on the scheduler to resolve locks?
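
For illustration, the inline-execution behavior in isolation (a minimal sketch):

my $*SCHEDULER = CurrentThreadScheduler.new;
$*SCHEDULER.cue({ say "cued code runs synchronously on this thread" });
say "reached only after the cued block has finished";

A continuation that must run concurrently in order to release a lock therefore never gets the chance to do so.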

@patrickbkr
Member Author

With the schedule-continuations-on-the-thread-pool approach there is also a test failure in t/spec/S17-supply/flat.t

my $s = Supplier.new;
my $f = $s.Supply.flat;
my $seen1 = [];
my $t1 = $f.tap( { $seen1.push: $_ } );
$s.emit([1,2]);
is $seen1, [1,2], 'did we get the first emit (1)';

Cause: emitting on a .flat-ed supply involves setting up a whenever and processing values in src/core.c/Supply-factories.pm6:460. That in turn activates the BlockAddWheneverAwaiter, which causes the value to be emitted asynchronously, making the $s.emit and the is in the test racy.

Perhaps, however, the latter would have been the lesser evil; I'm curious if that would solve the problem at hand here and be more robust generally.

My conclusion from the above analysis is that the schedule-continuations-on-the-thread-pool approach as I implemented it brings its own set of problems. I don't think these issues can be alleviated.

So the only approach I still have hope in is the remove-the-BlockAddWheneverAwaiter-completely approach I also explored. See this comment for the issue surrounding that approach.

Unless other ideas for approaching this come up, I think the solution space has been exhaustively explored. I'm now dependent on feedback on whether we should go forward with one of the two explored approaches.

@patrickbkr
Member Author

patrickbkr commented Feb 9, 2023

I'd like to make sure the state of this issue is clear.

jnthn has explained that we have to live with occasional reordering. That's known, intentional and we can't do much about it.

So the only problem left to be solved is the deadlock described in rakudo/rakudo#5141. I explored two approaches trying to solve the deadlock issue:

  1. The remove-the-BlockAddWheneverAwaiter-completely approach. This solves the deadlock, but has the effect that recursing emits during whenever setup do not block anymore but are queued. This causes one spec test failure in a test that turned out to depend on non-deterministic behavior. A positive side-effect of this approach is that locks in general do not cause reordering anymore, only recursion that would otherwise deadlock causes reordering. In my opinion this makes it easier to understand what's going on.
  2. The schedule-continuations-on-the-thread-pool approach. This solves the deadlock as well, but it introduces concurrency. One spec test failure made clear that this concurrency is problematic. I think this renders this approach unsuitable.

So to be able to continue working on this issue I'd like to either have:

  1. Some feedback on whether the remove-the-BlockAddWheneverAwaiter-completely approach is feasible. If yes, I could adapt the failing Supply.zip spec test and we could do a Blin run to see if there is more fallout.
  2. Some input on how else I could approach this problem.

So with the above summary I now ask for feedback. Ping @jnthn, @vrurg, @niner

@vrurg
Contributor

vrurg commented Feb 10, 2023

I'm not much into supplies from their implementation side, so I can't be of much help on this. But removing a non-obvious deadlock cause does sound good to me. What's not good is that the spectest mentioned can barely be changed without breaking backward compatibility. Thus I wonder whether option 1, should it eventually be accepted, could be a 6.e change? In that case the spectest would still define 6.c/d behavior and we'd have it different for 6.e.

Though it may also depend on the degree of change required for the spec to pass.

@patrickbkr
Member Author

What's not good is that the spectest mentioned can barely be changed without breaking backward compatibility.

I only partly agree.

tap-ok Supply.zip(
    Supply.from-list("a".."e"),
    Supply.from-list("f".."k"),
    Supply.from-list("l".."p")
  ),
  [<a f l>,<b g m>,<c h n>,<d i o>,<e j p>],
  "zipping with 3 supplies works";

The test strongly relies on the supplies being processed round robin, with each supply only getting to emit a single value per round. Supplies never gave any guarantee along those lines. That this works out in the current implementation is due to a delicate implementation detail.

The test reads as if zip were meant to process incoming values as long as at least one supply is still active. But that's different from both the current implementation and the documentation.

So I'm confident to say that there is a mismatch between test and implementation. One of the two must be wrong.

I do agree, though, that the behavior of my proposed changes is suboptimal in this case (processing all values and the done from the first supply before working on the second).

@niner

niner commented Feb 11, 2023 via email

jnthn added a commit to rakudo/rakudo that referenced this issue Feb 11, 2023
In Raku/problem-solving#364 a proposed
solution to a deadlock in supply setup is hindered by the failure of
a spectest for Supply.zip.

Thus far we defined Supply.zip as having its result Supply be `done` as
soon as any input is `done`. This makes its behavior highly sensitive to
the timing of its input values; as a result, it may not emit all of the
complete tuples of values that it feasibly could.

This change introduces a "watermark", which is the maximum number of
tuples we could possibly emit. When any input `supply` becomes done, we
take the number of messages that were delivered, and use that as the
maximum possible number of tuples that we can emit. Any further `done`s
might lower that watermark. The `supply` block as a whole is done when
all input supplies reach the watermark.

This passes all current spectests. Hopefully, it will also pass when
used in combination with the deadlock fix.
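
A rough sketch of that watermark bookkeeping (hypothetical class and method names, not the actual rakudo code):

class ZipWatermark {
    has @.delivered;          # messages seen so far, per input supply
    has $.watermark = Inf;    # maximum number of tuples still emittable

    method saw-value(Int $input) { @!delivered[$input]++ }

    method saw-done(Int $input) {
        # The first done fixes the watermark; later dones may lower it.
        $!watermark min= @!delivered[$input] // 0;
    }

    method all-done(--> Bool) {
        # Done once every input has delivered watermark-many messages.
        so @!delivered.map({ $_ // 0 }).all >= $!watermark;
    }
}

my $wm = ZipWatermark.new(delivered => [0 xx 3]);   # three input supplies
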
@jnthn
Contributor

jnthn commented Feb 11, 2023

  1. The remove-the-BlockAddWheneverAwaiter-completely approach. This solves the deadlock, but has the effect that recursing emits during whenever setup do not block anymore but are queued. This causes one spec test failure in a test that turned out to depend on non-deterministic behavior (#364 (comment)). A positive side-effect of this approach is that locks in general do not cause reordering anymore, only recursion that would otherwise deadlock causes reordering. In my opinion this makes it easier to understand what's going on.

  2. The schedule-continuations-on-the-thread-pool approach. This solves the deadlock as well, but it introduces concurrency. One spec test failure (#364 (comment)) made clear that this concurrency is problematic. I think this renders this approach unsuitable.

If I understand option 1 correctly, this is simply saying that it is left to Lock::Async to do the queueing? In that case I think it's effectively achieving what 2 would, but by far more elegant and well-exercised means (e.g. we don't have to build anything new).

So long as we process one message at a time in a given supply and react instance (where the running of the supply or react block body is considered a message too), it should be fine.

An alternative reading is that the spec really does require round-robin processing by Supply.zip, and the current implementation of this method is just very fragile and relies on implementation details of other parts of Supply. So as part of the fix, method zip could be adjusted to deal with the new situation.

That's a reasonable reading in my opinion. I did a draft PR rakudo/rakudo#5200 that explores an alternative way to implement Supply.zip that passes all current spectests. It would be interesting to try it in combination with the deadlock fix; I think it should help.

patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Feb 11, 2023
@patrickbkr
Member Author

If I understand option 1 correctly, this is simply saying that it is left to Lock::Async to do the queueing?

Yes, that's exactly what happens.
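
For readers following along, a minimal illustration of that queueing with stock Lock::Async:

my $l = Lock::Async.new;
await $l.lock;       # acquire the lock
my $p = $l.lock;     # contended: the acquisition is queued, not blocked on
say $p.status;       # Planned
$l.unlock;           # hands the lock over to the queued waiter
await $p;            # now kept
$l.unlock;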

So long as we process one message at a time in a given supply and react instance (where the running of the supply or react block body is considered a message too), it should be fine.

From all I know, that invariant is not broken. Each supply block is still guarded by a lock, and no code path enters that supply block without requesting that lock.

I did a draft PR rakudo/rakudo#5200 that explores an alternative way to implement Supply.zip that passes all current spectests. It would be interesting to try it in combination with the deadlock fix; I think it should help.

I have requested a change on that draft PR. With that change make test and make spectest both succeeded. I have now created a PR with those changes in rakudo/rakudo#5202.

patrickbkr pushed a commit to patrickbkr/rakudo that referenced this issue Feb 21, 2023
@patrickbkr
Member Author

I now consider rakudo/rakudo#5202 ready for review. That PR passes make test and make spectest, and it resolves the original deadlock from rakudo/rakudo#5141. I have turned that deadlocking code into a roast test in Raku/roast#833.

The PR contains one controversial change in behavior:
In patrickbkr/rakudo@5478392 the recursion-breaking logic and Lock::Async::protect-or-queue-on-recursion were originally implemented. The commit message states: "A holder's recursion competes fairly with outside messages, thanks to the queueing being through Lock::Async." This is no longer true, as the queued code objects are now called synchronously and not put on the thread pool anymore.

This issue and the PR are now blocked on feedback / approval. @jnthn, I think you have the deepest insight into that area of Raku, so I'd like to have your approval before going forward with my changes.

@patrickbkr
Member Author

I've merged the proposed fix and the respective test in rakudo/rakudo#5202 and Raku/roast#833.
We now have a full month to find any regressions.

@patrickbkr
Member Author

Now that the changes have been merged, what do we do with this issue? Do I need to condense all the comments into some prose and do a problem-solving PR?
