Supply blocks may reorder flow of execution #364
In rakudo/rakudo#5158 I tried replacing the continuation-based solution strategy with a much simpler approach: always executing whenever taps after the supply block itself has run. That behavior is a lot easier to understand. It does contradict the syntax, though. Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed. Only executing them after the supply block has been processed is thus highly counterintuitive. Additionally, it makes code like the following impossible to write:
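A sketch of the kind of code that relies on in-place tapping (illustrative; the `Supplier` and the value 42 are my assumptions, not from the original example):

```raku
my $s = supply {
    my $supplier = Supplier.new;
    whenever $supplier.Supply -> $v {   # needs to be tapped *here*, before
        emit $v;                        # the emit a few lines below
    }
    $supplier.emit(42);  # happens during setup; if whenevers only tap after
                         # the block has run, this value is simply lost
    done;
}
react {
    whenever $s { .say }   # prints 42 only if the whenever taps in place
}
```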
In that piece of code a … I thus consider rakudo/rakudo#5158 a non-solution. At the moment I have run out of ideas for how to solve this.
The body of a … Thus, the behavior in that example seems to be as intended.
Yes, it's important for correctness that the subscription is established right away.
The core of the issue really is that I picked the continuation-juggling approach to resolve the conflicts, over the alternative of scheduling the continuations using the normal thread pool scheduler and letting the standard …
Thanks for your input!
That question was meant as the heading for the following paragraph. But thanks for explaining anyway! ;-)
Which example are you referring to? The one in the first post does trigger the await magic and manages to change the order in which the supply block and the tapping are executed.
I agree.
I can try implementing that. Just so I get it right: I keep the BAWA (BlockAddWheneverAwaiter), but instead of putting the continuations in a list for later processing, I'd instantly schedule the continuations on the ThreadPoolScheduler (not on the outer …
I guess that depends on whether we want to be pedantic. If the aspiration is to never have surprising order reversals, then queuing things on the thread pool isn't a step forward. I am confident, though, that putting things on the thread pool will solve rakudo/rakudo#5141 (the issue that made me explore the supply / whenever implementation in the first place). In that deadlock situation, a continuation earlier in the list blocks on a lock that a later continuation holds. But there is no "A waits on B, B waits on A" situation, so if the other continuations were allowed to run, things would work out. It still feels wrong to give locks not involved in the supply setup the same treatment. I think I'll start another experiment: I'll remove the entirety of the BlockAddWheneverAwaiter and see if the queue-on-recursion logic implemented in rakudo/rakudo@5478392 suffices to prevent deadlocks on construction. I think I have not yet seen a situation where a lock we needed to resolve during construction was not a recursion lock.
Just removing the BlockAddWheneverAwaiter logic basically means that recursive emits during supply construction return instantly, thus killing back-pressure. All tests and spectests pass except for a single test:
The supply is …
I have now also tried making the BlockAddWheneverAwaiter queue its continuations on the thread pool. That gets … I have now pushed two branches to GitHub:
Also of note: both branches solve the deadlock problem in rakudo/rakudo#5141.
I'll see if I can get the put-continuations-on-thread-pool solution working (i.e. get the hanging tests to pass). In general I like the solution removing the BlockAddWheneverAwaiter better. I believe it's easier to understand what's going on, as only the locks of the supplies themselves are affected. Also, there are no continuations involved. @jnthn What do you think?
I found the reason for the hangs with the schedule-continuations-on-the-thread-pool approach. The hanging tests replace … The docs say the …
With the schedule-continuations-on-the-thread-pool approach there is also a test failure in …
Cause: Emitting on a …
My conclusion from the above analysis is that the schedule-continuations-on-the-thread-pool approach, as I implemented it, brings its own set of problems. I don't think these issues can be alleviated. So the only approach I still have hope in is the remove-the-BlockAddWheneverAwaiter-completely approach I also explored; see this comment for the issue surrounding that approach. Unless other ideas come up, I think the solution space has been exhaustively explored. I'm now dependent on feedback on whether we should go forward with one of the two explored approaches.
I'd like to make sure the state of this issue is clear. jnthn has explained that we have to live with occasional reordering. That's known and intentional, and we can't do much about it. So the only problem left to be solved is the deadlock described in rakudo/rakudo#5141. I explored two approaches trying to solve the deadlock issue:
So to be able to continue working on this issue I'd like to either have:
With the above summary, I now ask for feedback. Ping @jnthn, @vrurg, @niner.
I'm not much into the implementation side of supplies, so I'm not much help here. But removing a non-obvious deadlock cause does sound good to me. What's not good is that the spectest mentioned can hardly be changed without breaking backward compatibility. Thus I wonder: if option 1 were eventually accepted, could it be a 6.e change? In that case the spectest would still define 6.c/6.d behavior, and we'd have it differ for 6.e. Though it may also depend on the degree of change required for the spectest to pass.
I only partly agree.
The test strongly relies on the supplies being processed round-robin, where each supply only gets to emit a single value per round. Supplies never gave any guarantee along those lines; that this works out on the current implementation is down to a delicate implementation detail. The test looks as if … So I'm confident in saying that there is a mismatch between test and implementation. One of the two must be wrong. I do agree, though, that the behavior of my proposed changes is suboptimal in this case (processing all values and the …
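For concreteness, a zip of the kind the test exercises might look as follows (the inputs and values are illustrative assumptions, not taken from the test). Under the done-as-soon-as-any-input-is-done rule, how many tuples come out depends entirely on how the inputs' values interleave:

```raku
# Illustrative only: whether this prints three tuples or fewer depends on
# whether the inputs interleave one value per round or each runs to
# completion when tapped — exactly the delicate detail discussed above.
my $zipped = Supply.zip(
    Supply.from-list(1, 2, 3),
    Supply.from-list(<a b c>),
);
react {
    whenever $zipped { .say }   # ideally: (1 a), (2 b), (3 c)
}
```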
An alternative reading is that the spec really does require round-robin processing by Supply.zip, and the current implementation of this method is just very fragile, relying on implementation details of other parts of Supply. So, as part of the fix, method zip could be adjusted to deal with the new situation.

Maybe by creating a local queue for each of the child supplies, taking one element at a time out of this queue, and asking the child supply for more when the queue is empty? That way it wouldn't matter whether the child supply emits one value or multiple.
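A rough sketch of that idea (my own illustration, not the code in rakudo/rakudo#5200; `zip-buffered` is a hypothetical name):

```raku
# Hypothetical zip with a local queue per input. Because whenevers inside a
# supply block run one at a time, the plain arrays need no extra locking.
sub zip-buffered(*@supplies) {
    supply {
        my @queues = [] xx @supplies.elems;   # one fresh buffer per input
        my @done   = False xx @supplies.elems;
        for @supplies.kv -> $i, $s {
            whenever $s -> $v {
                @queues[$i].push: $v;
                # Emit while every input has at least one buffered value; it
                # no longer matters whether an input emitted one value or many.
                while @queues && @queues.map(*.elems).min > 0 {
                    emit @queues.map(*.shift).List;
                }
                # Once any finished input has an empty buffer, no further
                # tuple can ever complete.
                done if (^@supplies).grep({ @done[$_] && !@queues[$_] });
                LAST {
                    @done[$i] = True;
                    done unless @queues[$i].elems;
                }
            }
        }
    }
}

# Usage: prints (1 a), (2 b), (3 c) regardless of how the inputs interleave.
react {
    whenever zip-buffered(Supply.from-list(1, 2, 3), Supply.from-list(<a b c>)) {
        .say;
    }
}
```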
In Raku/problem-solving#364 a proposed solution to a deadlock in supply setup is hindered by the failure of a spectest for Supply.zip. Thus far we defined Supply.zip as having its result Supply be `done` as soon as any input is `done`. This makes its behavior highly sensitive to the timing of its input values; as a result, it may not emit all of the complete tuples of values that it feasibly could. This change introduces a "watermark", which is the maximum number of tuples we could possibly emit. When any input `supply` becomes done, we take the number of messages that were delivered, and use that as the maximum possible number of tuples that we can emit. Any further `done`s might lower that watermark. The `supply` block as a whole is done when all input supplies reach the watermark. This passes all current spectests. Hopefully, it will also pass when used in combination with the deadlock fix.
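To make the watermark rule concrete, a small standalone illustration of the described bookkeeping (a sketch, not the actual patch; the two-input scenario and counts are assumed):

```raku
my @delivered = 0, 0;    # messages seen so far, per input
my $watermark = Inf;     # upper bound on the number of tuples we may emit

sub on-message(Int $i) { @delivered[$i]++ }
sub on-done(Int $i)    { $watermark min= @delivered[$i] }

on-message(0) for ^5;    # input 0 delivers five values...
on-done(0);              # ...then is done: watermark becomes 5
on-message(1) for ^3;    # input 1 delivers three values...
on-done(1);              # ...then is done: watermark drops to 3

# Three complete tuples can still be emitted; the zipped supply is done
# once every input has delivered $watermark messages.
say $watermark;          # OUTPUT: 3
```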
If I understand option 1 correctly, this is simply saying that it is left to … So long as we process one message at a time in a given …
That's a reasonable reading, in my opinion. I created draft PR rakudo/rakudo#5200, which explores an alternative way to implement `Supply.zip`.
Yes, that's exactly what happens.
As far as I know, that invariant is not broken. Each supply block is still guarded by a lock, and no code path enters that supply block without requesting that lock.
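Schematically, the invariant amounts to something like this (an assumed shape for illustration, not the actual Rakudo internals):

```raku
# One lock per supply block instance; every path into the block funnels
# through it, so only a single execution branch is inside at any time.
my $block-guard = Lock.new;

sub enter-supply-block(&branch) {
    $block-guard.protect: &branch;
}
```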
I have requested a change on that draft PR. With that change …
I now consider rakudo/rakudo#5202 ready for review. That PR passes … The PR contains one controversial change in behavior: … This issue and the PR are now blocked on feedback / approval. @jnthn, I think you have the deepest insight into that area of Raku, so I'd like to have your approval before going forward with my changes.
I've merged the proposed fix and the respective test in rakudo/rakudo#5202 and Raku/roast#833.
Now that the changes have been merged, what do we do with this issue? Do I need to condense all the comments into some prose and do a problem-solving PR?
This is a follow-up of rakudo/rakudo#5141 and rakudo/rakudo#5158.
Consider the following piece of code:
That piece of code prints …
This is unexpected. The order seems reversed. Why is this so?
By default whenevers tap their supply instantly and as a result run the supply block they are connected to. This has a potential for deadlocking when that supply block (or one nested via more `whenever`s) contains an `emit`, as the `emit` will call the whenever block of the original supply, but by definition only a single execution branch is allowed in a supply block at any time. Possibly (I'm not sure about that) there are other scenarios posing similar deadlock issues without recursion being involved.
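A minimal sketch of such a recursive shape (an assumed example, not taken from the reports): the whenever's body feeds a value back into the source it taps, re-entering the same supply block.

```raku
my $source = Supplier.new;
my $s = supply {
    whenever $source.Supply -> $v {
        emit $v;
        # Feeding the source from inside its own whenever re-enters this
        # supply block recursively; without special handling this collides
        # with the one-execution-branch-at-a-time rule.
        $source.emit($v - 1) if $v > 0;
    }
}
$s.tap: *.say;
$source.emit(3);   # intended output: 3, 2, 1, 0
```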
The currently implemented solution for this deadlock problem, in rakudo/rakudo@26a9c31 and rakudo/rakudo@5478392, works as follows: when, in the setup phase during the processing of a whenever tapping, an await happens (be it a protect, acquire, lock, or plain await), a continuation starting at the root of the whenever tapping is taken. Only after the supply block itself has finished running is the continuation resumed. So the order in which the code is executed depends on whether there is any locking going on in code tapped by the whenever.
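For illustration, a sketch of how an await hit during tapping can flip the order (the `Promise.in` is my stand-in for any lock or await encountered during setup):

```raku
my $inner = supply {
    await Promise.in(0.1);   # an await hit while the whenever below taps us:
                             # a continuation is taken and parked
    emit 'from inner';
};
my $outer = supply {
    whenever $inner -> $v {
        say "whenever: $v";
        done;                # end $outer once the value arrives
    }
    say "end of supply block";
};
react {
    whenever $outer { }      # tapping $outer runs its supply block
}
# Under the continuation juggling, this can print "end of supply block"
# first and "whenever: from inner" second, even though the whenever
# comes first in the source.
```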
This behavior is difficult for an unsuspecting programmer to follow. Even a programmer who knows about this behavior will potentially have a hard time telling whether there is a lock somewhere in the tapped code that could cause this reordering. This is a case of action at a distance.
There is also still a possibility of the code deadlocking; that is detailed in rakudo/rakudo#5141. I consider that a problem separate from this issue.