
Deadlock when using manifold.time scheduling in application code #148

Open
gsnewmark opened this issue Nov 9, 2017 · 6 comments

gsnewmark commented Nov 9, 2017

Right now the default manifold.time scheduler is based on a single-thread executor, which is used both internally in Manifold itself (e.g. for deferred timeouts) and in the public scheduling API (namely manifold.time/in & manifold.time/every). This can lead to nasty deadlock bugs where seemingly unconnected parts of an application stop working due to blocking in the scheduler. For example, the following code will never finish, because the inner timeout waits on the scheduler thread, which is blocked by the scheduled function itself:

user=> (require '[manifold.deferred :as d])
nil
user=> (require '[manifold.time :as t])
nil
user=> @(t/in 1000 (fn [] @(d/timeout (d/deferred) 100 ::inner-timeout)))

This example is artificial, but we actually ran into a similar situation in real code recently (just with more indirection involved).

manifold.time allows passing a custom clock/executor to the scheduling functions (with-clock), so that's not really a major issue. Still, should the documentation be updated to clearly state that using manifold.time in application code as-is can interfere with other parts of Manifold? Alternatively, should two separate clocks be created by default: one for Manifold's own scheduling needs, and one for all other manifold.time users (at first glance I don't see problems with this approach, but maybe I'm missing something)?
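For illustration, here is a minimal sketch of the with-clock workaround mentioned above, assuming manifold.time/scheduled-executor->clock and a dedicated ScheduledExecutorService (the pool size of 2 is an arbitrary choice):

(require '[manifold.deferred :as d]
         '[manifold.time :as t])
(import '(java.util.concurrent Executors))

;; A separate clock backed by its own executor, so a blocking callback scheduled
;; by application code cannot starve Manifold's default scheduler thread.
(def app-clock
  (t/scheduled-executor->clock
    (Executors/newScheduledThreadPool 2)))

;; The example above now completes: the callback blocks a thread of app-clock,
;; while the inner timeout is delivered by a thread that is not blocked.
(t/with-clock app-clock
  @(t/in 1000 (fn [] @(d/timeout (d/deferred) 100 ::inner-timeout))))
;; => ::inner-timeout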


joeltio commented Aug 12, 2018

I had the same problem with s/try-take!:

(t/every 2000 #(do (println "hello")
                   (s/try-take! (s/stream) 1000)))

The above code will print hello once, then hang.

As in @gsnewmark's example, it seems that s/try-take! also uses d/timeout.

@kachayev (Collaborator)

@ztellman Initially this code used a fixed pool executor sized to the number of cores. The num-cores variable is still there, even though it's not used right now. What was the reason for switching to a single-thread executor? I know that having more threads would not eliminate the problem described above, I'm just curious.


kachayev commented Nov 27, 2018

@joeltio I've tried a few different versions of your code, but it seems I cannot reproduce the issue. It works fine at first glance.

@gsnewmark Regarding your example. I know where this issue comes from, and I do understand how hard it is to debug something like that in a large codebase... but semantically, if you block a thread from what appears to be an "async" context (whatever that means technically), you will always have problems. That's the curse of implementing async flows on top of the JVM runtime: you should always keep in mind that there's a thread pool somewhere underneath the API, and that thread pool might end up in a situation where all of its threads are blocked. core.async is a great example of how leaky this abstraction is, and how fragile the idea of writing a program without understanding the underlying thread mechanics. I think it makes sense to mention in the documentation that the default scheduling thread pool should be used carefully.

With that being said, I think there's no ideal solution here. What we can do:

  1. Mention in the documentation that blocking operations inside chain/catch/in/every etc. are very risky, and that you should think twice about which thread pool backs each callback. Not specifically for manifold.time, but for all callbacks.

  2. Introduce a new API that doesn't require time/* callbacks to run on the scheduling thread, so that the scheduling thread does only one thing: deliver values at some point in time. The actual execution of callbacks (either provided directly or added as listeners using chain) would happen somewhere else. E.g. a time/every-on that requires an executor to be provided, or an option to create a deferred that is realized to a specific value after a delay, so you can use it with alt (think core.async/timeout channels). A rough sketch of this idea follows below.
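As an illustration (not an API proposal per se), here is a minimal sketch of the second option built from the existing API; in-on is a hypothetical helper name, and the fixed thread pool is just a placeholder executor:

(require '[manifold.deferred :as d]
         '[manifold.time :as t])
(import '(java.util.concurrent Executors))

;; The scheduler thread only realizes a deferred after `delay-ms`; the callback
;; itself runs on `executor`, so it can block without starving the scheduler.
(defn in-on [executor delay-ms f]
  (-> (t/in delay-ms (constantly ::fire)) ; scheduler only delivers a value
      (d/onto executor)                   ; downstream callbacks run on `executor`
      (d/chain (fn [_] (f)))))            ; the (possibly blocking) work happens here

;; With this, the deadlocking example from the original report completes:
@(in-on (Executors/newFixedThreadPool 2) 1000
        (fn [] @(d/timeout (d/deferred) 100 ::inner-timeout)))
;; => ::inner-timeout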

@ztellman what do you think?


joeltio commented Nov 28, 2018

@kachayev My bad, I just tried it and it didn't hang either. I can't remember exactly what I was trying to do, but I know that after running the above code, it prints hello only once, though the program does not hang.

The following code without the s/try-take! will print hello 6 times:

(t/every 2000 #(println "hello"))
(Thread/sleep 10000)

However, the following code will print hello once:

(t/every 2000 #(do (println "hello")
                   @(s/try-take! (s/stream) 1000)))
(Thread/sleep 10000)

To make it hang, you can swap out t/every for t/in:

@(t/in 2000 #(do (println "hello")
                 @(s/try-take! (s/stream) 1000)))

The code will print hello once, then hang.

@kachayev (Collaborator)

@joeltio

> My bad, I just tried it and it didn't hang either.

No problem!

> @(s/try-take! (s/stream) 1000)))

That's exactly the same problem @gsnewmark described earlier (blocking a thread, which leads to the corresponding thread pool being exhausted). I'm still thinking about what could be done here to improve the situation, other than better documentation.
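For reference, a minimal non-blocking rewrite of that example (a sketch: it chains on the try-take! result instead of dereferencing it on the scheduler thread, so the scheduler stays free and every keeps firing):

(require '[manifold.deferred :as d]
         '[manifold.stream :as s]
         '[manifold.time :as t])

;; Attach a callback instead of blocking with @; the callback runs when
;; try-take! times out, and the scheduling thread is never blocked.
(t/every 2000
         (fn []
           (println "hello")
           (d/chain (s/try-take! (s/stream) 1000)
                    (fn [v] (println "took" v)))))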

@gsnewmark (Author)

I'm perfectly fine with updating the documentation (as was indicated in the initial report), because right now the most glaring issue is that this behavior is not immediately obvious unless you check Manifold's internals. And once you know the reason, it's quite easy to work around the issue.
