Julia freezes when only one thread throws an error in an @batch block #31

Open
efaulhaber opened this issue Jun 29, 2021 · 5 comments

@efaulhaber

This one was a real Heisenbug. Unstable simulations that crashed in serial execution froze on multiple threads, but trying to make them freeze on purpose made the bug disappear.

If only one thread (except the first one!) throws an error, Julia freezes.
MWE:

using Polyester

function foo()
    println("Before")

    @batch for i in 1:100
        if Threads.threadid() == 2
            error()
        end
    end

    println("After")
end

Running this with more than one thread causes Julia to freeze after printing Before. The freeze can easily be interrupted with Ctrl+C, which then seems to solve the problem for the rest of the session, but only because of #30: after the interrupt, only one thread is used.

@chriselrod
Member

By crashes in serial execution, do you mean throws an error?
Normally I think of crashes as "julia exits".

The fix here is probably to have ThreadingUtilities.wait check if the task it waited on threw an error.

@chriselrod
Member

I think the API will be that wait prints any errors, resets the tasks automatically, and then returns an error code.
The caller can then decide what to do, e.g. whether to throw. In Polyester's case, it would throw after setting its own state appropriately.

This wouldn't solve the problem of someone interrupting the process manually, as then Polyester wouldn't get the chance to reset its own state. But it would stop the hangs, and allow multiple threads to still be used.
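The idea described above could look roughly like the following sketch. It uses only Base tasks, not ThreadingUtilities, and the names `wait_and_report` and `run_and_check` are hypothetical, chosen for illustration. The point is that every task is waited on even if an earlier one failed, errors are printed rather than thrown, and the caller decides whether to throw once all tasks have finished.

```julia
# Hypothetical helper (not the ThreadingUtilities API): wait on a task,
# print its error if it failed, and return an error code instead of throwing.
function wait_and_report(t::Task; io::IO = stderr)
    try
        wait(t)                      # throws TaskFailedException if `t` failed
    catch err
        err isa TaskFailedException || rethrow()
        showerror(io, err)           # print the nested task error
        println(io)
        return 1                     # nonzero code: the task failed
    end
    return 0                         # zero code: the task finished normally
end

# Hypothetical caller: spawn all work, wait on *every* task, then decide.
function run_and_check(fs; io::IO = stderr)
    tasks = map(fs) do f
        Threads.@spawn f()
    end
    codes = [wait_and_report(t; io = io) for t in tasks]
    nfailed = count(!=(0), codes)
    # Only throw after all tasks are done, so no task is left unwaited.
    nfailed == 0 || error("$nfailed of $(length(tasks)) tasks failed")
    return nothing
end
```

A call such as `run_and_check([() -> 1, () -> sqrt(-1.0)])` would print the DomainError from the second task and then throw from the calling task, instead of leaving the caller waiting.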

@ranocha
Member

ranocha commented Jun 30, 2021

> By crashes in serial execution, do you mean throws an error?

Yes - like throwing a DomainError from sqrt or log.

@OndrejKincl

I observed a similar problem, except that the thread quietly stops working instead of throwing. For example, when I run this

using Polyester

const N = 10

function bar()
    A = zeros(N)
    @batch for i in 1:N
        A[i] = sqrt(i-2)
    end
    for i in 1:N
        println("sqrt($(i-2)) = $(A[i])")
    end
end

on two threads, I obtain the following result:

sqrt(-1) = 0.0
sqrt(0) = 0.0
sqrt(1) = 0.0
sqrt(2) = 0.0
sqrt(3) = 0.0
sqrt(4) = 2.0
sqrt(5) = 2.23606797749979
sqrt(6) = 2.449489742783178
sqrt(7) = 2.6457513110645907
sqrt(8) = 2.8284271247461903
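One possible reading of this output, assuming the loop is split into the chunks 1:5 and 6:10: `sqrt(-1)` at `i = 1` throws a DomainError, so the thread handling the first chunk stops before writing any entry. Until the underlying issue is fixed, a guard that never throws inside the loop body may serve as a workaround. The sketch below is only an illustration; `safe_sqrt` and `bar_guarded` are hypothetical names, and the loop is written serially so that it runs without Polyester (placing `@batch` in front of the loop is the assumed use).

```julia
# Hypothetical guard: return NaN for negative input instead of throwing.
function safe_sqrt(x::Real)
    y = float(x)
    return y < 0 ? oftype(y, NaN) : sqrt(y)
end

function bar_guarded(n::Integer)
    A = zeros(n)
    # Assumption: with Polyester loaded, this loop would be `@batch for i in 1:n`.
    for i in 1:n
        A[i] = safe_sqrt(i - 2)   # never throws, so no thread stops early
    end
    return A
end
```

With this guard the entry for `sqrt(-1)` becomes `NaN`, which makes the invalid input visible in the result instead of leaving a silent `0.0`.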

@efaulhaber
Author

This is #30. You can use Polyester.reset_threads!() to enable the threads again without restarting Julia.
