`AMRCallback` non-deterministically slow with multiple threads on macOS ARM #1463
A workaround would be to identify the non-threaded loops that are slow and add `ThreadingUtilities.sleep_all_tasks()` before them. A better option would obviously be to find an upstream fix for JuliaSIMD/Polyester.jl#89.
Do you know why it only seems to affect the
I haven't tested a lot, I just noticed that while making #1462. Maybe other parts of the code are affected as well. But the AMR callback is prone to this bug because that's where allocations happen. As I said, using
So the best workaround seems to be something like
```julia
ThreadingUtilities.sleep_all_tasks()
# serial, allocating stuff
GC.gc()
```
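The pattern above could be wrapped in a small helper so that every serial, allocating section is treated the same way. This is only a sketch: `serial_section` and its `sleep_workers` argument are hypothetical names, not part of Trixi.jl or Polyester.jl. The function that puts the worker tasks to sleep is passed in, so that `ThreadingUtilities.sleep_all_tasks` can be supplied by the caller.

```julia
# Hypothetical helper (not part of Trixi.jl): run the serial, allocating
# function `f` after putting the worker tasks to sleep, then trigger a GC run.
# `sleep_workers` is assumed to be something like
# `ThreadingUtilities.sleep_all_tasks`.
function serial_section(f, sleep_workers)
    sleep_workers()   # let the spinning worker tasks go to sleep first
    result = f()      # serial, allocating stuff
    GC.gc()           # collect garbage while the workers are still asleep
    return result
end

# Example with a no-op stand-in for the sleep function:
total = serial_section(() -> nothing) do
    sum(rand(1000))
end
```

With ThreadingUtilities.jl loaded, the call would presumably look like `serial_section(ThreadingUtilities.sleep_all_tasks) do ... end`.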
I found that this affects pretty much all elixirs. Even the default example is lagging (hanging for about a second every few thousand time steps). Interestingly, I can't see this when I benchmark just the
Since JuliaSIMD/Polyester.jl#89 still doesn't have any other solution than
```julia
macro threaded(expr)
    return esc(quote
        Trixi.@batch $(expr)
        ThreadingUtilities.sleep_all_tasks()
    end)
end
```
This removed all lagging, and made my example go from 15s to 10s, but it makes each threaded loop take longer because the threads have to wake up first, resulting in a 2x slowdown of the
I guess then without this bug, the simulation would go down to about 5-6s, so this bug basically causes everything to be about 2x slower on macOS ARM.
About 4 executions out of 100 are extremely slow (over 1s vs 250µs). This is consistent with what I see in the simulation. It runs for about 200 time steps, and then it freezes for a second or so. That causes AMR to take over 80% of the simulation time.
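One way to reproduce a count like "4 out of 100" without extra packages is to time each call individually and count the samples that are far above the median. The sketch below uses only Base and the Statistics standard library. The names `sample_times` and `count_outliers` and the threshold factor of 100 are assumptions made for illustration, not what was used for the numbers above.

```julia
using Statistics

# Run `f` exactly `n` times and return the wall time of each call in seconds.
function sample_times(f, n)
    times = Vector{Float64}(undef, n)
    for i in 1:n
        t0 = time_ns()
        f()
        times[i] = (time_ns() - t0) / 1e9
    end
    return times
end

# Count the samples that are slower than `factor` times the median.
# The default factor is an arbitrary choice for this sketch.
count_outliers(times; factor = 100) = count(>(factor * median(times)), times)

# Stand-in workload; the real test would call the AMR callback here.
times = sample_times(() -> sum(rand(1000)), 100)
println(count_outliers(times), " of ", length(times), " runs were outliers")
```

Looking at individual samples rather than a mean matters here, because a handful of one-second freezes among 250µs calls is easy to miss in an average over few runs.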
One thread:
Multiple threads:
This is most likely caused by JuliaSIMD/Polyester.jl#89.
Some non-allocating `@batch` loop causes allocating non-threaded loops to freeze for some reason.
Note that I had to use #1462 for these benchmarks, since the indicators don't work on macOS ARM otherwise.