Assertion `worker->inprogress++ == 0' failed #10039
Comments
This looks like an issue with enabling multi-threading support. If the application is multi-threaded, UCX has to be compiled with multi-thread support (--enable-mt) and ucp_worker_create has to be called with ucp_worker_params_t::thread_mode=UCS_THREAD_MODE_MULTI.
I am using jucx, the Java binding. How should I call it in that case?
On Mon, Aug 5, 2024 at 9:17 AM Yossi Itigin wrote:
Seems an issue with enabling multi-threading support. If the application is multi-threaded, UCX has to be compiled with multi-thread support (--enable-mt) and ucp_worker_create has to be called with ucp_worker_params_t::thread_mode=UCS_THREAD_MODE_MULTI
See requestThreadSafety in https://github.com/openucx/ucx/blob/master/bindings/java/src/test/java/org/openucx/jucx/UcpWorkerTest.java#L41
Describe the bug
I compiled the code on my laptop, where it runs without problems. After porting it to a server, I sometimes run into the error below. It does not happen on every run, and I have not been able to determine what triggers it.
[gs07r1b29:3935050:2:3935480] ucp_worker.c:2990 Assertion `worker->inprogress++ == 0' failed
backtrace (tid:3935480) ====
0 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_handle_error+0x3f4) [0x7f5584b05704]
1 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_message+0xec) [0x7f5584b02b9c]
2 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_format+0x103) [0x7f5584b02aa3]
3 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucp.so.0.0.0(ucp_worker_progress+0x1a3) [0x7f5552cfb433]
4 [0x7f5515415e5b]
[gs07r1b29:3935050:1:3935478] ucp_worker.c:2995 Assertion `--worker->inprogress == 0' failed
backtrace (tid:3935478) ====
0 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_handle_error+0x3f4) [0x7f5584b05704]
1 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_message+0xec) [0x7f5584b02b9c]
2 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucs.so.0.0.0(ucs_fatal_error_format+0x103) [0x7f5584b02aa3]
3 /gpfs/apps/MN5/GPP/UCX/1.16.0/INTEL/lib/libucp.so.0.0.0(ucp_worker_progress+0xd3) [0x7f5552cfb363]
4 [0x7f5515415e5b]
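The two backtraces above are consistent with two threads (tid 3935480 and tid 3935478) being inside ucp_worker_progress at the same time. As a rough illustration of why a guard of this kind fires, here is a minimal, self-contained Java sketch. ToyWorker, ProgressGuardDemo and all of their methods are hypothetical names invented for this example. It is a toy model built on an assumed counter-plus-optional-lock design, not the UCX or JUCX implementation. In JUCX itself, the relevant setting is the requestThreadSafety option referenced in the linked UcpWorkerTest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a progress guard (hypothetical, NOT the UCX implementation). */
class ToyWorker {
    private final ReentrantLock lock; // null models a worker created without thread safety
    private int inprogress = 0;       // plain counter, only safe if calls never overlap

    ToyWorker(boolean threadSafe) {
        this.lock = threadSafe ? new ReentrantLock() : null;
    }

    /** Runs one progress step and fails if another progress call is already active. */
    void progress(Runnable body) {
        if (lock != null) lock.lock();
        try {
            if (inprogress++ != 0) {
                throw new IllegalStateException("Assertion `inprogress++ == 0' failed");
            }
            body.run();
            if (--inprogress != 0) {
                throw new IllegalStateException("Assertion `--inprogress == 0' failed");
            }
        } finally {
            if (lock != null) lock.unlock();
        }
    }
}

class ProgressGuardDemo {
    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    /** Forces two overlapping progress calls on an unprotected worker; returns the failure messages. */
    static List<String> overlapOnSingleThreadWorker() {
        ToyWorker worker = new ToyWorker(false);
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        List<String> failures = Collections.synchronizedList(new ArrayList<>());
        Thread first = new Thread(() -> {
            try {
                // Stay inside progress until the second caller has tried to enter.
                worker.progress(() -> { entered.countDown(); await(release); });
            } catch (IllegalStateException e) {
                failures.add(e.getMessage());
            }
        });
        first.start();
        await(entered);
        try {
            worker.progress(() -> { });   // second caller enters while the first is active
        } catch (IllegalStateException e) {
            failures.add(e.getMessage());
        } finally {
            release.countDown();
        }
        join(first);
        return failures;
    }

    /** Calls progress from several threads on a lock-protected worker; returns completed steps. */
    static int hammerThreadSafeWorker(int threads, int callsPerThread) {
        ToyWorker worker = new ToyWorker(true);
        int[] completed = {0};            // only touched while the worker's lock is held
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) worker.progress(() -> completed[0]++);
            });
            pool[i].start();
        }
        for (Thread t : pool) join(t);
        return completed[0];
    }

    public static void main(String[] args) {
        System.out.println(overlapOnSingleThreadWorker());
        System.out.println(hammerThreadSafeWorker(4, 1000));
    }
}
```

In this model the second caller trips the entry check and leaves the counter off by one, so the first caller then trips the exit check. That matches the pairing of the two assertion messages in the log above. With the lock enabled, the calls are serialized and neither check fires.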
Steps to Reproduce
Run an application that uses stream send / stream receive through jucx and follows a structure similar to the UCXBenchmark.
Setup and versions
Using
export UCX_TLS=ud_mlx5
export UCX_NET_DEVICES=mlx5_2:1