I am trying to run an ALLREDUCE between two processes, using UCG without MPI.
I was able to create the UCG context, the UCP worker, and the UCG group by exchanging the processes' worker addresses.
However, creating a ucg_coll_h with ucg_coll_allreduce_init produces a series of incast/bcast errors, followed by a double-free abort:
[...] select.c:517 UCX ERROR cannot add incast lane - reached limit (6)
[...] select.c:517 UCX ERROR cannot add bcast lane - reached limit (6)
[...] ucg_plan.c:388 UCX WARN No transports with native broadcast support were found, falling back to P2P transports (slower)
[...] ucg_plan.c:380 UCX WARN No transports with native incast support were found, falling back to P2P transports (slower)
free(): double free detected in tcache 2
Aborted (core dumped)
The attached file, ucx_test.zip, is a minimal working example that reproduces the problem.
# host1 and host2 are connected with 1G ethernet and 100G InfiniBand
$ unzip ucx_test.zip;cd ucx_test
$ make
# on host1
$ ./ucg_test 2 0 0 host1 12345 # meaning: 2 processes in total, this process's rank is 0, root's rank is 0 with address host1:12345
# on host2
$ ./ucg_test 2 1 0 host1 12345 # meaning: 2 processes in total, this process's rank is 1, root's rank is 0 with address host1:12345
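For readers who cannot open the attachment, the worker address exchange mentioned above can be sketched roughly as follows. This is a minimal sketch, not the code from ucx_test.zip: it assumes the address is an opaque byte blob (in UCX it would typically come from ucp_worker_get_address) and that the two processes already share a connected stream socket, for example a TCP connection to host1:12345. The helper names send_blob and recv_blob are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; fails if the peer closes early. */
static int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return -1; /* connection closed before full message */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send a length-prefixed opaque blob
 * (e.g. a worker address). The length travels in network byte order. */
int send_blob(int fd, const void *blob, uint32_t len) {
    uint32_t wire_len = htonl(len);
    if (write_all(fd, &wire_len, sizeof(wire_len)) != 0) return -1;
    return write_all(fd, blob, len);
}

/* Hypothetical helper: receive a length-prefixed blob into a freshly
 * allocated buffer. The caller owns *blob_out and must free() it. */
int recv_blob(int fd, void **blob_out, uint32_t *len_out) {
    uint32_t wire_len;
    if (read_all(fd, &wire_len, sizeof(wire_len)) != 0) return -1;
    uint32_t len = ntohl(wire_len);
    void *blob = malloc(len ? len : 1);
    if (blob == NULL) return -1;
    if (read_all(fd, blob, len) != 0) {
        free(blob);
        return -1;
    }
    *blob_out = blob;
    *len_out = len;
    return 0;
}
```

Each rank would call send_blob with its own address and recv_blob for the peer's address. Note that the received buffer has exactly one owner here; keeping ownership of exchanged buffers this explicit is one way to rule out the exchange code as the source of a double free.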