TL/MLX5: fix fences in a2a's WQEs #1069
base: master
Regarding:
Do we have a single UMR per AllToAll operation? If the answer is yes, then I agree: no need for any fence (assuming we also have a barrier and the UMR does not change/modify any Mkey that maps memory that might be accessed by previous operations).
We have two UMRs per AllToAll, one for the send key and one for the recv key. Indeed, the src and recv buffers are assumed to be ready when the collective is called, and the UMRs are the first WQEs.
struct mlx5dv_qp_ex *mqp = mlx5dv_qp_ex_from_ibv_qp_ex(qp_ex);
struct mlx5_wqe_ctrl_seg *ctrl;
struct mlx5_wqe_umr_ctrl_seg *umr_ctrl_seg;
uint8_t fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE;
It hangs because, unless you set CE to 0x2 (as MLX5_WQE_CTRL_CQ_UPDATE does), you won't get a CQE for the WQE you're posting.
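To make the CE point concrete, here is a minimal sketch of how the `fm_ce_se` byte splits into its fields. The constants are local copies of the values I believe rdma-core's `mlx5dv.h` defines for `MLX5_WQE_CTRL_*`, renamed with a hypothetical `SK_` prefix so the snippet builds without the header; the `sk_*` helpers are illustrative only and are not part of UCC or rdma-core.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed local copies of the MLX5_WQE_CTRL_* values from mlx5dv.h. */
enum {
    SK_CTRL_SOLICITED   = 1 << 1, /* SE, bit 1 */
    SK_CTRL_CQ_UPDATE   = 2 << 2, /* CE = 0x2, bits 3:2 */
    SK_CTRL_SMALL_FENCE = 1 << 5, /* FM = 0x1, bits 7:5 */
    SK_CTRL_FENCE       = 4 << 5  /* FM = 0x4, bits 7:5 */
};

/* Extract the fence mode (FM) field. */
static inline uint8_t sk_fm(uint8_t fm_ce_se) { return fm_ce_se >> 5; }

/* Extract the completion event (CE) field. */
static inline uint8_t sk_ce(uint8_t fm_ce_se) { return (fm_ce_se >> 2) & 0x3; }

/* Extract the solicited event (SE) bit. */
static inline uint8_t sk_se(uint8_t fm_ce_se) { return (fm_ce_se >> 1) & 0x1; }

/* True when the WQE asks for a CQE on completion (CE == 0x2). */
static inline int sk_requests_cqe(uint8_t fm_ce_se)
{
    return sk_ce(fm_ce_se) == 0x2;
}

/* Small self-check showing the expected field values. */
void sk_self_check(void)
{
    uint8_t with_cqe    = SK_CTRL_CQ_UPDATE;
    uint8_t fenced_only = SK_CTRL_SMALL_FENCE; /* CE left at 0 */

    assert(sk_requests_cqe(with_cqe));
    assert(!sk_requests_cqe(fenced_only));
    assert(sk_fm(fenced_only) == 0x1);
    assert(sk_se(SK_CTRL_SOLICITED) == 1);
}
```

With CE left at 0, a poller that waits for this WQE's completion never sees a CQE, which matches the hang described above.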
Sorry, my previous PR description and analysis were misleading. I edited the description and further changed the flags; please see the last commit.
I'm not sure why:
Since IIUC it can also be posted on an "RDMA QP", but I don't think it matters. Another note:
Regarding
IIRC, InfiniBand's ordering semantics guarantee that atomic operations are executed in order (according to the message/PSN ordering) on the responder side. So there is no need to set a fence on atomic operations on the requestor side. @samnordmann, please double check this.
[Edited]
What
Fix fences in WQEs.
QPs used by each node leader:
Here the different WQEs in the algorithm:
Conclusion regarding flags:
which consumes UMR, needs a small fence.
doesn't need any fence.
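The rule in the conclusion above (small fence only on a WQE that consumes the UMR, no fence otherwise) could be sketched as a tiny flag-selection helper. This is an illustration under assumptions, not the code in this PR: `sketch_wqe_flags` and the `SKETCH_` constants are hypothetical names, and the constant values are local copies of what I believe `mlx5dv.h` defines for `MLX5_WQE_CTRL_INITIATOR_SMALL_FENCE` and `MLX5_WQE_CTRL_CQ_UPDATE`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed local copies of the corresponding mlx5dv.h values. */
enum {
    SKETCH_CQ_UPDATE   = 2 << 2, /* request a CQE for this WQE */
    SKETCH_SMALL_FENCE = 1 << 5  /* initiator small fence */
};

/*
 * Hypothetical helper: build the fm_ce_se byte for one WQE.
 *  - consumes_umr: the WQE uses a key set up by an earlier UMR WQE,
 *                  so it gets a small fence.
 *  - want_cqe:     the caller polls for this WQE's completion.
 * Anything else is posted with no fence bits at all.
 */
uint8_t sketch_wqe_flags(bool consumes_umr, bool want_cqe)
{
    uint8_t flags = 0;

    if (consumes_umr) {
        flags |= SKETCH_SMALL_FENCE;
    }
    if (want_cqe) {
        flags |= SKETCH_CQ_UPDATE;
    }
    return flags;
}
```

For example, `sketch_wqe_flags(false, true)` would describe a WQE that is unfenced but still signaled, which is the shape of the `fm_ce_se = MLX5_WQE_CTRL_CQ_UPDATE` initialization discussed in the review.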