Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OFI MR flags selection based on the provider #1077

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

wrrobin
Copy link
Collaborator

@wrrobin wrrobin commented Dec 1, 2022

For most of the OFI providers, SOS uses a specific MR mode selection. However, user still has the option to select a different mode that may not be supported. This is an attempt to select the MR mode based on the provider by default. User may still choose a different mode during configuration.

@stewartl318
Copy link
Collaborator

stewartl318 commented Dec 2, 2022 via email

@wrrobin
Copy link
Collaborator Author

wrrobin commented Dec 2, 2022

@stewartl318 - Great question. In OFI memory registration, there are certain flags that define the behavior of the operations. For example, OFI internally can choose memory region attributes v/s. the user can choose by themselves. These settings are controlled by different MR mode flags. Traditionally, there were two modes: basic and scalable (https://ofiwg.github.io/libfabric/v1.1.1/man/fi_mr.3.html). But, now, there are additional flags that you can use and each provider supports / requires a subset of them (https://github.com/ofiwg/libfabric/wiki/Provider-Feature-Matrix-Main).

@@ -194,13 +194,15 @@ AM_CONDITIONAL([HAVE_LONG_FORTRAN_HEADER], [test "$enable_long_fortran_header" =

AC_ARG_ENABLE([ofi-mr],
[AC_HELP_STRING([--enable-ofi-mr=MODE],
[OFI memory registration mode: basic, scalable, or rma-event (default: scalable)])])
[OFI memory registration mode: none, basic, scalable, or rma-event (default: none)])])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "none" be called "auto" since it selects MR flags based on the provider?

[basic],
[],
[AC_DEFINE([ENABLE_MR_BASIC], [1], [If defined, the OFI transport will use FI_MR_BASIC])],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ENABLE_MR_BASIC used anywhere? Is it needed?

ret = fi_mr_reg(shmem_transport_ofi_domainfd, 0, UINT64_MAX,
FI_REMOTE_READ | FI_REMOTE_WRITE, 0, 0ULL, flags,
&shmem_transport_ofi_target_heap_mrfd, NULL);
OFI_CHECK_RETURN_STR(ret, "target memory (all) registration failed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused why virtual addressing now registers the heap now. Does it not need the data segment?

ret = fi_close(&shmem_transport_ofi_target_heap_mrfd->fid);
OFI_CHECK_ERROR_MSG(ret, "Target heap MR close failed (%s)\n", fi_strerror(errno));

#if defined(ENABLE_REMOTE_VIRTUAL_ADDRESSING)
if (shmem_transport_ofi_mr_mode != 0) {
ret = fi_close(&shmem_transport_ofi_target_data_mrfd->fid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a corresponding mr_reg for shmem_transport_ofi_target_data_mrfd if only ENABLE_REMOTE_VIRTUAL_ADDRESSING is enabled...

@davidozog davidozog added this to the 1.5.x milestone Feb 1, 2023
@@ -1157,20 +1158,10 @@ int query_for_fabric(struct fabric_info *info)
hints.addr_format = FI_FORMAT_UNSPEC;
domain_attr.data_progress = FI_PROGRESS_AUTO;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that #1059 possibly motivates a guard warning about and/or preventing the usage of FI_MR_ENDPOINT (and thus FI_PROGRESS_MANUAL) except for providers where it makes sense. But come to think of it, maybe this PR already accomplishes that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants