core: add environment variable for selective HMEM initialization
To provide a selection mechanism for HMEM interfaces, a new
environment variable has been created -- FI_HMEM.

The nature of FI_HMEM is analogous to FI_PROVIDER, but it is intended as
a selection mechanism for HMEM interfaces and not libfabric providers.

If not specified, HMEM interface initialization proceeds as it did prior
to this commit -- initializing every HMEM interface that was compiled into
the libfabric library.

If specified, FI_HMEM restricts initialization to the HMEM interfaces
named in the string. Names are separated by semicolons and matched
case-insensitively. FI_HMEM_SYSTEM is the exception: it is always
initialized.
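
The selection rule can be sketched in standalone C. This is an illustration, not the code from this commit: the `sketch_*` names and the local enum are hypothetical stand-ins, and the parser scans the string in place rather than tokenizing it. Only the interface labels, the `;` separator, the case-insensitive match, and the always-on system interface are taken from the change itself.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the fi_hmem_iface values; not libfabric's enum. */
enum sketch_iface {
	SKETCH_SYSTEM,
	SKETCH_CUDA,
	SKETCH_ROCR,
	SKETCH_ZE,
	SKETCH_NEURON,
	SKETCH_SYNAPSEAI,
	SKETCH_IFACE_COUNT
};

/* Labels in the same order as the enum above. */
static const char *const sketch_labels[SKETCH_IFACE_COUNT] = {
	"system", "cuda", "rocr", "ze", "neuron", "synapseai"
};

/* Case-insensitive comparison of label against the len-byte token at s. */
static bool sketch_label_matches(const char *label, const char *s, size_t len)
{
	size_t i;

	if (strlen(label) != len)
		return false;
	for (i = 0; i < len; i++) {
		if (tolower((unsigned char) label[i]) !=
		    tolower((unsigned char) s[i]))
			return false;
	}
	return true;
}

/* Fill enabled[] from an FI_HMEM-style value such as "cuda;ze". */
static void sketch_parse_hmem_filter(const char *value,
				     bool enabled[SKETCH_IFACE_COUNT])
{
	size_t len;
	int i;

	for (i = 0; i < SKETCH_IFACE_COUNT; i++)
		enabled[i] = false;
	enabled[SKETCH_SYSTEM] = true;	/* system is always selected */

	while (*value != '\0') {
		len = strcspn(value, ";");	/* length of the current token */
		for (i = 0; i < SKETCH_IFACE_COUNT; i++) {
			if (sketch_label_matches(sketch_labels[i], value, len)) {
				enabled[i] = true;
				break;
			}
		}
		/* unrecognized names are ignored */
		value += len;
		if (*value == ';')
			value++;
	}
}
```

Under these assumptions, a value of `CUDA;ze;bogus` selects system, cuda and ze and leaves the rest unselected. Scanning in place keeps the caller's string untouched, which matters when the value comes from a `const` buffer.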

This feature allows libfabric customers using a deployment with multiple
HMEM interfaces to avoid the costs associated with dlopen on systems
where only a subset -- if any -- of the HMEM interfaces are actually
available.

The benefit of this change is most visible on heterogeneous systems with a
mix of GPU accelerators and the libraries that support them, or on
systems with no GPU accelerators where the customer is using a
vendor-provided build of libfabric that has support for a variety of
HMEM interfaces. Today, libfabric will attempt to initialize any HMEM
interface that can be initialized. With this change, only selected HMEM
interfaces will be initialized even if the library supports more than
the customer requested.

Signed-off-by: James Swaro <[email protected]>
jswaro committed Oct 26, 2023
1 parent 11365a5 commit 0f7a22f
Showing 1 changed file with 47 additions and 0 deletions.
47 changes: 47 additions & 0 deletions src/hmem.c
@@ -528,12 +528,59 @@ bool ofi_hmem_is_initialized(enum fi_hmem_iface iface)
	return hmem_ops[iface].initialized;
}

void ofi_hmem_set_iface_filter(const char* iface_filter_str, bool* filter)
{
	int iface;
	char* filter_copy = NULL;
	char* entry = NULL;
	const char* token = ";";
	const char* iface_labels[ARRAY_SIZE(hmem_ops)] = {
		"system", // FI_HMEM_SYSTEM
		"cuda", // FI_HMEM_CUDA
		"rocr", // FI_HMEM_ROCR
		"ze", // FI_HMEM_ZE
		"neuron", // FI_HMEM_NEURON
		"synapseai" // FI_HMEM_SYNAPSEAI
	};

	memset(filter, false, sizeof(bool) * ARRAY_SIZE(hmem_ops));

	/* always enable system hmem interface */
	filter[FI_HMEM_SYSTEM] = true;

	/* strtok() modifies its argument, so tokenize a private copy
	 * rather than the caller's const string */
	filter_copy = strdup(iface_filter_str);
	if (!filter_copy)
		return;

	entry = strtok(filter_copy, token);
	while (entry != NULL) {
		/* names that match no known label are silently ignored */
		for (iface = 0; iface < ARRAY_SIZE(hmem_ops); iface++) {
			if (!strcasecmp(iface_labels[iface], entry)) {
				filter[iface] = true;
				break;
			}
		}

		entry = strtok(NULL, token);
	}

	free(filter_copy);
}

void ofi_hmem_init(void)
{
	int iface, ret;
	int disable_p2p = 0;
	char* hmem_filter = NULL;
	bool filter_hmem_ifaces = false;
	bool iface_filter_array[ARRAY_SIZE(hmem_ops)];

	fi_param_define(NULL, "hmem", FI_PARAM_STRING,
			"List of hmem interfaces to attempt to initialize (default: all available interfaces)");
	fi_param_get_str(NULL, "hmem", &hmem_filter);

	/* an unset or empty FI_HMEM leaves every interface eligible */
	if (hmem_filter && strlen(hmem_filter) != 0) {
		/* pass the array itself; it decays to the bool* the callee expects */
		ofi_hmem_set_iface_filter(hmem_filter, iface_filter_array);
		filter_hmem_ifaces = true;
	}

	for (iface = 0; iface < ARRAY_SIZE(hmem_ops); iface++) {
		/* skip interfaces the filter did not select */
		if (filter_hmem_ifaces && !iface_filter_array[iface])
			continue;

		ret = hmem_ops[iface].init();
		if (ret != FI_SUCCESS) {
			if (ret == -FI_ENOSYS)
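
The gating step in the loop above can be shown in isolation with a small standalone sketch. Everything named `sketch_*` is hypothetical, including the ops table and the two stub init hooks. Failure handling is reduced to a boolean because the remainder of the real loop is not shown in this hunk. A NULL filter stands in for the case where FI_HMEM is unset or empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface init hook; returns 0 on success. */
typedef int (*sketch_init_fn)(void);

/* Hypothetical, pared-down stand-in for an hmem_ops entry. */
struct sketch_hmem_ops {
	bool initialized;
	sketch_init_fn init;
};

/* Stub hooks used only to exercise the loop. */
static int sketch_init_ok(void)
{
	return 0;
}

static int sketch_init_fail(void)
{
	return -1;
}

/*
 * Call init() for every entry when filter is NULL, otherwise only for
 * entries whose filter slot is true. Returns how many hooks were called.
 */
static int sketch_hmem_init(struct sketch_hmem_ops *ops, size_t count,
			    const bool *filter)
{
	size_t i;
	int attempted = 0;

	for (i = 0; i < count; i++) {
		if (filter && !filter[i])
			continue;	/* filtered out: init() is never called */

		attempted++;
		ops[i].initialized = (ops[i].init() == 0);
	}

	return attempted;
}
```

With three entries and a filter of `{ true, false, true }`, only the first and third hooks run. In the real library, skipping the hook is what avoids the dlopen cost described in the commit message.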