
Unify most processor kinds #1747

Open
muraj opened this issue Aug 28, 2024 · 4 comments
Assignees: muraj
Labels: best effort (indicates the milestone tag for an issue is a goal rather than a commitment), enhancement, Realm (issues pertaining to Realm)

muraj commented Aug 28, 2024

All of our processors have different "kinds" that segregate their capabilities and features. For example, we often want to associate a GPU with a Python processor and leverage all stream management within a Python task. Another example is when clients want to configure the available processors based on the machine topology on behalf of the user. This has been partially implemented via the configuration API, but that API is based on the command-line argument interface, in which Realm still manages the construction of the processors and their affinities; it is not rich enough to properly describe what is needed.

Instead, we'd like to propose an interface where applications can dynamically create processors with certain properties and features enabled. With naming and actual syntax subject to change, the new interface for creating processors would look something like the following:

using namespace Realm;
int main() {
  Runtime r;
  r.init();
  // r.get_available_nodes(local=true);
  // r.get_nodeid();
  r.get_available_core_layout(&core_layout, nodeid);  // TBD
  cuda_mod = r.get_module_specific<CudaModule>();
  cuda_mod->get_available_gpus(&gpus, &num_gpus);
  for (size_t g = 0; g < num_gpus; g++) {
    cuda_mod->get_gpu_info(gpus[g], &gpu_info);
  }
  // Process the gpu and core information, e.g. find the core(s) closest to the
  // gpu to use for the processor, and fill in the create_processor_info
  // structure with the needed information.

  if (r.get_module_specific<PythonModule>() != nullptr) {
    create_processor_info.python = true;
  }

  bool ok = r.create_processor(&gpu_proc, &create_processor_info);

  r.refresh_machine_model(); // Distributes all the newly created processors and their
                             // affinities to all the ranks, allowing remote queries to work
                             // Possibly return an event here to wait on?
  
  return 0;
}

In order to maintain compatibility with the interface we already have, these "custom" processors will probably get a new kind such as "USER_KIND", and a new set of queries can be provided so applications can introspect such a processor, e.g.:

if (p.kind() == PROC_USER_KIND) {
  p.get_feature_flags(&features);
  if (features.enables_cuda) { // Naming TBD
    cuda_mod->get_cuda_info(p, &cuda_info);
    // Contains associated gpu, context, etc
  }
  if (features.enables_python) {
    py_mod->get_python_info(p, &py_info);
    // Maybe retrieve the python interpreter object, etc.
  }
}

The first step is to internally remove all the derived classes of LocalTaskProcessor, utilize the ContextManager for when tasks are about to start or finish executing, and push most of the logic for creating these processors out to the caller instead of into a derived object. This will allow us to componentize our current processors and verify, against our current test suite, that the logic for creating these processors dynamically works.
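To make that refactoring concrete, here is a minimal, self-contained sketch (all names here are hypothetical stand-ins, not the actual Realm API) of a single generic processor that brackets each task body with ContextManager hooks, instead of overriding the behavior in a LocalTaskProcessor subclass per kind:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: rather than one LocalTaskProcessor subclass per
// processor kind, a single generic processor composes ContextManager hooks
// that run just before and just after each task body.
struct ContextManager {
  virtual ~ContextManager() = default;
  virtual void enter_task() = 0;  // e.g. push a CUDA context, acquire the GIL
  virtual void exit_task() = 0;   // e.g. pop the context, release the GIL
};

class GenericTaskProcessor {
  std::vector<std::unique_ptr<ContextManager>> managers_;
public:
  void add_context_manager(std::unique_ptr<ContextManager> cm) {
    managers_.push_back(std::move(cm));
  }
  // Run a task body bracketed by every registered context manager,
  // exiting in reverse order of entry.
  void execute_task(const std::function<void()> &body) {
    for (auto &cm : managers_) cm->enter_task();
    body();
    for (auto it = managers_.rbegin(); it != managers_.rend(); ++it)
      (*it)->exit_task();
  }
};

// A toy manager that records its transitions, standing in for a CUDA or
// Python context manager.
struct RecordingManager : ContextManager {
  std::vector<std::string> &log;
  std::string name;
  RecordingManager(std::vector<std::string> &l, std::string n)
    : log(l), name(std::move(n)) {}
  void enter_task() override { log.push_back(name + ":enter"); }
  void exit_task() override { log.push_back(name + ":exit"); }
};

inline std::vector<std::string> run_demo() {
  std::vector<std::string> log;
  GenericTaskProcessor proc;
  proc.add_context_manager(std::make_unique<RecordingManager>(log, "cuda"));
  proc.add_context_manager(std::make_unique<RecordingManager>(log, "python"));
  proc.execute_task([&] { log.push_back("task"); });
  return log;
}
```

The nesting order (last manager entered is first exited) is what lets e.g. a CUDA context manager and a Python GIL manager compose on the same processor.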

muraj added the "enhancement", "Realm", and "best effort" labels on Aug 28, 2024
muraj self-assigned this on Aug 28, 2024
lightsighter (Contributor) commented:

Can you provide a prototype for what the create_processor_info struct will look like?

Also, I think we should show some code of what machine model queries will look like with the new interface.

This is also a duplicate of #680

muraj commented Aug 29, 2024

@lightsighter I don't have a complete story for the create_processor_info structure yet, but here's what I was thinking; it's very reminiscent of DirectX and Vulkan. Keep in mind that we can build whatever C++ wrappers we want on top of this, but I'm open to comments / suggestions:

namespace Realm {
struct CreateProcessorInfo {
  ProcessorInfoType type = CREATE_PROCESSOR_INFO;  // To help with versioning
  void *pNext = nullptr;
  size_t *coreids = nullptr;
  size_t num_cores = 0;
}; }

namespace Realm::Cuda {
struct CreateCudaProcessorInfo {
  ProcessorInfoType type = CREATE_CUDA_PROCESSOR_INFO;  // To help with versioning
  void *pNext = nullptr;
  CUuuid gpuid; // maybe some more fields here.
}; }

namespace Realm::Python {
struct CreatePythonProcessorInfo {
  ProcessorInfoType type = CREATE_PYTHON_PROCESSOR_INFO;
  void *pNext = nullptr;
  // Python specific processor stuffs
}; }

// e.g.
CreateProcessorInfo create_processor_info;
CreateCudaProcessorInfo cuda_processor_info;
CreatePythonProcessorInfo python_processor_info;

std::vector<size_t> allcores;
size_t num_cores = 0;
// all_cores, numa_cores, etc.
r.get_all_cores(nullptr, &num_cores);
allcores.resize(num_cores);
r.get_all_cores(allcores.data(), &num_cores);

create_processor_info.pNext = &cuda_processor_info;
create_processor_info.coreids = allcores.data();
create_processor_info.num_cores = allcores.size();

cuda_processor_info.gpuid = gpu_infos.front().uuid;
cuda_processor_info.pNext = &python_processor_info;

Processor p;
bool ok = r.create_processor(&p, &create_processor_info);
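For illustration, here is a compilable toy (all types and tags are hypothetical stand-ins, e.g. an int index in place of CUuuid) showing how create_processor could walk such a pNext chain, Vulkan-style: every extension struct starts with the same (type, pNext) header, so the runtime casts each node to the header, switches on the type tag, and hands the struct to the module that understands it:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type tags for the pNext chain.
enum ProcessorInfoType {
  CREATE_PROCESSOR_INFO,
  CREATE_CUDA_PROCESSOR_INFO,
  CREATE_PYTHON_PROCESSOR_INFO,
};

// Common leading members shared by every chainable struct; casting through
// this header is the same layout-compatibility trick Vulkan relies on in
// practice (VkBaseInStructure).
struct InfoHeader {
  ProcessorInfoType type;
  const void *pNext;
};

struct CreateProcessorInfo {
  ProcessorInfoType type = CREATE_PROCESSOR_INFO;
  const void *pNext = nullptr;
};

struct CreateCudaProcessorInfo {
  ProcessorInfoType type = CREATE_CUDA_PROCESSOR_INFO;
  const void *pNext = nullptr;
  int gpu_index = 0;  // stands in for CUuuid gpuid
};

struct CreatePythonProcessorInfo {
  ProcessorInfoType type = CREATE_PYTHON_PROCESSOR_INFO;
  const void *pNext = nullptr;
};

struct Features { bool cuda = false; bool python = false; };

// Walk the chain and record which extension structs were present.
inline Features scan_chain(const CreateProcessorInfo *info) {
  Features f;
  const void *node = info->pNext;
  while (node != nullptr) {
    const InfoHeader *hdr = static_cast<const InfoHeader *>(node);
    switch (hdr->type) {
      case CREATE_CUDA_PROCESSOR_INFO:   f.cuda = true; break;
      case CREATE_PYTHON_PROCESSOR_INFO: f.python = true; break;
      default: break;  // unknown extension: skip (or reject, for versioning)
    }
    node = hdr->pNext;
  }
  return f;
}

inline bool demo_chain_has_both() {
  CreateProcessorInfo base;
  CreateCudaProcessorInfo cuda;
  CreatePythonProcessorInfo py;
  base.pNext = &cuda;   // base -> cuda -> python
  cuda.pNext = &py;
  Features f = scan_chain(&base);
  return f.cuda && f.python;
}

inline bool demo_empty_chain_has_none() {
  CreateProcessorInfo base;  // no extensions chained
  Features f = scan_chain(&base);
  return !f.cuda && !f.python;
}
```

The "skip unknown types" default is what makes the struct chain forward-compatible: an older runtime can ignore extension structs it does not recognize.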

muraj commented Aug 29, 2024

Thinking about it, here's the C++ wrapper we can make on top of this:

Processor p = ProcessorBuilder()
                .set_cores(all_cores)
                .set_gpu(gpu_infos.front().uuid);

This is fairly easily built as a header-only class.
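A compilable sketch of what that header-only builder could look like (names and fields are hypothetical; build() returns a plain config struct here instead of assembling a pNext chain and calling into the runtime):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result of the builder: the accumulated configuration that a
// real implementation would translate into a CreateProcessorInfo chain.
struct ProcessorConfig {
  std::vector<size_t> cores;
  bool has_gpu = false;
  int gpu_index = -1;  // stands in for a CUuuid
  bool python = false;
};

// Header-only fluent builder: each setter records state and returns *this
// so calls chain, mirroring the ProcessorBuilder sketch above.
class ProcessorBuilder {
  ProcessorConfig cfg_;
public:
  ProcessorBuilder &set_cores(std::vector<size_t> cores) {
    cfg_.cores = std::move(cores);
    return *this;
  }
  ProcessorBuilder &set_gpu(int gpu_index) {
    cfg_.has_gpu = true;
    cfg_.gpu_index = gpu_index;
    return *this;
  }
  ProcessorBuilder &enable_python() {
    cfg_.python = true;
    return *this;
  }
  ProcessorConfig build() const { return cfg_; }
};

inline ProcessorConfig demo_build() {
  return ProcessorBuilder()
      .set_cores({0, 1})
      .set_gpu(0)
      .enable_python()
      .build();
}
```

A terminal build()/create() call (taking the Runtime) keeps the conversion to the underlying C-style structs in one place.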

muraj commented Aug 29, 2024

Also, I think we should show some code of what machine model queries will look like with the new interface.

For this, I think doing a simple extension of the ProcessorQuery like the following would be enough:

ProcessorQuery::Features features;
features.has_cuda = true;
ProcessorQuery pq = ProcessorQuery().has_features(features);

This would work with all processor kinds, so the original TOC_PROC would be returned in this query as well.
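As a toy illustration of that matching semantics (all types here are hypothetical), a processor would match when it provides every requested feature, independent of its kind, so a classic TOC_PROC with CUDA enabled matches has_cuda just like a user-created processor would:

```cpp
#include <cassert>
#include <vector>

// Hypothetical feature flags a ProcessorQuery could filter on.
struct Features { bool has_cuda; bool has_python; };

// Stand-in for a machine-model entry: a processor id plus its features.
struct ProcInfo { int id; Features features; };

// A processor matches if it provides every feature the query requested.
inline bool matches(const Features &want, const Features &have) {
  if (want.has_cuda && !have.has_cuda) return false;
  if (want.has_python && !have.has_python) return false;
  return true;
}

inline std::vector<int> query_by_features(const std::vector<ProcInfo> &procs,
                                          const Features &want) {
  std::vector<int> out;
  for (const auto &p : procs)
    if (matches(want, p.features)) out.push_back(p.id);
  return out;
}

inline std::vector<int> demo_query() {
  std::vector<ProcInfo> procs = {
      {0, {true, false}},   // TOC_PROC-style GPU processor
      {1, {false, true}},   // Python-only processor
      {2, {true, true}},    // user-created CUDA+Python processor
  };
  Features want{true, false};  // has_cuda = true
  return query_by_features(procs, want);
}
```

Because matching is a subset test on features rather than an equality test on kind, existing kind-based queries and the new feature-based ones can coexist.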
