Unify most processor kinds #1747
Can you provide a prototype? Also, I think we should show some code of what machine model queries will look like with the new interface. This is also a duplicate of #680
@lightsighter I don't have a complete story for the `create_processor_info` structure yet, but here's what I was thinking; it's very reminiscent of DirectX and Vulkan. Keep in mind that we can build whatever C++ wrappers we want on top of this, but I'm open to comments / suggestions:

```cpp
namespace Realm {
struct CreateProcessorInfo {
  ProcessorInfoType type = CREATE_PROCESSOR_INFO; // To help with versioning
  void *pNext = nullptr;
  size_t *coreids = nullptr;
  size_t num_cores = 0;
};
} // namespace Realm

namespace Realm::Cuda {
struct CreateCudaProcessorInfo {
  ProcessorInfoType type = CREATE_CUDA_PROCESSOR_INFO; // To help with versioning
  void *pNext = nullptr;
  CUuuid gpuid; // maybe some more fields here.
};
} // namespace Realm::Cuda

namespace Realm::Python {
struct CreatePythonProcessorInfo {
  ProcessorInfoType type = CREATE_PYTHON_PROCESSOR_INFO;
  void *pNext = nullptr;
  // Python-specific processor fields
};
} // namespace Realm::Python
```
```cpp
// e.g.
CreateProcessorInfo create_processor_info;
CreateCudaProcessorInfo cuda_processor_info;
CreatePythonProcessorInfo python_processor_info;

std::vector<size_t> allcores;
size_t num_cores = 0;

// all_cores, numa_cores, etc.
r.get_all_cores(nullptr, &num_cores);
allcores.resize(num_cores);
r.get_all_cores(allcores.data(), &num_cores);

create_processor_info.pNext = &cuda_processor_info;
create_processor_info.coreids = allcores.data();
create_processor_info.num_cores = allcores.size();

cuda_processor_info.gpuid = gpu_infos.front().uuid;
cuda_processor_info.pNext = &python_processor_info;

Processor p;
err = r.create_processor(&p, &create_processor_info);
```
Thinking about it, here's the C++ wrapper we can make on top of this:

```cpp
Processor p = ProcessorBuilder()
                  .set_cores(all_cores)
                  .set_gpu(gpu_infos.front().uuid);
```

This is fairly easily built as a header-only class.
For this, I think a simple extension of the ProcessorQuery like the following would be enough:

```cpp
ProcessorQuery::Features features;
features.has_cuda = true;
ProcessorQuery pq = ProcessorQuery().has_features(features);
```

This would work with all processor kinds, so the original TOC_PROC would be returned in this query as well.
All of our processors have different "kinds" that segregate their capabilities and features. For example, we often want to associate a GPU with a Python processor and leverage all stream management within a Python task. Another example is when clients want to configure the available processors based on the machine topology on behalf of the user. This has been partially implemented via the configuration API, but that API is built on the command-line argument interface, in which Realm still manages the construction of the processors and their affinities; it is not a rich enough interface to properly describe what is needed.
Instead, we'd like to propose an interface where applications can dynamically create processors with certain properties and features enabled. With naming and actual syntax subject to change, the new interface for creating processors would look something like the following:
In order to maintain compatibility with the interface we already have, these "custom" processors will probably get a new "USER_KIND" or something similar, and a new set of queries can be provided so applications can reverse engineer a processor's capabilities, e.g.:
The first step in this is to internally remove all the derived classes of LocalTaskProcessor, utilize the ContextManager for when tasks are about to start or finish executing, and push most of the logic of how to create these processors out into the caller instead of a derived object. This will allow us to componentize our current processors and verify that the logic for creating these processors dynamically works with our current test suite.