Unify most processor kinds #1747
Can you provide a prototype? Also, I think we should show some code of what machine model queries will look like with the new interface. This is also a duplicate of #680
@lightsighter I don't have a complete story for the `create_processor_info` structure yet, but here's what I was thinking; it's very reminiscent of DirectX and Vulkan. Keep in mind that we can build whatever C++ wrappers we want on top of this, but I'm open to comments / suggestions:

```cpp
namespace Realm {
struct CreateProcessorInfo {
  ProcessorInfoType type = CREATE_PROCESSOR_INFO; // To help with versioning
  void *pNext = nullptr;
  size_t *coreids = nullptr;
  size_t num_cores = 0;
};
} // namespace Realm

namespace Realm::Cuda {
struct CreateCudaProcessorInfo {
  ProcessorInfoType type = CREATE_CUDA_PROCESSOR_INFO; // To help with versioning
  void *pNext = nullptr;
  CUuuid gpuid; // maybe some more fields here.
};
} // namespace Realm::Cuda

namespace Realm::Python {
struct CreatePythonProcessorInfo {
  ProcessorInfoType type = CREATE_PYTHON_PROCESSOR_INFO;
  void *pNext = nullptr;
  // Python-specific processor fields
};
} // namespace Realm::Python
```
```cpp
// e.g.
CreateProcessorInfo create_processor_info;
CreateCudaProcessorInfo cuda_processor_info;
CreatePythonProcessorInfo python_processor_info;

std::vector<size_t> allcores;
size_t num_cores = 0;

// all_cores, numa_cores, etc.
r.get_all_cores(nullptr, &num_cores);
allcores.resize(num_cores);
r.get_all_cores(allcores.data(), &num_cores);

create_processor_info.pNext = &cuda_processor_info;
create_processor_info.coreids = allcores.data();
create_processor_info.num_cores = allcores.size();

cuda_processor_info.gpuid = gpu_infos.front().uuid;
cuda_processor_info.pNext = &python_processor_info;

Processor p;
err = r.create_processor(&p, &create_processor_info);
```
Thinking about it, here's the C++ wrapper we can make on top of this:

```cpp
Processor p = ProcessorBuilder()
                  .set_cores(all_cores)
                  .set_gpu(gpu_infos.front().uuid);
```

This is fairly easily built as a header-only class.
For this, I think a simple extension of the ProcessorQuery like the following would be enough:

```cpp
ProcessorQuery::Features features;
features.has_cuda = true;
ProcessorQuery pq = ProcessorQuery().has_features(features);
```

This would work with all processor kinds, so the original TOC_PROC would be returned in this query as well.
All of our processors have different "kinds" that segregate their capabilities and features. For example, we often want to associate a GPU with a Python processor and leverage all stream management within a Python task. Another example is when clients want to configure the available processors based on the machine topology on behalf of the user. This has been partially implemented via the configuration API, but that API is built on the command-line argument interface, in which Realm still manages the construction of the processors and their affinities; it is not a rich enough interface to properly describe what is needed.
Instead, we'd like to propose an interface where applications can dynamically create processors with certain properties and features enabled. With naming and actual syntax subject to change, the new interface for creating processors would look something like the following:
In order to maintain compatibility with the interface we already have, these "custom" processors will probably get a new "USER_KIND" or something similar, and a new set of queries can be provided so applications can reverse engineer a processor's capabilities, e.g.:
The first step in this is to internally remove all the derived classes of LocalTaskProcessor, utilize the ContextManager for when tasks are about to start or finish executing, and push most of the logic of how to create these processors out into the caller instead of a derived object. This will allow us to componentize our current processors and verify that the logic for creating these processors dynamically works with our current test suite.