Realm Support for GPU Kernel Profiling #1645

lightsighter · 2024-03-07T06:28:46Z

Today Realm provides a profiling measurement for bounding the activity of all asynchronous work that a GPU task performs on a GPU.

https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/profiling.h?ref_type=heads#L127-147

While this is useful for bounding how long the work took on the GPU, it is not precise. A GPU task might launch multiple kernels or other asynchronous operations (e.g. memcpy), and the GPU driver might interleave them with kernels from other GPU tasks, so the GPU isn't necessarily busy running kernels from just one GPU task at a time.

It would be good if Realm could provide profiling feedback about the individual kernels and other asynchronous operations performed inside a GPU task, including when they actually ran on the GPU, so that we can accurately represent that information to mappers and to the Legion profiler.

It seems like this might be possible with the CUPTI interface in CUDA, but it's unclear what overheads it would incur. It also seems to be a global setting, so you might have to pay for it all the time even if many GPU tasks never request this kind of kernel profiling measurement from Realm. Some exploration should be done to determine whether this is even a reasonable path before embarking on it.
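To make the gap between the bounding measurement and per-kernel timing concrete, here is a minimal sketch in plain C++. It does not use Realm or CUPTI: `GpuOp`, `Interval`, `bounding_box`, `busy_time_ns`, and `sample_ops` are hypothetical names invented for illustration, and the timestamps are made up. It only shows how one bounding interval per task can overstate the time the GPU actually spent on that task when operations from another task are interleaved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one asynchronous GPU operation (kernel or memcpy).
// This is NOT a Realm type; timestamps are in nanoseconds.
struct GpuOp {
  int task_id;
  std::int64_t start_ns;
  std::int64_t stop_ns;
};

struct Interval {
  std::int64_t start_ns;
  std::int64_t stop_ns;
};

// The single "box" for a task: earliest start to latest stop over its ops.
// Returns {0, 0} if the task has no ops.
Interval bounding_box(const std::vector<GpuOp>& ops, int task_id) {
  bool found = false;
  Interval box{0, 0};
  for (const GpuOp& op : ops) {
    if (op.task_id != task_id) continue;
    if (!found) {
      box = {op.start_ns, op.stop_ns};
      found = true;
    } else {
      box.start_ns = std::min(box.start_ns, op.start_ns);
      box.stop_ns = std::max(box.stop_ns, op.stop_ns);
    }
  }
  return box;
}

// Time the GPU actually spent on a task: the length of the union of its
// op intervals, so ops that overlap each other are not double counted.
std::int64_t busy_time_ns(const std::vector<GpuOp>& ops, int task_id) {
  std::vector<Interval> mine;
  for (const GpuOp& op : ops)
    if (op.task_id == task_id) mine.push_back({op.start_ns, op.stop_ns});
  std::sort(mine.begin(), mine.end(),
            [](const Interval& a, const Interval& b) {
              return a.start_ns < b.start_ns;
            });
  std::int64_t total = 0;
  std::size_t i = 0;
  while (i < mine.size()) {
    Interval cur = mine[i++];
    // Merge every following interval that overlaps or touches `cur`.
    while (i < mine.size() && mine[i].start_ns <= cur.stop_ns) {
      cur.stop_ns = std::max(cur.stop_ns, mine[i].stop_ns);
      ++i;
    }
    total += cur.stop_ns - cur.start_ns;
  }
  return total;
}

// Made-up timeline: task 1 runs two kernels with a task 2 kernel in between.
std::vector<GpuOp> sample_ops() {
  return {{1, 0, 10}, {2, 10, 30}, {1, 30, 40}};
}
```

With `sample_ops()`, the bounding box for task 1 spans the whole timeline even though task 1 only occupies the GPU for half of it; the other half belongs to task 2. Per-operation records are what would let a mapper or the profiler tell the two apart.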

Assigning @apryakhin to triage for now. This is a low-priority enhancement.

muraj · 2024-08-05T23:14:28Z

Closing this issue as a duplicate of #1732 which has more traction.

elliottslaughter · 2024-08-05T23:23:18Z

I'm not sure that #1732 actually supersedes this issue?

My understanding is that #1732 is about making the bounding box around the GPU kernels of a task tighter. But you still fundamentally get one box.

This issue is about accurately representing multiple boxes, one per kernel. Obviously this is of no use if it's not at least as precise as #1732, but fundamentally it's a different problem to solve. And (I suspect) it's an open question whether we want to solve it, because it could potentially have dramatically higher overheads.

lightsighter · 2024-08-06T08:44:22Z

I have mixed feelings. On one hand, I don't think Realm should be in the business of duplicating the functionality of Nsight, but at the same time, I think there might be some value in at least getting individual kernel profiling data into the Legion Prof profile. I'd be inclined to reopen this issue and just let it remain open for a while in case any important use cases pop up.

lightsighter added the enhancement and Realm labels Mar 7, 2024

lightsighter mentioned this issue Mar 15, 2024 in "legion prof: GPU task times not rendered accurately" #755 (closed)

muraj added the duplicate label Aug 5, 2024

muraj closed this as completed Aug 5, 2024
