Realm Support for GPU Kernel Profiling #1645

Closed
lightsighter opened this issue Mar 7, 2024 · 3 comments
Labels
duplicate enhancement Realm Issues pertaining to Realm

Comments

@lightsighter
Contributor

Today Realm provides a profiling measurement that bounds the activity of all asynchronous work a GPU task performs on a GPU:

https://gitlab.com/StanfordLegion/legion/-/blob/master/runtime/realm/profiling.h?ref_type=heads#L127-147

While this can be useful for bounding how long work took on the GPU, it isn't precise. A GPU task might launch multiple kernels or other asynchronous operations (e.g. memcpy), and the GPU driver may interleave them with kernels from other GPU tasks, so the GPU isn't necessarily busy running only one GPU task's kernels at a time. It would be good if Realm could provide profiling feedback about the individual kernels and other asynchronous operations performed inside a GPU task, including when they actually ran on the GPU, so that we can represent that information accurately to mappers and in the Legion profiler. This might be possible with the CUPTI interface in CUDA, but it's unclear what overheads it would incur. CUPTI also appears to be a global setting, so we might have to pay that cost all the time, even if many GPU tasks never request this kind of kernel profiling measurement from Realm. Some exploration is needed to determine whether this is even a reasonable path before embarking on it.

Assigning @apryakhin to triage for now. This is a low-priority enhancement.

@muraj

muraj commented Aug 5, 2024

Closing this issue as a duplicate of #1732 which has more traction.

@muraj muraj closed this as completed Aug 5, 2024
@elliottslaughter
Contributor

I'm not sure that #1732 actually supersedes this issue.

My understanding is that #1732 is about making the bounding box around the GPU kernels of a task tighter. But you still fundamentally get one box.

This issue is about accurately representing multiple boxes, one per kernel. Obviously this is of no use if it's not at least as precise as #1732, but fundamentally it's a different problem to solve. And (I suspect) it's an open question whether we want to solve it, because it could potentially have dramatically higher overheads.

@lightsighter
Contributor Author

I have mixed feelings. On one hand, I don't think Realm should be in the business of duplicating the functionality of Nsight; on the other hand, I think there might be some value in at least getting individual kernel timings into the Legion Prof profile. I'd be inclined to reopen this issue and let it remain open for a while in case any important use cases pop up.
