-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kernel execution serialization #11
Comments
When collecting performance counters, the profiler will introduce serialization to try to ensure that only one kernel is executing at a time. There is no option for this, as it is the default behavior. |
What about measuring performance in real-life environment under concurrent execution? Additionally this seems to imply that traces in CodeXL can't be used to analyze kernel overlap? |
Serialization is only done when collecting performance counters (which is the mode you would use to analyze performance of individual kernels). No additional serialization is introduced when collecting a trace (which is the mode you would use to analyze an entire application (including kernel overlap)). |
I see. I'd suggest allowing serialization to be turned on/off. Is there a way to measure wall-time only without serialization? |
Is there an option for making all the kernels execute sequentially (especially when work is launched in multiple queues)? Coming from CUDA and nvprof, I was surprised to not find such a feature for the better kernel performance understanding.
The text was updated successfully, but these errors were encountered: