Kernel execution serialization #11

yupinov · 2018-04-23T10:53:16Z

Is there an option to make all kernels execute sequentially (especially when work is launched in multiple queues)? Coming from CUDA and nvprof, I was surprised not to find such a feature, which would make individual kernel performance easier to understand.

chesik-amd · 2018-04-23T12:55:37Z

When collecting performance counters, the profiler will introduce serialization to try to ensure that only one kernel is executing at a time. There is no option for this, as it is the default behavior.

pszi1ard · 2019-02-15T20:39:26Z

What about measuring performance in a real-life environment under concurrent execution?

Additionally, this seems to imply that traces in CodeXL can't be used to analyze kernel overlap?

chesik-amd · 2019-02-15T20:42:05Z

Serialization is only done when collecting performance counters, which is the mode you would use to analyze the performance of individual kernels. No additional serialization is introduced when collecting a trace, which is the mode you would use to analyze an entire application, including kernel overlap.
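
As a rough illustration of the kind of overlap analysis a trace allows, here is a minimal sketch. It assumes the kernel start and end timestamps have already been extracted from a trace into `(start, end)` pairs. The function name `overlap_time` and the sample timestamps are hypothetical; none of this is part of CodeXL or its output format.

```python
def overlap_time(intervals):
    """Return the total time during which at least two kernels were running.

    `intervals` is a list of (start, end) timestamps in any consistent unit.
    This is an illustrative helper, not a CodeXL API.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a kernel starts
        events.append((end, -1))    # a kernel ends
    # Sort by time; at equal timestamps, ends (-1) come before starts (+1),
    # so back-to-back kernels are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0       # number of kernels running just before the current event
    overlapped = 0   # accumulated time with two or more kernels running
    prev = None
    for t, delta in events:
        if prev is not None and active >= 2:
            overlapped += t - prev
        active += delta
        prev = t
    return overlapped


# Hypothetical timestamps: the first two kernels overlap from 5 to 10.
kernels = [(0, 10), (5, 15), (20, 30)]
print(overlap_time(kernels))  # prints 5
```

An overlap of zero across kernels launched in different queues would suggest the work is effectively executing sequentially, even without any profiler-introduced serialization.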

pszi1ard · 2019-02-15T22:17:11Z

I see. I'd suggest allowing serialization to be turned on/off.

Is there a way to measure wall-time only without serialization?
