Is there any CUDA device with a warpSize other than 32? I am almost in favor of hardcoding it. Otherwise, we could collect and cache the entire device properties (i.e. cudaDeviceProp) once, so that other values can be served faster as well.
Partly solved by #2246. Nevertheless, we should cache all runtime-constant device properties within the device; then there is no need to query the API multiple times.
While porting the CMS pixel reconstruction from native CUDA to Alpaka, it was noticed that the use of the alpaka::getWarpSizes(device) function incurs a noticeable overhead. See cms-sw/cmssw#43064 (comment) for the discussion.
A possible workaround is to cache the warp size in our code, instead of querying it for every event.
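A minimal sketch of what such a workaround on the application side could look like. It does not use Alpaka: queryWarpSize is a hypothetical stand-in for the expensive back-end call, and WarpSizeCache and the workaround_sketch namespace are illustrative names only. The counter is there just to make the number of back-end queries visible.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

namespace workaround_sketch {

// Counts how often the (mock) back-end is queried. Illustration only.
inline int& backendQueries() {
    static int count = 0;
    return count;
}

// Hypothetical stand-in for an expensive back-end query such as
// alpaka::getWarpSizes(device). The value 32 is an assumption.
inline std::size_t queryWarpSize(int /*deviceIndex*/) {
    ++backendQueries();
    return 32;
}

// Application-side cache: the back-end is queried once per device index,
// and later lookups are served from the map.
// Not thread-safe as written; guard it with a mutex if shared across threads.
class WarpSizeCache {
public:
    std::size_t get(int deviceIndex) {
        assert(deviceIndex >= 0);
        auto it = m_cache.find(deviceIndex);
        if (it == m_cache.end()) {
            it = m_cache.emplace(deviceIndex, queryWarpSize(deviceIndex)).first;
        }
        return it->second;
    }

private:
    std::unordered_map<int, std::size_t> m_cache;
};

} // namespace workaround_sketch
```

With this in place, the per-event code would call cache.get(deviceIndex) instead of going to the back-end, so processing many events on one device triggers a single query.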
However, it would seem natural to cache this information within the Alpaka device objects, instead of querying the underlying back-end each time.
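To illustrate the idea, here is a rough sketch of a device object that queries the back-end once at construction and serves the stored value afterwards. This is not Alpaka's actual device class: Device, queryWarpSizesFromBackend and the device_sketch namespace are hypothetical, and the back-end call is mocked so the example stands alone. The return type is a list of sizes, which is an assumption based on the plural function name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace device_sketch {

// Counts how often the (mock) back-end is queried. Illustration only.
inline int& backendQueries() {
    static int count = 0;
    return count;
}

// Hypothetical stand-in for the underlying runtime API call.
// The single value 32 is an assumption.
inline std::vector<std::size_t> queryWarpSizesFromBackend(int /*deviceIndex*/) {
    ++backendQueries();
    return {32};
}

// Hypothetical device handle that caches a runtime-constant property.
// The query happens exactly once, in the constructor; warpSizes() only
// reads the stored copy, so concurrent const access needs no locking.
class Device {
public:
    explicit Device(int index)
        : m_index(index), m_warpSizes(queryWarpSizesFromBackend(index)) {
        assert(index >= 0);
    }

    std::vector<std::size_t> const& warpSizes() const noexcept { return m_warpSizes; }
    int index() const noexcept { return m_index; }

private:
    int m_index;
    std::vector<std::size_t> m_warpSizes;
};

} // namespace device_sketch
```

Copying such a Device copies the cached vector and does not query the back-end again. Other runtime-constant properties could be stored next to the warp sizes in the same way.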