add value initialisation to make_host_unique (and make_device_unique ?) #587
Comments
@makortel what do you think?
@jsalfeld this is something you brought up on Mattermost
That was intentional because we wanted to allocate device memory as uninitialized, and we(/I?) wanted to enforce it at compile time to minimize surprises, which essentially implied similar restrictions on the pinned-host allocations as well. cms-sw#31721 also made me think we probably could improve the interface. For pinned host allocations we could just do the value initialization in The implications for device memory would then be (for consistency)
I think the current requirement of What about
Why On the other hand,
I need more time to digest the rest, but I can comment on this:
If we initialise an array of N elements to a single value, it would be more efficient to copy the value to the GPU only once, and use it to set all elements. To use
So
If I managed to understand correctly the difference between default initialisation and value initialisation/zero initialisation:
On our side
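To make the distinction between default initialisation and value initialisation concrete, here is a small plain C++ sketch. The `Hit` type and the two helper functions are hypothetical and only serve as an illustration; they are not part of the `cms::cuda` interface.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical trivial type, standing in for the kind of POD used in buffers.
struct Hit {
  int id;
  float charge;
};

// Value initialisation: `new T[n]()` zero-initialises every element of a
// trivial type. std::make_unique<T[]>(n) behaves the same way.
inline std::unique_ptr<Hit[]> make_value_initialised(std::size_t n) {
  return std::unique_ptr<Hit[]>(new Hit[n]());
}

// Default initialisation: `new T[n]` leaves the members of a trivial type
// with indeterminate values, so they must be written before being read.
// C++20 offers std::make_unique_for_overwrite<T[]>(n) for this case.
inline std::unique_ptr<Hit[]> make_default_initialised(std::size_t n) {
  return std::unique_ptr<Hit[]>(new Hit[n]);
}
```

With this, `make_value_initialised(3)` yields three elements with `id == 0` and `charge == 0`, while the elements returned by `make_default_initialised(3)` have to be assigned before use.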
I don't disagree, the whole initialization business is rather convoluted.
I agree.
I agree. The history of the For pinned host allocations following
Yes, I agree that writing I don't think I have good answers to the rest :-(
We could also think of ditching the attempt to mimic Or maybe we could rename the creation function to something along
`cms::cuda::make_host_unique` allocates pinned host memory but leaves it uninitialised.
In some cases it may be useful to initialise the memory to a specific value (or N copies of a value for the array version).
It should be simple to add an overload that takes a value by copy and sets the newly allocated memory.
I'm not sure if it makes sense to do it also for `make_device_unique`?
For a single value it could easily be done via `cudaMemsetAsync` or `cudaMemcpyAsync`.
For an array I don't know if there is a CUDA runtime function we can leverage.
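A minimal sketch of what such an overload could look like for the host array case, written in plain C++ so that it runs without a GPU. Everything here is an assumption made for illustration: `pinned_alloc` and `PinnedDeleter` use `std::malloc`/`std::free` as stand-ins for the real pinned-memory allocation (for example `cudaMallocHost`/`cudaFreeHost` or a caching allocator), and `make_host_unique_filled` is a hypothetical name, not an existing `cms::cuda` function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

// Stand-in for the pinned host allocation (assumption: the real code would
// call cudaMallocHost or a caching allocator here).
inline void* pinned_alloc(std::size_t bytes) {
  void* p = std::malloc(bytes != 0 ? bytes : 1);  // avoid a null result for 0 bytes
  if (p == nullptr) throw std::bad_alloc();
  return p;
}

// Stand-in for the matching deallocation (assumption: cudaFreeHost or similar).
struct PinnedDeleter {
  void operator()(void* p) const noexcept { std::free(p); }
};

template <typename T>
using host_unique_array = std::unique_ptr<T[], PinnedDeleter>;

// Hypothetical overload: allocate n elements and construct each one as a
// copy of `value`. The deleter only frees memory and never runs destructors,
// hence the restriction to trivially destructible types.
template <typename T>
host_unique_array<T> make_host_unique_filled(std::size_t n, T const& value) {
  static_assert(std::is_trivially_destructible<T>::value,
                "the deleter releases memory without calling destructors");
  host_unique_array<T> buf(static_cast<T*>(pinned_alloc(n * sizeof(T))));
  std::uninitialized_fill_n(buf.get(), n, value);
  return buf;
}
```

For example, `auto buf = make_host_unique_filled<int>(8, 42);` would return a buffer whose eight elements all compare equal to 42. A device-side counterpart cannot be written this way, because the fill would have to happen on the GPU.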