[cudadev][RFC] Prototype (host|device)_unique_ptr API to use lightweight "Context" object instead of CUDA stream #256
Add runAcquire(), runProduce() functions
… in better-defined way
class HostAllocatorContext {
public:
  explicit HostAllocatorContext(cudaStream_t stream) : stream_(stream) {}

  void *allocate_host(size_t nbytes) const { return cms::cuda::allocate_host(nbytes, stream_); }

  void free_host(void *ptr) const { cms::cuda::free_host(ptr); }

private:
  cudaStream_t stream_;
};

class DeviceAllocatorContext {
public:
  explicit DeviceAllocatorContext(cudaStream_t stream) : stream_(stream) {}

  void *allocate_device(size_t nbytes) const { return cms::cuda::allocate_device(nbytes, stream_); }

  void free_device(void *ptr) const { cms::cuda::free_device(ptr, stream_); }

private:
  cudaStream_t stream_;
};
Right now (and possibly forever in cudadev) the HostAllocatorContext and DeviceAllocatorContext look nearly identical, but in the future (in CMSSW) they could hold a pointer to the CachingHostAllocator/CachingDeviceAllocator objects.
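To illustrate what "hold a pointer to the allocator object" could look like, here is a minimal sketch. It is not the CMSSW implementation: `MockCachingDeviceAllocator`, its `allocate()`/`free()`/`live()` members, and the `FakeStream` alias (standing in for `cudaStream_t`) are all hypothetical names made up so that the example compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Assumption: stand-in for cudaStream_t so the sketch builds without CUDA.
using FakeStream = int;

// Hypothetical stand-in for a caching allocator object. The real
// CachingDeviceAllocator has a different interface; this mock only counts
// live blocks and forwards to malloc/free.
class MockCachingDeviceAllocator {
public:
  void* allocate(std::size_t nbytes, FakeStream /*stream*/) {
    ++live_;
    return std::malloc(nbytes);
  }
  void free(void* ptr) {
    --live_;
    std::free(ptr);
  }
  int live() const { return live_; }

private:
  int live_ = 0;
};

// Variant of DeviceAllocatorContext that refers to an allocator owned by
// someone else (e.g. a service) instead of relying on a global variable.
class DeviceAllocatorContext {
public:
  DeviceAllocatorContext(MockCachingDeviceAllocator* allocator, FakeStream stream)
      : allocator_(allocator), stream_(stream) {}

  void* allocate_device(std::size_t nbytes) const { return allocator_->allocate(nbytes, stream_); }

  void free_device(void* ptr) const { allocator_->free(ptr); }

private:
  MockCachingDeviceAllocator* allocator_;  // non-owning
  FakeStream stream_;
};
```

The context stays cheap to copy (one pointer plus the stream), while the owner of the allocator decides its lifetime and configuration.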
…uda::Context objects instead of cudaStream_t
f2054b5 to c91e0e3
Made effectively obsolete by cms-sw/cmssw#39428 (although this particular development is not part of the CMSSW PR).
This PR builds on top of #224, but because the actual developments conflict between the base commit of #224 and master, the #224 part is rebased as well. The actual developments of this PR are in the last three commits.

The change can be summarized as make_device_unique<T>(stream) changing to make_device_unique(ctx), where ctx can be e.g. the AcquireContext/ProduceContext, or a "lightweight" HostAllocatorContext/DeviceAllocatorContext/Context (the AcquireContext/ProduceContext are convertible to the latter Context objects). (I'm really overusing the "Context" term here, but haven't figured out better wording yet).

The idea is that
- HostAllocatorContext provides access to the pinned host memory allocator (and only that)
- DeviceAllocatorContext provides access to the device memory allocator (and only that)
- Context provides access to both the pinned host and device memory allocators (via conversions to the two former types), and also whatever is needed to launch asynchronous kernels or memory transfers (in practice the CUDA stream)

This change would allow e.g.
- CachingDeviceAllocator and CachingHostAllocator objects to be moved from global variables to be owned (again) by CUDAService (in CMSSW only), which would further enable (again) the caching allocator parameters to be configured at run time
- AcquireContext/ProduceContext for better performance (see discussion in [cudadev] Improve caching allocator performance #218)
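The relation between the three context types and the unique_ptr factory can be sketched as below. This is a simplified illustration rather than the code in this PR: `FakeStream` replaces `cudaStream_t`, `malloc`/`free` replace the `cms::cuda::allocate_*`/`free_*` calls, and `DeviceDeleter` and the `device_unique_ptr` alias are assumed names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Assumption: stand-in for cudaStream_t so the sketch builds without CUDA.
using FakeStream = int;

class HostAllocatorContext {
public:
  explicit HostAllocatorContext(FakeStream stream) : stream_(stream) {}
  // Real code would forward to cms::cuda::allocate_host(nbytes, stream_).
  void* allocate_host(std::size_t nbytes) const { return std::malloc(nbytes); }
  void free_host(void* ptr) const { std::free(ptr); }
  FakeStream stream() const { return stream_; }

private:
  FakeStream stream_;
};

class DeviceAllocatorContext {
public:
  explicit DeviceAllocatorContext(FakeStream stream) : stream_(stream) {}
  // Real code would forward to cms::cuda::allocate_device(nbytes, stream_).
  void* allocate_device(std::size_t nbytes) const { return std::malloc(nbytes); }
  void free_device(void* ptr) const { std::free(ptr); }
  FakeStream stream() const { return stream_; }

private:
  FakeStream stream_;
};

// Gives access to both allocators through conversions, plus the stream
// needed to launch asynchronous work.
class Context {
public:
  explicit Context(FakeStream stream) : stream_(stream) {}
  FakeStream stream() const { return stream_; }
  operator HostAllocatorContext() const { return HostAllocatorContext(stream_); }
  operator DeviceAllocatorContext() const { return DeviceAllocatorContext(stream_); }

private:
  FakeStream stream_;
};

// Deleter that keeps the allocator context needed to return the memory.
struct DeviceDeleter {
  DeviceAllocatorContext ctx;
  void operator()(void* ptr) const { ctx.free_device(ptr); }
};

template <typename T>
using device_unique_ptr = std::unique_ptr<T, DeviceDeleter>;

// Takes only the narrow DeviceAllocatorContext; a Context converts implicitly.
template <typename T>
device_unique_ptr<T> make_device_unique(DeviceAllocatorContext const& ctx) {
  static_assert(std::is_trivially_constructible<T>::value, "memory is left uninitialized");
  void* mem = ctx.allocate_device(sizeof(T));
  return device_unique_ptr<T>(static_cast<T*>(mem), DeviceDeleter{ctx});
}
```

With these definitions, `make_device_unique<int>(Context(stream))` compiles through the conversion operator, while a function that only needs pinned host memory can accept a `HostAllocatorContext` and never see the device allocator. In this mock the "device" memory is ordinary host memory; with a real device allocation the pointer could not be dereferenced on the host.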