Physical CUDA stream management #279
With #100, we cache the CUDA streams behind a std::shared_ptr of the cuda::stream_t<> wrapper. We could get rid of one level of pointer indirection by letting the smart pointer manage the underlying stream handle directly, with a custom deleter. Something along the lines of

#include <cstring>
#include <iostream>
#include <memory>

struct opaque;
typedef opaque * opaque_t;

// custom deleter: here it only prints, to show when it is invoked
void deleter(opaque_t msg) {
  std::cout << (char *) msg << std::endl;
}

int main(void) {
  char * data = new char[64];
  strncpy(data, "Hello world", 64);
  opaque_t ptr = reinterpret_cast<opaque *>(data);
  // both smart pointers manage the same raw pointer here, purely to show that
  // shared_ptr and unique_ptr can own an opaque type through a custom deleter;
  // the deleter thus runs twice, and the 64-byte buffer is intentionally leaked
  auto shared = std::shared_ptr<opaque>(ptr, &deleter);
  auto unique = std::unique_ptr<opaque, decltype((deleter))>(ptr, deleter);
  return 0;
}
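Applied to CUDA streams, a minimal sketch could look like the following (the make_stream() helper is just for illustration, not the actual code, and error checking is omitted; cudaStream_t is already a pointer to the opaque CUstream_st, so no extra indirection is added):

#include <cuda_runtime.h>
#include <memory>

// hand ownership of a newly created stream to a shared_ptr whose deleter
// destroys the stream
inline std::shared_ptr<CUstream_st> make_stream() {
  cudaStream_t stream;
  cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
  return std::shared_ptr<CUstream_st>(stream, [](cudaStream_t s) { cudaStreamDestroy(s); });
}

int main(void) {
  auto stream = make_stream();
  // stream.get() yields a plain cudaStream_t, usable in any CUDA API call
  cudaStreamSynchronize(stream.get());
  return 0;
}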
Thanks @fwyzard, so it was just a matter of defining the custom deleters. That actually makes this approach not work with [...]. On the other hand, it looks like extending the [...]
Can we trust that cudaStream_t stays a pointer to an opaque type? (OK, likely NVIDIA will continue to like to hide the internals, so "yes", but I want to write the question out loud anyway.)

Some CUDA API functions (e.g. cudaStreamCreateWithFlags) only describe the cudaStream_t as a stream identifier in their documentation. On the other hand, the special values cudaStreamLegacy and cudaStreamPerThread are integer constants cast to cudaStream_t, so the handle is not guaranteed to be a dereferenceable pointer anyway.
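For what it is worth, the assumption could be documented with a compile-time check along these lines (a sketch; the assertions reflect how the current runtime headers declare cudaStream_t):

#include <cuda_runtime.h>
#include <type_traits>

// today cudaStream_t is declared as a pointer to the opaque struct CUstream_st;
// these assertions would start failing if NVIDIA ever changed that
static_assert(std::is_pointer<cudaStream_t>::value,
              "cudaStream_t is expected to be a pointer type");
static_assert(std::is_same<cudaStream_t, CUstream_st *>::value,
              "cudaStream_t is expected to point to the opaque CUstream_st");

int main(void) { return 0; }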
We should probably do the same trick with the cached CUDA events.
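For illustration, the same pattern applied to events could look like this (the make_event() helper name is hypothetical; error checking omitted):

#include <cuda_runtime.h>
#include <memory>

// cudaEvent_t is likewise a pointer to an opaque CUevent_st, so a cached event
// can be held as a shared_ptr whose deleter calls cudaEventDestroy
inline std::shared_ptr<CUevent_st> make_event() {
  cudaEvent_t event;
  cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
  return std::shared_ptr<CUevent_st>(event, [](cudaEvent_t e) { cudaEventDestroy(e); });
}

int main(void) {
  auto event = make_event();
  cudaEventRecord(event.get());  // event.get() is a plain cudaEvent_t
  return 0;
}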
Adding support for a custom deleter to [...]. On the other hand, changing [...] to a reference + custom deleter, or to use cudaStream_t directly, [...]. However, I noticed that [...]
Thanks Andrea for the test. Do we have any other compelling arguments to go to "raw" cudaStream_t?
Yes, we could drop the explicit (redundant) device member.
No, for me the performance was the only argument, and since the benchmark did not see any impact, we can keep using cuda::stream_t<>.
Agreed. And possibly update the CUDA API wrappers external, but the author changed the extension of the files from [...]
Well, I find myself going to read their implementation for the exact details every now and then. I do find some of the abstractions convenient (like [...]). Actually, possibly my best argument for staying with the wrappers is the destructor handling: [...], so in the destructor of the cache we could loop over the vector and set the device correctly. Relying on the destructors is more convenient though. So I have rather mixed feelings on the wrappers, but not strong enough to clearly say one or the other.
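A hypothetical sketch of that manual bookkeeping with bare cudaStream_t (the StreamCache name and the per-device vector layout are made up for illustration):

#include <cuda_runtime.h>
#include <vector>

struct StreamCache {
  // streams_[device] holds the cached streams created on that device
  std::vector<std::vector<cudaStream_t>> streams_;

  ~StreamCache() {
    int current = 0;
    cudaGetDevice(&current);  // remember the caller's current device
    for (int device = 0; device < static_cast<int>(streams_.size()); ++device) {
      cudaSetDevice(device);  // make the owning device current before cleanup
      for (cudaStream_t stream : streams_[device]) {
        cudaStreamDestroy(stream);
      }
    }
    cudaSetDevice(current);  // restore the caller's current device
  }
};

int main(void) { StreamCache cache; return 0; }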
A tiny argument against [...]
Currently (and in #100) we use the CUDA stream class cuda::stream_t<> from the CUDA API wrappers. This issue is to discuss whether we should continue to do so, or switch to the cudaStream_t (or do something else).

Pros of cuda::stream_t<>
- cuda::stream<>::enqueue::callback() gives a very easy way to use lambdas as callbacks (for contrast, see the raw-API sketch after this list)

Cons of cuda::stream_t<>
- [...] cudaGetDevice() / cudaSetDevice() [...]
- [...] (could be worked around with std::optional), but later with the caching of streams in CUDAService this point is no longer an issue (needs std::shared_ptr anyway)
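For contrast with the enqueue::callback() convenience noted in the pros, this is roughly what a stream callback looks like with the plain runtime API: cudaStreamAddCallback only takes a function pointer plus a void* user-data argument, so a capturing lambda cannot be passed directly (the names in this sketch are illustrative):

#include <cstdio>
#include <cuda_runtime.h>

// plain-function callback: the signature is fixed by cudaStreamCallback_t
static void CUDART_CB streamFinished(cudaStream_t stream, cudaError_t status, void *userData) {
  std::printf("stream finished: %s\n", static_cast<const char *>(userData));
}

int main(void) {
  cudaStream_t stream;
  cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
  static const char message[] = "hello";
  // state has to be smuggled through the void* argument instead of a lambda capture
  cudaStreamAddCallback(stream, streamFinished, const_cast<char *>(message), 0);
  cudaStreamSynchronize(stream);
  cudaStreamDestroy(stream);
  return 0;
}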