Investigate the use of CUDA managed memory #85
Just to write up one idea that came up in a discussion with @fwyzard and @felicepantaleo. It seems that the main(?) drawback of unified memory is that making device-to-host prefetches asynchronous with respect to the CPU is a bit complicated (see the "deferred" prefetch mechanism in @fwyzard's [third link](https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/)). So one option would be a mixed approach: use unified memory to transfer data to the GPU (especially for conditions), and explicit memory for transferring data back to the CPU.
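A minimal sketch of what such a mixed approach could look like (the kernel, buffer names, and sizes are hypothetical, made up for illustration): conditions live in managed memory and are prefetched to the device, while results go through an explicit device buffer and an asynchronous copy into pinned host memory.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a real producer kernel.
__global__ void produce(const float* conditions, float* results, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) results[i] = conditions[i] * 2.f;
}

int main() {
  constexpr int n = 1 << 20;
  int device = 0;
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Conditions: managed memory, prefetched asynchronously to the GPU.
  float* conditions;
  cudaMallocManaged(&conditions, n * sizeof(float));
  for (int i = 0; i < n; ++i) conditions[i] = 1.f;
  cudaMemPrefetchAsync(conditions, n * sizeof(float), device, stream);

  // Results: explicit device buffer plus pinned host buffer, copied back
  // with a plain asynchronous copy, which is easy to keep async on the CPU side.
  float *d_results, *h_results;
  cudaMalloc(&d_results, n * sizeof(float));
  cudaMallocHost(&h_results, n * sizeof(float));

  produce<<<(n + 255) / 256, 256, 0, stream>>>(conditions, d_results, n);
  cudaMemcpyAsync(h_results, d_results, n * sizeof(float),
                  cudaMemcpyDeviceToHost, stream);
  cudaStreamSynchronize(stream);

  cudaFree(conditions);
  cudaFree(d_results);
  cudaFreeHost(h_results);
  cudaStreamDestroy(stream);
  return 0;
}
```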
#157 experiments with unified memory for conditions.
@fwyzard re #267 (comment): I started to write a reply but never finished, so I'm following up here.
You're referring to HMM, right? That essentially makes standard `malloc()` memory behave like managed memory? I wonder if ...
From what I understand (see e.g. https://lwn.net/Articles/731259/ ) it is kind of the opposite: any memory area can be mapped from the host to the device; when the CPU later tries to access it, it triggers a page fault, and the memory is copied back to the host. So, my guess is that all memory returned by `malloc()` could be used on the device as well.

The next step would be to try it in practice... but I haven't been able to set up vinavx2 or another machine with a recent enough kernel, and my laptop has a Maxwell card, while this requires Pascal or newer. And, as it will likely require CentOS 8 for use in production, it may be something we have to delay for a while.
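For concreteness, this is roughly what HMM promises to enable. An untested sketch, assuming a Pascal-or-newer GPU and an HMM-enabled kernel and driver; the kernel is hypothetical:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.f;
}

int main() {
  constexpr int n = 1024;
  // Plain system allocation: no cudaMalloc*() call involved.
  float* data = static_cast<float*>(std::malloc(n * sizeof(float)));
  for (int i = 0; i < n; ++i) data[i] = 1.f;

  // With HMM, the device would page-fault these pages in on first access...
  scale<<<(n + 255) / 256, 256>>>(data, n);
  cudaDeviceSynchronize();

  // ...and the CPU access below would page-fault them back to the host.
  std::printf("data[0] = %f\n", data[0]);
  std::free(data);
  return 0;
}
```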
I'd still expect (in the absence of better information) that HMM internally talks to the NVIDIA driver, and that for the "HMM memory" the driver and the device have to do something similar to what is done for memory allocated with `cudaMallocManaged()`.
I'm planning to do a full-scale study with pixeltrack-standalone, tracked in cms-patatrack/pixeltrack-standalone#43.
Given the small fraction of time spent in memory transfers, and the possibility to optimise them via prefetching, it makes sense to investigate using CUDA managed memory.
To form a good idea of what it involves, one can go through the 2017 CUDA blog posts on unified memory, such as [Maximizing Unified Memory Performance in CUDA](https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/) linked above.

For further reading, see the documentation of the two relevant API calls (a combined usage sketch follows the list):

- `cudaMemAdvise()`
- `cudaMemPrefetchAsync()`
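As a rough usage sketch (device id, sizes, and advice flags chosen arbitrarily for illustration), the two calls are typically combined like this on a managed allocation:

```cuda
#include <cuda_runtime.h>

int main() {
  constexpr size_t bytes = 1 << 20;
  int device = 0;
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  float* buf;
  cudaMallocManaged(&buf, bytes);

  // Hint that the data is mostly read on, and should preferably live on, the device.
  cudaMemAdvise(buf, bytes, cudaMemAdviseSetReadMostly, device);
  cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, device);

  // Move the pages to the device ahead of the kernels that will use them.
  cudaMemPrefetchAsync(buf, bytes, device, stream);

  // ... launch kernels on `stream` that read `buf` ...

  // Prefetch back before the CPU touches the data, to avoid page faults.
  cudaMemPrefetchAsync(buf, bytes, cudaCpuDeviceId, stream);
  cudaStreamSynchronize(stream);

  cudaFree(buf);
  cudaStreamDestroy(stream);
  return 0;
}
```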