Use unified memory for conditions #157
base: CMSSW_10_4_X_Patatrack
Conversation
A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages: CalibTracker/SiPixelESProducers

The following packages do not have a category yet: HeterogeneousCore/CUDACore

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.
cms-bot commands are listed here.
Reference: throughput over 1000 events; top 10 contributions to GPU usage.
Pull request #157: throughput over 1000 events; top 10 contributions to GPU usage.
(Benchmark tables not preserved in this copy.)
While the contribution of the individual kernels does not seem to change, there seems to be an overall degradation of performance. I think it should be addressed - or at least understood - before merging.
Force-pushed 81cff35 to 438fbe3 (compare).
Force-pushed 1f15f2f to cb3e122 (compare).
Rebased on top of CMSSW_10_4_0_pre4_Patatrack. Note that so far I've only compiled it and haven't tested running it.
…-sw#216) Port and optimise the full workflow from pixel raw data to pixel tracks and vertices to GPUs. Clean the pixel n-tuplets with the "fishbone" algorithm (only on GPUs). Other changes:
- recover the Riemann fit updates lost during the merge with CMSSW 10.4.x;
- speed up clustering and track fitting;
- minor bug fix to avoid a trivial regression with the optimised fit.
Force-pushed cb3e122 to cbeb333 (compare).
Rebased on top of CMSSW_10_4_0_pre4_Patatrack to fix conflicts. Note that so far I've only compiled it and haven't tested running it.
Force-pushed 59fe318 to db3e6f8 (compare).
This PR experiments with using unified memory for conditions. It adds a helper class CUDAESManaged to simplify calling cudaMemAdvise(..., cudaMemAdviseSetReadMostly, 0) and cudaMemPrefetchAsync(...) on all allocated buffers.

For the CPE and the cabling map it also experiments with passing a struct of GPU pointers to the kernel by value instead of a GPU pointer to a struct of GPU pointers.
It also adds CUDAManagedAllocator and CUDAManagedVector<T>; I initially thought I'd use them, but in the end didn't.

I have not done a detailed performance evaluation with respect to the current state.