Replace use of CUDA API wrapper unique_ptrs with CUDAUtilities unique_ptrs #396
Conversation
Validation summary
Reference release CMSSW_11_0_0_pre7 at 411b633
Validation plots:
- /RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW
- /RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW
- /RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW
Throughput plots: /EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53
logs and
No changes to physics or timing, as expected. In fact, this PR touches only test files, not any part of the reconstruction. However, we do see some unexpected changes in the realistic TTbar workflow on GPU: Pixel Tracks from PV.
Let's see what happens re-running the validation... In fact, looking at the validation of some recent PRs, the result for "Pixel Tracks from PV" seems to alternate randomly between 4950 and 5017 tracks. Do we have some imperfect reproducibility in the vertexing code?
```diff
@@ -51,15 +51,15 @@ int main(void) {
   float ge[6 * size];

-  auto current_device = cuda::device::current::get();
-  auto d_xl = cuda::memory::device::make_unique<float[]>(current_device, size);
-  auto d_yl = cuda::memory::device::make_unique<float[]>(current_device, size);
+  auto d_xl = cudautils::make_device_unique<float[]>(size, nullptr);
```
Do people think it would make sense to (ab)use `cudaStreamDefault` instead of `nullptr` to specify the default stream? I say "abuse" because `cudaStreamDefault` is meant to specify the default stream creation flags; however, the name and value (0x00) would make it a good candidate...
I'm a bit afraid that the "abuse" would lead to confusion at some point.
I'm thinking (*) of adding an overload on the caching allocator that would not take a stream at all (or would use `nullptr` to signify "no stream", although that choice would make it impossible to use the allocator with the default stream). In that case the memory block would be truly freed in the destructor of the `unique_ptr`, instead of delaying the "true free" until the work using the memory block has finished. My main challenge is the naming of the smart pointers: using `unique_ptr` for both would likely be confusing (in a sense the current `unique_ptr` could be argued to be confusing as well).

(*) e.g. for caching memory allocations in ESProducts, and to reduce the use of CUDA events in the caching allocator
Reading up on the CUDA documentation, there are actually two options for the "default" stream:
- the "legacy default stream": this synchronises with all (not non-blocking) streams on the same device
- the "per-thread default stream": this is per-thread, and does not synchronise with other streams (except for the legacy one)

Passing `0` or `nullptr` will use either of those behaviours depending on the nvcc `--default-stream` option or the `CUDA_API_PER_THREAD_DEFAULT_STREAM` symbol; the default is the "legacy" stream.
Purely from the API point of view, I would use
- `cudautils::make_device_unique<T>(size, nullptr);` for the unspecified default stream
- `cudautils::make_device_unique<T>(size, cudaStreamLegacy);` for the legacy default stream
- `cudautils::make_device_unique<T>(size, cudaStreamPerThread);` for the per-thread default stream
- `cudautils::make_device_unique<T>(size);` for the synchronous behaviour

to keep the possibility of passing `nullptr` for the generic default stream. With that naming scheme, `cudaStreamDefault` makes a lot of sense for the unspecified default stream.
> My main challenge is the naming of the smart pointers: using `unique_ptr` for both would likely be confusing (in a sense the current `unique_ptr` could be argued to be confusing as well).
Then I would suggest `unique_ptr` and `make_device_unique` for the synchronous behaviour, and something like `async_unique_ptr` and `make_device_async_unique`, or `unique_ptr_async` and `make_device_unique_async`, for the ones that use a stream?
Or just stick to `unique_ptr`...
> In fact, looking at the validation of some recent PRs, the result for "Pixel Tracks from PV" seems to alternate randomly between 4950 and 5017 tracks. Do we have some non-perfect reproducibility in the vertexing code?
Not necessarily in the vertexing code. The association to PV is a specific algorithm in the Validation code (that we need to describe in the paper, btw). Matti???

v.
From the second round of validation:
Not bad for a PR that touches only test files...
…_ptrs (#396): Replace cuda::memory::device::make_unique() calls with cudautils::make_device_unique(); replace cuda::memory::host::make_unique() with cudautils::make_host_unique()
PR description:
This PR is part of #386, and replaces the use of the CUDA API wrapper `unique_ptr`s: `cuda::memory::device::make_unique()` and `cuda::memory::host::make_unique()` are replaced with, respectively, `cudautils::make_device_unique()` and `cudautils::make_host_unique()`. For this purpose also `cuda::memory::device::unique_ptr()` and `cuda::memory::host::unique_ptr()` have been replaced with, respectively, `cudautils::device::unique_ptr()` and `cudautils::host::unique_ptr()`.
PR validation:
Unit tests were run, and code formatting was applied.