Replace use of CUDA API wrapper unique_ptrs with CUDAUtilities unique_ptrs #396

Merged 5 commits on Oct 31, 2019

Conversation

waredjeb

PR description:

This PR is part of #386. It replaces the CUDA API wrapper unique_ptr factories cuda::memory::device::make_unique() and cuda::memory::host::make_unique() with cudautils::make_device_unique() and cudautils::make_host_unique(), respectively. For the same purpose, cuda::memory::device::unique_ptr() and cuda::memory::host::unique_ptr() have been replaced with cudautils::device::unique_ptr() and cudautils::host::unique_ptr(), respectively.

PR validation:

Unit tests were run and code formatting was applied.

@waredjeb waredjeb changed the title Replace use of CUDA API wrapper unique_ptrs. Replace use of CUDA API wrapper unique_ptrs with CUDAUtilities unique_ptrs Oct 30, 2019
@fwyzard

fwyzard commented Oct 30, 2019

Validation summary

Reference release CMSSW_11_0_0_pre7 at 411b633
Development branch CMSSW_11_0_X_Patatrack at 8177676
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.86452.png
zoom-136.86452.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/7f913d591ce714dae3a9b39161baaab50c6b7d8f/log .

@fwyzard

fwyzard commented Oct 30, 2019

No changes to physics or timing, as expected.

@fwyzard

fwyzard commented Oct 30, 2019

In fact, this PR touches only test files, no part of the reconstruction.

@fwyzard

fwyzard commented Oct 30, 2019

However, we do see some unexpected changes in the realistic TTbar workflow on GPU:

Pixel Tracks from PV

                                          reference-10824.5  development-10824.5  development-10824.52  testing-10824.52
Number of TrackingParticles (after cuts)                     4605                 4950                  5017
Number of matched TrackingParticles                          2346                 2757                  2790
Number of tracks                                             3410                 4371                  4416
Number of true tracks                                        3025                 3860                  3905
Number of fake tracks                                        385                  511                   511
Number of pileup tracks                                      0                    0                     0
Number of duplicate tracks                                   44                   0                     0

@VinInn @makortel any ideas where these may come from ?

@fwyzard

fwyzard commented Oct 30, 2019

Validation summary

Reference release CMSSW_11_0_0_pre7 at 411b633
Development branch CMSSW_11_0_X_Patatrack at 8177676
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.86452.png
zoom-136.86452.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/70e19d80872591148b0e6aba071f895c808bbe08/log .

@fwyzard

fwyzard commented Oct 30, 2019

Let's see what happens re-running the validation...

In fact, looking at the validation of some recent PRs, the results for the "Pixel Tracks from PV" seem to alternate randomly between 4950 and 5017 tracks.

Do we have some non-perfect reproducibility in the vertexing code ?

@@ -51,15 +51,15 @@ int main(void) {
 float ge[6 * size];

 auto current_device = cuda::device::current::get();
-auto d_xl = cuda::memory::device::make_unique<float[]>(current_device, size);
-auto d_yl = cuda::memory::device::make_unique<float[]>(current_device, size);
+auto d_xl = cudautils::make_device_unique<float[]>(size, nullptr);

Do people think it would make sense to (ab)use cudaStreamDefault instead of nullptr to specify the default stream?

I say "abuse" because cudaStreamDefault is meant to specify the default stream creation flags - however the name and value (0x00) would make it a good candidate...


I'm a bit afraid that the "abuse" would lead to confusion at some point.

I'm thinking (*) of adding an overload to the caching allocator that would not take a stream at all (or of using nullptr to signify no stream, although that choice would make it impossible to use the allocator with the default stream). In that case the memory block would be truly freed in the destructor of the unique_ptr, instead of delaying the "true free" until the work using the memory block has finished. My main challenge is the naming of the smart pointers: using unique_ptr for both would likely be confusing (in a sense, the current unique_ptr could be argued to be confusing as well).

(*) e.g. for caching memory allocations in ESProducts, and to reduce the use of CUDA events in the caching allocator
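
The difference between the two freeing policies can be sketched in plain C++ with a mock allocator. Everything here is hypothetical and for illustration only: MockCachingAllocator, ImmediateDeleter, DeferredDeleter, make_sync_unique and make_async_unique are made-up names, host memory stands in for device memory, and work_finished() stands in for whatever mechanism signals that queued work is done. It is not the cudautils implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical stand-in for a caching allocator (not the cudautils one).
// It records where each released block ends up so the two policies can be compared.
struct MockCachingAllocator {
  std::vector<void*> reusable;  // blocks that may be handed out again right away
  std::vector<void*> pending;   // blocks parked until queued work has finished

  void* allocate(std::size_t bytes) {
    assert(bytes > 0);
    return ::operator new(bytes);
  }
  // Stream-less policy: the block is available as soon as its owner is destroyed.
  void free_now(void* p) { reusable.push_back(p); }
  // Stream-aware policy: the block stays parked until the work using it is done.
  void free_deferred(void* p) { pending.push_back(p); }
  // Stand-in for "all work queued on the stream has completed".
  void work_finished() {
    reusable.insert(reusable.end(), pending.begin(), pending.end());
    pending.clear();
  }
  ~MockCachingAllocator() {
    for (void* p : reusable) ::operator delete(p);
    for (void* p : pending) ::operator delete(p);
  }
};

// The deleter type encodes the policy, so the two smart pointers are distinct types.
struct ImmediateDeleter {
  MockCachingAllocator* alloc;
  void operator()(void* p) const { alloc->free_now(p); }
};
struct DeferredDeleter {
  MockCachingAllocator* alloc;
  void operator()(void* p) const { alloc->free_deferred(p); }
};

template <typename T>
std::unique_ptr<T[], ImmediateDeleter> make_sync_unique(MockCachingAllocator& a, std::size_t n) {
  T* raw = static_cast<T*>(a.allocate(n * sizeof(T)));
  return std::unique_ptr<T[], ImmediateDeleter>(raw, ImmediateDeleter{&a});
}

template <typename T>
std::unique_ptr<T[], DeferredDeleter> make_async_unique(MockCachingAllocator& a, std::size_t n) {
  T* raw = static_cast<T*>(a.allocate(n * sizeof(T)));
  return std::unique_ptr<T[], DeferredDeleter>(raw, DeferredDeleter{&a});
}
```

Destroying a pointer from make_sync_unique puts its block straight into reusable, while one from make_async_unique lands in pending until work_finished() is called. Because the deleter is part of the unique_ptr type, the two flavours cannot be mixed up silently, which is one possible answer to the naming concern.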


Reading up on the CUDA documentation, there are actually two options for the "default" stream:

  • the "legacy default stream"; this synchronises with all blocking streams (that is, those not created as non-blocking) on the same device
  • the "per-thread default stream"; this is per-thread, and does not synchronise with other streams (except for the legacy one)

Passing 0 or nullptr will use either of those behaviours depending on the nvcc --default-stream option or the CUDA_API_PER_THREAD_DEFAULT_STREAM symbol; the default is the "legacy" stream.
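
Since the choice is made at compile time, a translation unit can at least report which behaviour it was built for. The following is a minimal sketch that looks only at the CUDA_API_PER_THREAD_DEFAULT_STREAM symbol; the names per_thread_default_stream and describe_default_stream are hypothetical, and it assumes nothing about whether the nvcc option also defines that symbol.

```cpp
#include <cassert>
#include <string>

// True only if the symbol was defined for this translation unit,
// for example with -DCUDA_API_PER_THREAD_DEFAULT_STREAM on the command line.
constexpr bool per_thread_default_stream =
#if defined(CUDA_API_PER_THREAD_DEFAULT_STREAM)
    true;
#else
    false;
#endif

// Human-readable description of what passing 0 or nullptr would select,
// judging by the macro alone.
inline std::string describe_default_stream() {
  return per_thread_default_stream ? "per-thread default stream" : "legacy default stream";
}
```

Built without the symbol, describe_default_stream() reports the legacy default stream, matching the default described above.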

Purely from the API point of view, I would use

  • cudautils::make_device_unique<T>(size, nullptr); for the unspecified default stream
  • cudautils::make_device_unique<T>(size, cudaStreamLegacy); for the legacy default stream
  • cudautils::make_device_unique<T>(size, cudaStreamPerThread); for the per-thread default stream
  • cudautils::make_device_unique<T>(size); for the synchronous behaviour

to keep the possibility of passing nullptr for the generic default stream.
With that naming scheme, cudaStreamDefault makes a lot of sense for the unspecified default stream.
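
The overload set proposed above can be sketched without the CUDA toolkit. All names here are assumptions made for illustration: stream_t, kStreamLegacy and kStreamPerThread stand in for cudaStream_t, cudaStreamLegacy and cudaStreamPerThread, make_device_unique_sketch is a made-up name, and the buffer is ordinary host memory tagged with the policy that the call would select.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-ins so that the sketch builds without the CUDA toolkit.
struct MockStream;             // plays the role of the opaque CUDA stream type
using stream_t = MockStream*;  // plays the role of cudaStream_t
// Special handles, mirroring the role of cudaStreamLegacy and cudaStreamPerThread.
static const stream_t kStreamLegacy = reinterpret_cast<stream_t>(std::uintptr_t{1});
static const stream_t kStreamPerThread = reinterpret_cast<stream_t>(std::uintptr_t{2});

enum class Policy { Synchronous, UnspecifiedDefault, LegacyDefault, PerThreadDefault, UserStream };

// Host buffer plus the policy chosen by the call; T is an array type such as float[].
template <typename T>
struct TaggedBuffer {
  std::unique_ptr<T> data;
  Policy policy;
};

// Map a stream argument to the behaviour it requests.
inline Policy classify(stream_t stream) {
  if (stream == nullptr) return Policy::UnspecifiedDefault;
  if (stream == kStreamLegacy) return Policy::LegacyDefault;
  if (stream == kStreamPerThread) return Policy::PerThreadDefault;
  return Policy::UserStream;
}

// No stream argument: synchronous behaviour.
template <typename T>
TaggedBuffer<T> make_device_unique_sketch(std::size_t n) {
  assert(n > 0);
  return {std::make_unique<T>(n), Policy::Synchronous};
}

// Stream argument: nullptr keeps its meaning of "unspecified default stream".
template <typename T>
TaggedBuffer<T> make_device_unique_sketch(std::size_t n, stream_t stream) {
  assert(n > 0);
  return {std::make_unique<T>(n), classify(stream)};
}
```

With this shape, make_device_unique_sketch<float[]>(size) and make_device_unique_sketch<float[]>(size, nullptr) resolve to different overloads, so dropping the stream argument is what selects the synchronous behaviour rather than any special stream value.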

> My main challenge is the naming of the smart pointers: using unique_ptr for both would likely be confusing (in a sense the current unique_ptr could be argued to be confusing as well).

Then I would suggest unique_ptr and make_device_unique for the synchronous behaviour, and something like async_unique_ptr and make_device_async_unique or unique_ptr_async and make_device_unique_async for the ones that use a stream ?


Or just stick to unique_ptr...

@VinInn

VinInn commented Oct 31, 2019 via email

@fwyzard

fwyzard commented Oct 31, 2019

From the second round of validation:

  • same non-reproducibility
  • spurious (?) impact on the throughput

Not bad for a PR that touches only test files...

@fwyzard fwyzard merged commit cce3f33 into cms-patatrack:CMSSW_11_0_X_Patatrack Oct 31, 2019
@makortel makortel mentioned this pull request Oct 31, 2019
20 tasks
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
…_ptrs (#396)

Replace cuda::memory::device::make_unique() calls with cudautils::make_device_unique()
Replace cuda::memory::host::make_unique() with cudautils::make_host_unique()