[RFC] Add make_device_unique() functions to ScopedContextBase #487

makortel · 2020-06-16T15:50:27Z

PR description:

This PR demonstrates how the CUDA memory allocation API would change if it were made part of cms::cuda::ScopedContext*. Such an API change would allow:

- moving the caching allocators back to being owned by CUDAService instead of being singletons
  - their being singletons adds some complexity to the destructor of CUDAService
- reducing the number of CUDA events from one per allocation to one per EDModule
  - the cost is that the CUDA event for temporary allocations may get recorded later than it is now (at the end of acquire()/produce()/an intermediate task instead of at the end of the enclosing scope), so such memory blocks may become available to other CUDA streams later, which may increase the live GPU memory size
  - fewer CUDA events means fewer CUDA API calls, which means less locking on the CUDA mutex
- not having to introduce a distinction between temporary and event product memory allocations (as in [RFC] Reduce calls to cudaEventRecord() via the caching allocators #412)
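The event-count trade-off in the second point can be illustrated with a small host-only sketch. Nothing here is taken from the PR: `EventCounter`, `PerAllocationBlock` and `ModuleContext` are hypothetical names, and incrementing a counter stands in for a call to cudaEventRecord(). It only shows the bookkeeping difference between recording an event for every released temporary block and recording a single event when the context of a module goes out of scope.

```cpp
#include <cassert>
#include <cstddef>

namespace eventsketch {
  // Hypothetical counter; each increment stands in for one cudaEventRecord() call.
  struct EventCounter {
    int recorded = 0;
  };

  // Scheme A (one event per allocation): every temporary block records
  // its own event when it is released at the end of its enclosing scope.
  class PerAllocationBlock {
  public:
    explicit PerAllocationBlock(EventCounter& counter) : counter_(&counter) {}
    PerAllocationBlock(const PerAllocationBlock&) = delete;
    PerAllocationBlock& operator=(const PerAllocationBlock&) = delete;
    ~PerAllocationBlock() { ++counter_->recorded; }

  private:
    EventCounter* counter_;
  };

  // Scheme B (one event per module): the context only notes that temporary
  // blocks were released, and records a single event in its destructor.
  // Skipping the event when nothing was allocated is an assumption of this sketch.
  class ModuleContext {
  public:
    explicit ModuleContext(EventCounter& counter) : counter_(&counter) {}
    ModuleContext(const ModuleContext&) = delete;
    ModuleContext& operator=(const ModuleContext&) = delete;
    ~ModuleContext() {
      if (pending_ > 0) {
        ++counter_->recorded;
      }
    }
    void noteTemporaryReleased() { ++pending_; }

  private:
    EventCounter* counter_;
    std::size_t pending_ = 0;
  };

  // Number of events recorded for nAlloc temporaries under scheme A.
  inline int eventsPerAllocation(int nAlloc) {
    EventCounter counter;
    for (int i = 0; i < nAlloc; ++i) {
      PerAllocationBlock block(counter);  // released (and recorded) each iteration
    }
    return counter.recorded;
  }

  // Number of events recorded for nAlloc temporaries under scheme B.
  inline int eventsPerModule(int nAlloc) {
    EventCounter counter;
    {
      ModuleContext ctx(counter);
      for (int i = 0; i < nAlloc; ++i) {
        ctx.noteTemporaryReleased();
      }
    }  // the single event is recorded here, later than in scheme A
    return counter.recorded;
  }
}  // namespace eventsketch
```

In scheme B the blocks stay tied to the one event recorded at the end of the context, which is the later availability to other CUDA streams described above.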

The API change implies that a ScopedContextBase needs to be passed down to every place where memory is allocated.
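As a rough illustration of what passing the context down means for calling code, here is a host-only sketch under stated assumptions. It is not the code in this PR: `MockAllocator`, `Deleter` and `sumWithScratch` are hypothetical, std::malloc stands in for the caching device allocator, and only the member name make_device_unique() and the class name ScopedContextBase come from the PR title. The exact signature in the PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

namespace ctxsketch {
  // Hypothetical stand-in for the caching allocator; counts live blocks.
  struct MockAllocator {
    int live = 0;
    void* allocate(std::size_t bytes) {
      ++live;
      return std::malloc(bytes);
    }
    void free(void* ptr) {
      --live;
      std::free(ptr);
    }
  };

  // Deleter that returns the block to the allocator it came from.
  struct Deleter {
    MockAllocator* allocator = nullptr;
    void operator()(void* ptr) const {
      if (allocator != nullptr) {
        allocator->free(ptr);
      }
    }
  };

  template <typename T>
  using device_unique_ptr = std::unique_ptr<T, Deleter>;

  // Mock of the context: the allocation function is a member, so the caller
  // no longer passes a stream (or reaches for a singleton allocator) itself.
  class ScopedContextBase {
  public:
    explicit ScopedContextBase(MockAllocator& allocator) : allocator_(&allocator) {}

    template <typename T>
    device_unique_ptr<T> make_device_unique(std::size_t n) {
      static_assert(std::is_array<T>::value && std::extent<T>::value == 0,
                    "this sketch supports only unbounded arrays, T[]");
      using element_type = std::remove_extent_t<T>;
      static_assert(std::is_trivially_constructible<element_type>::value,
                    "memory is left uninitialized, so elements must be trivially constructible");
      void* mem = allocator_->allocate(n * sizeof(element_type));
      return device_unique_ptr<T>(static_cast<element_type*>(mem), Deleter{allocator_});
    }

  private:
    MockAllocator* allocator_;
  };

  // Any helper that needs scratch memory now takes the context as an argument.
  inline int sumWithScratch(ScopedContextBase& ctx, std::size_t n) {
    auto scratch = ctx.make_device_unique<int[]>(n);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
      scratch[i] = static_cast<int>(i);
      sum += scratch[i];
    }
    return sum;  // scratch goes back to the allocator here
  }
}  // namespace ctxsketch
```

The point of the sketch is the call-site shape: helper functions receive the context by reference and allocate through it, which is why the context has to reach every function that allocates.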

For ESProducers a ScopedContextES would need to be created (I was planning to create one anyway, and might have made a private prototype on top of #412).

PR validation:

Code compiles, test configuration HeterogeneousCore/CUDATest/test/testCUDASwitch_cfg.py runs.


makortel · 2020-06-16T15:52:35Z

This is the minimal work needed to demonstrate the API change. If this is the way we want to go, I'd propose to quickly add the pinned host allocation API and migrate the pixel code. The ECAL and HCAL code could then start using this API directly.

In the meantime I would be able to work on the internals towards the goals mentioned in the description.

makortel · 2020-06-16T17:45:39Z

HeterogeneousCore/CUDACore/interface/ScopedContextBase.h

+  namespace cuda {
+    class ProductBase;
+
+    class ScopedContextBase {


If this (base) class gets exposed to users, I'm tempted to rename it to just ScopedContext. (Even if I would then have to figure out what to do with the current ScopedContext.h. One option is to merge the headers back together and make the result C++14-compatible.)

Add make_device_unique() functions to ScopedContextBase to toy the in…

562e6e6

…terface


makortel mentioned this pull request Jul 8, 2020

Multigpu hcal #498

Closed

makortel mentioned this pull request Jul 19, 2020

Improve CUDAMonitoringService #506

Merged

fwyzard added the question label Jul 31, 2020

makortel mentioned this pull request Oct 10, 2020

cudaErrorIllegalAddress, possibly related to the CachingDeviceAllocator #306

Open

makortel mentioned this pull request Sep 3, 2021

[cudadev] Improve caching allocator performance cms-patatrack/pixeltrack-standalone#218

Open
