Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add make_device_unique() functions to ScopedContextBase #487

Draft
wants to merge 1 commit into
base: CMSSW_11_1_X_Patatrack
Choose a base branch
from

Conversation

makortel
Copy link

@makortel makortel commented Jun 16, 2020

PR description:

This PR is to demonstrate how the CUDA memory allocation API would change if moved part of cms::cuda::ScopedContext*. Such an API change would allow

  • moving the caching allocators back to be owned by CUDAService instead of being singletons
    • them being singletons adds some complexity in the destructor of CUDAService
  • reducing the number CUDA events from one per allocation to one per EDModule
    • cost is that the CUDA event for temporary allocations may get recorded later than now (at the end of acquire()/produce()/intermediate task instead of at the end of enclosing scope), and such memory blocks may thus become available for other CUDA streams later, which may increase the live GPU memory size
    • less CUDA events means less CUDA API calls, which means less locking on the CUDA mutex
  • not having to introduce a distinction between temporary and event product memory allocations (as in [RFC] Reduce calls to cudaEventRecord() via the caching allocators #412)

The API change implies that ScopedContextBase needs to be percolated to anywhere memory needs is allocated.

For ESProducers an ScopedContextES would need to be created (I was planning to create one anyway, and might have made a private prototype on top of #412.

PR validation:

Code compiles, test configuration HeterogeneousCore/CUDATest/test/testCUDASwitch_cfg.py runs.

@makortel
Copy link
Author

This is a minimal work for to demonstrate the API change. If this is the way we want to go, I'd propose to quickly add the pinned host allocation API, and migrate the pixel code. The ECAL and HCAL code could then be directly start using the this API.

In the meantime I would be able to work on the internals towards the goals mentioned in the description.

namespace cuda {
class ProductBase;

class ScopedContextBase {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this (base) class gets exposed to users, I'm tempted to rename it to just ScopedContext. (Even if I would then have to figure out what to do for current ScopedContext.h. One option is to merge the headers back together and make it C++14-compatible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants