
Simplify SiPixelFedCablingMapGPU SoA #301

Merged

Conversation

makortel

I wanted to dump the SiPixelFedCablingMapGPU to a file for some standalone testing (we can talk about that next week), and easiest was to try out the suggestion #272 (comment) since the arrays were allocated to compile-time maximum size anyway.
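For context, a minimal sketch of that direction (illustrative names and sizes, not the actual CMSSW declarations): with the arrays at their compile-time maximum size inside the struct, the whole map is one contiguous, trivially copyable block, so dumping it to a file or copying it to the device is a single operation.

```cpp
#include <cstdio>

// Assumed upper bound (FEDs x links x ROCs); the real constants live in pixelgpudetails.
constexpr unsigned int MAX_SIZE = 150 * 48 * 8;

// Hypothetical fixed-size SoA in the spirit of the #272 suggestion.
struct CablingMapSoA {
  unsigned int fed[MAX_SIZE];
  unsigned int link[MAX_SIZE];
  unsigned int roc[MAX_SIZE];
  unsigned int rawId[MAX_SIZE];
  unsigned int rocInDet[MAX_SIZE];
  unsigned int moduleId[MAX_SIZE];
  unsigned char badRocs[MAX_SIZE];
  unsigned int size = 0;
};

// Dumping for standalone tests becomes a single fwrite; likewise the
// host-to-device transfer becomes one cudaMemcpy of sizeof(CablingMapSoA).
void dump(CablingMapSoA const& map, std::FILE* out) {
  std::fwrite(&map, sizeof(CablingMapSoA), 1, out);
}
```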

@fwyzard fwyzard added the Pixels Pixels-related developments label Mar 26, 2019

VinInn commented Mar 28, 2019

Looks definitely cleaner and simpler to manage (OK, I am biased).

@makortel makortel changed the base branch from CMSSW_10_5_X_Patatrack to CMSSW_10_6_X_Patatrack April 24, 2019 15:54
```cpp
      hasQuality_(badPixelInfo != nullptr)
{
  cudaCheck(cudaMallocHost(&cablingMapHost, sizeof(SiPixelFedCablingMapGPU)));

  std::vector<unsigned int> const& fedIds = cablingMap.fedIds();
  std::unique_ptr<SiPixelFedCablingTree> const& cabling = cablingMap.cablingTree();
```

Not relevant for this PR, but wouldn't it be simpler to use a "dumb" pointer (SiPixelFedCablingTree const*) instead of a const reference to a unique_ptr?

makortel (author):

I'd say it would be clearer to drop the const& and take the unique_ptr by value, since cablingMap.cablingTree() returns the unique_ptr by value:

```cpp
std::unique_ptr<SiPixelFedCablingTree> cablingTree() const;
```
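A minimal sketch of the by-value alternative (standalone stand-ins for the real types):

```cpp
#include <memory>

struct SiPixelFedCablingTree {};  // stand-in for the real class

// As in SiPixelFedCablingMap: the accessor already returns by value.
std::unique_ptr<SiPixelFedCablingTree> cablingTree() {
  return std::make_unique<SiPixelFedCablingTree>();
}

void caller() {
  // Ownership moves directly into the local variable; no reference to a
  // temporary, and the caller's ownership of the tree is explicit.
  std::unique_ptr<SiPixelFedCablingTree> cabling = cablingTree();
}
```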


Same for fedIds? cablingMap.fedIds() returns a std::vector by value, so we could drop the const& there as well, and let the compiler move it or even optimise the copy away.
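Sketch of the same idea for fedIds (invented values): with a by-value return, copy elision or a move applies, so the const& buys nothing.

```cpp
#include <vector>

std::vector<unsigned int> fedIds() { return {1200, 1201, 1204}; }  // returns by value

void caller() {
  // Guaranteed copy elision (C++17), or at worst a move: no extra copy is made.
  std::vector<unsigned int> ids = fedIds();
  // compared to: std::vector<unsigned int> const& ids = fedIds();
  // which merely binds a reference to the returned temporary.
}
```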

```cpp
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> RawId;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> rocInDet;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> moduleId;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> badRocs;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> modToUnpDefault;
```

modToUnpDefault_?

```cpp
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> RawId;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> rocInDet;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> moduleId;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> badRocs;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> modToUnpDefault;
  unsigned int size;
```

size_?

```cpp
struct GPUData {
  ~GPUData();
  SiPixelFedCablingMapGPU *cablingMapHost = nullptr;    // pointer to struct in CPU
  SiPixelFedCablingMapGPU *cablingMapDevice = nullptr;  // pointer to struct in GPU
```

Last comment, not relevant to this PR but rather to CUDAESProduct in general :-)

The pattern seems to be:

  • start with a class/struct for the actual data on the GPU:

    ```cpp
    Payload *cablingMapHost = nullptr;  // pointer to struct in CPU
    ```

  • define a wrapper:

    ```cpp
    struct PayloadWrapper {
      ~PayloadWrapper();
      Payload *payload = nullptr;  // pointer to struct in GPU
    };
    ```

  • add a CUDAESProduct data member:

    ```cpp
    CUDAESProduct<PayloadWrapper> payload_;
    ```

  • produce it for the GPU like this:

    ```cpp
    Payload const* getGPUProductAsync(cuda::stream_t<>& cudaStream) const {
      const auto& data = payload_.dataForCurrentDeviceAsync(cudaStream, [this](PayloadWrapper& data, cuda::stream_t<>& stream) {
        // allocate
        cudaCheck(cudaMalloc(&data.payload, sizeof(Payload)));

        // transfer
        cudaCheck(cudaMemcpyAsync(data.payload, this->cablingMapHost, sizeof(Payload), cudaMemcpyDefault, stream.id()));
      });
      return data.payload;
    }
    ```

Would it make sense to encapsulate more of the common part into CUDAESProduct?

And/or to drop the PayloadWrapper in favour of a unique_ptr, possibly with a custom deleter?
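For the last point, a minimal sketch of what the unique_ptr alternative could look like (the deleter and helper below are assumptions for illustration, not the CUDAESProduct API):

```cpp
#include <cuda_runtime.h>
#include <memory>

// Hypothetical deleter: release device memory instead of calling delete.
struct CudaFreeDeleter {
  void operator()(void* ptr) const { cudaFree(ptr); }
};

template <typename T>
using device_ptr = std::unique_ptr<T, CudaFreeDeleter>;

// The per-payload wrapper struct and its hand-written destructor disappear;
// the allocation inside the fill callback would reduce to something like:
template <typename Payload>
device_ptr<Payload> allocateDevice() {
  void* mem = nullptr;
  cudaMalloc(&mem, sizeof(Payload));
  return device_ptr<Payload>(static_cast<Payload*>(mem));
}
```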

makortel (author):

I'm all for encapsulating patterns, but I need to think about this for a while. I opened issue #336 as a reminder.

@fwyzard fwyzard merged commit 30e75bb into cms-patatrack:CMSSW_10_6_X_Patatrack Apr 29, 2019

fwyzard commented Apr 29, 2019

I have not seen a measurable¹ impact on the throughput.

V100 on JetHT data

Average of 10 jobs running with 10 threads on a single GPU

reference :

1689.1 ±  24.8 ev/s
1674.2 ±  11.9 ev/s

#301:

1674.6 ±   7.6 ev/s

T4 on TTbar MC

Average of 10 jobs running with 8 threads on a single GPU

reference :

 720.9 ±   1.3 ev/s
 722.3 ±   1.1 ev/s

#301:

 719.9 ±   0.7 ev/s

¹ the measurements on the V100 seem to fluctuate a lot ...

```diff
@@ -41,21 +41,21 @@ SiPixelFedCablingMapGPUWrapper::SiPixelFedCablingMapGPUWrapper(SiPixelFedCabling
       for (unsigned int roc = 1; roc <= pixelgpudetails::MAX_ROC; roc++) {
         path = {fed, link, roc};
```

Is there a reason why at line 39 we use

```cpp
for (unsigned int fed = startFed; fed <= endFed; fed++) {
```

instead of

```cpp
for (unsigned int fed : fedIds) {
```

?

makortel (author):

I don't know if it can happen, but if the fedIds vector does not contain all the values between fedIds.front() and fedIds.back(), those two loops give different results.
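A small standalone illustration of the difference (values invented): if the vector has a gap, the index-based loop also visits ids that are not present.

```cpp
#include <cstdio>
#include <vector>

int main() {
  std::vector<unsigned int> fedIds = {1200, 1201, 1204};  // 1202 and 1203 are missing

  // Index-based loop: visits 1200..1204, including the absent ids.
  for (unsigned int fed = fedIds.front(); fed <= fedIds.back(); fed++)
    std::printf("index-based: %u\n", fed);

  // Range-based loop: visits only the ids actually in the vector.
  for (unsigned int fed : fedIds)
    std::printf("range-based: %u\n", fed);
}
```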


fwyzard commented Apr 29, 2019 via email

fwyzard pushed commits that referenced this pull request on Oct 8, Oct 19, Oct 20, Oct 23, Nov 6, and Nov 16, 2020.