
Simplify SiPixelFedCablingMapGPU SoA #301

Merged

Conversation

makortel

I wanted to dump the SiPixelFedCablingMapGPU to a file for some standalone testing (we can talk about that next week), and easiest was to try out the suggestion #272 (comment) since the arrays were allocated to compile-time maximum size anyway.
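For context, a minimal sketch of that direction (illustrative names and sizes, not the actual CMSSW declarations): with the arrays at their compile-time maximum size inside the struct, the whole map is one contiguous, trivially copyable block, so dumping it to a file or copying it to the device is a single operation.

```cpp
#include <cstdio>

// Assumed upper bound (FEDs x links x ROCs); the real constants live in pixelgpudetails.
constexpr unsigned int MAX_SIZE = 150 * 48 * 8;

// Hypothetical fixed-size SoA in the spirit of the #272 suggestion.
struct CablingMapSoA {
  unsigned int fed[MAX_SIZE];
  unsigned int link[MAX_SIZE];
  unsigned int roc[MAX_SIZE];
  unsigned int rawId[MAX_SIZE];
  unsigned int rocInDet[MAX_SIZE];
  unsigned int moduleId[MAX_SIZE];
  unsigned char badRocs[MAX_SIZE];
  unsigned int size = 0;
};

// Dumping for standalone tests becomes a single fwrite; likewise the
// host-to-device transfer becomes one cudaMemcpy of sizeof(CablingMapSoA).
void dump(CablingMapSoA const& map, std::FILE* out) {
  std::fwrite(&map, sizeof(CablingMapSoA), 1, out);
}
```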

@fwyzard fwyzard added the Pixels Pixels-related developments label Mar 26, 2019

VinInn commented Mar 28, 2019

Looks definitely cleaner and simpler to manage (OK, I am biased).

@makortel makortel changed the base branch from CMSSW_10_5_X_Patatrack to CMSSW_10_6_X_Patatrack April 24, 2019 15:54
```cpp
      hasQuality_(badPixelInfo != nullptr)
{
  cudaCheck(cudaMallocHost(&cablingMapHost, sizeof(SiPixelFedCablingMapGPU)));

  std::vector<unsigned int> const& fedIds = cablingMap.fedIds();
  std::unique_ptr<SiPixelFedCablingTree> const& cabling = cablingMap.cablingTree();
```

Not relevant for this PR, but wouldn't it be simpler to use a "dumb" pointer (SiPixelFedCablingTree const*) instead of a const reference to a unique_ptr?

makortel (author):

I'd say it would be clearer to drop the const& and take the unique_ptr by value, since cablingMap.cablingTree() returns the unique_ptr by value:

```cpp
std::unique_ptr<SiPixelFedCablingTree> cablingTree() const;
```
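A minimal sketch of the by-value alternative (standalone stand-ins for the real types):

```cpp
#include <memory>

struct SiPixelFedCablingTree {};  // stand-in for the real class

// As in SiPixelFedCablingMap: the accessor already returns by value.
std::unique_ptr<SiPixelFedCablingTree> cablingTree() {
  return std::make_unique<SiPixelFedCablingTree>();
}

void caller() {
  // Ownership moves directly into the local variable; no reference to a
  // temporary, and the caller's ownership of the tree is explicit.
  std::unique_ptr<SiPixelFedCablingTree> cabling = cablingTree();
}
```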


Same for fedIds? cablingMap.fedIds() returns a std::vector by value, so we could drop the const& there as well, and let the compiler move it or even optimise the copy away.
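Sketch of the same idea for fedIds (invented values): with a by-value return, copy elision or a move applies, so the const& buys nothing.

```cpp
#include <vector>

std::vector<unsigned int> fedIds() { return {1200, 1201, 1204}; }  // returns by value

void caller() {
  // Guaranteed copy elision (C++17), or at worst a move: no extra copy is made.
  std::vector<unsigned int> ids = fedIds();
  // compared to: std::vector<unsigned int> const& ids = fedIds();
  // which merely binds a reference to the returned temporary.
}
```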

```cpp
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> RawId;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> rocInDet;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> moduleId;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> badRocs;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> modToUnpDefault;
```

modToUnpDefault_?

```cpp
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> RawId;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> rocInDet;
  std::vector<unsigned int, CUDAHostAllocator<unsigned int>> moduleId;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> badRocs;
  std::vector<unsigned char, CUDAHostAllocator<unsigned char>> modToUnpDefault;
  unsigned int size;
```

size_?

```cpp
struct GPUData {
  ~GPUData();
  SiPixelFedCablingMapGPU *cablingMapHost = nullptr;    // pointer to struct in CPU
  SiPixelFedCablingMapGPU *cablingMapDevice = nullptr;  // pointer to struct in GPU
```

Last comment, not relevant to this PR but rather to CUDAESProduct in general :-)

The pattern seems to be:

  • start with a class/struct for the actual data on the GPU:

    ```cpp
    Payload *cablingMapHost = nullptr;  // pointer to struct in CPU
    ```

  • define a wrapper:

    ```cpp
    struct PayloadWrapper {
      ~PayloadWrapper();
      Payload *payload = nullptr;  // pointer to struct in GPU
    };
    ```

  • add a CUDAESProduct data member:

    ```cpp
    CUDAESProduct<PayloadWrapper> payload_;
    ```

  • produce it for the GPU like this:

    ```cpp
    Payload const* getGPUProductAsync(cuda::stream_t<>& cudaStream) const {
      const auto& data = payload_.dataForCurrentDeviceAsync(cudaStream, [this](PayloadWrapper& data, cuda::stream_t<>& stream) {
        // allocate
        cudaCheck(cudaMalloc(&data.payload, sizeof(Payload)));

        // transfer
        cudaCheck(cudaMemcpyAsync(data.payload, this->cablingMapHost, sizeof(Payload), cudaMemcpyDefault, stream.id()));
      });
      return data.payload;
    }
    ```

Would it make sense to encapsulate more of the common part into CUDAESProduct?

And/or to drop the PayloadWrapper in favour of a unique_ptr, possibly with a custom deleter?
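For the last point, a minimal sketch of what the unique_ptr alternative could look like (the deleter and helper below are assumptions for illustration, not the CUDAESProduct API):

```cpp
#include <cuda_runtime.h>
#include <memory>

// Hypothetical deleter: release device memory instead of calling delete.
struct CudaFreeDeleter {
  void operator()(void* ptr) const { cudaFree(ptr); }
};

template <typename T>
using device_ptr = std::unique_ptr<T, CudaFreeDeleter>;

// The per-payload wrapper struct and its hand-written destructor disappear;
// the allocation inside the fill callback would reduce to something like:
template <typename Payload>
device_ptr<Payload> allocateDevice() {
  void* mem = nullptr;
  cudaMalloc(&mem, sizeof(Payload));
  return device_ptr<Payload>(static_cast<Payload*>(mem));
}
```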

makortel (author):

I'm all for encapsulating patterns, but I need to think about this for a while. I opened issue #336 as a reminder.

@fwyzard fwyzard merged commit 30e75bb into cms-patatrack:CMSSW_10_6_X_Patatrack Apr 29, 2019

fwyzard commented Apr 29, 2019

I have not seen a measurable¹ impact on the throughput.

V100 on JetHT data

Average of 10 jobs running with 10 threads on a single GPU

reference :

1689.1 ±  24.8 ev/s
1674.2 ±  11.9 ev/s

#301:

1674.6 ±   7.6 ev/s

T4 on TTbar MC

Average of 10 jobs running with 8 threads on a single GPU

reference :

 720.9 ±   1.3 ev/s
 722.3 ±   1.1 ev/s

#301:

 719.9 ±   0.7 ev/s

¹ the measurements on the V100 seem to fluctuate a lot ...

```diff
@@ -41,21 +41,21 @@ SiPixelFedCablingMapGPUWrapper::SiPixelFedCablingMapGPUWrapper(SiPixelFedCabling
       for (unsigned int roc = 1; roc <= pixelgpudetails::MAX_ROC; roc++) {
         path = {fed, link, roc};
```

Is there a reason why at line 39 we use

```cpp
for (unsigned int fed = startFed; fed <= endFed; fed++) {
```

instead of

```cpp
for (unsigned int fed : fedIds) {
```

?

makortel (author):

I don't know if it can happen, but if the fedIds vector does not contain all the values between fedIds.front() and fedIds.back(), those two loops give different results.
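A small standalone illustration of the difference (values invented): if the vector has a gap, the index-based loop also visits ids that are not present.

```cpp
#include <cstdio>
#include <vector>

int main() {
  std::vector<unsigned int> fedIds = {1200, 1201, 1204};  // 1202 and 1203 are missing

  // Index-based loop: visits 1200..1204, including the absent ids.
  for (unsigned int fed = fedIds.front(); fed <= fedIds.back(); fed++)
    std::printf("index-based: %u\n", fed);

  // Range-based loop: visits only the ids actually in the vector.
  for (unsigned int fed : fedIds)
    std::printf("range-based: %u\n", fed);
}
```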


fwyzard commented Apr 29, 2019 via email

fwyzard pushed commits that referenced this pull request on Oct 8, Oct 19, Oct 20, Oct 23, Nov 6, and Nov 16, 2020.