Simplify SiPixelFedCablingMapGPU SoA #301
Conversation
Looks definitely cleaner and simpler to manage (ok, I am biased).
RecoLocalTracker/SiPixelClusterizer/interface/SiPixelFedCablingMapGPU.h
…ead of separately allocated pointers (force-pushed from 5bab919 to 61bb79b)
hasQuality_(badPixelInfo != nullptr)
{
  cudaCheck(cudaMallocHost(&cablingMapHost, sizeof(SiPixelFedCablingMapGPU)));

  std::vector<unsigned int> const& fedIds = cablingMap.fedIds();
  std::unique_ptr<SiPixelFedCablingTree> const& cabling = cablingMap.cablingTree();
Not relevant for this PR, but wouldn't it be simpler to use a "dumb" pointer (SiPixelFedCablingTree const *) instead of a const reference to a unique_ptr?
I'd say it would be clearer to drop the const& and take the unique_ptr by value, as cablingMap.cablingTree() returns the unique_ptr by value:

std::unique_ptr<SiPixelFedCablingTree> cablingTree() const;
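To make the alternatives concrete, a minimal sketch (makeCablingTree() is a hypothetical stand-in for cablingMap.cablingTree(), not the actual interface):

#include <memory>

struct SiPixelFedCablingTree {};

// stand-in for cablingMap.cablingTree(), which returns by value
std::unique_ptr<SiPixelFedCablingTree> makeCablingTree() {
  return std::make_unique<SiPixelFedCablingTree>();
}

void example() {
  // binding a const& to the returned temporary compiles (lifetime extension),
  // but it hides the fact that this scope actually owns the tree
  std::unique_ptr<SiPixelFedCablingTree> const& byRef = makeCablingTree();

  // taking the unique_ptr by value makes the ownership transfer explicit;
  // the returned pointer is moved, no deep copy is involved
  std::unique_ptr<SiPixelFedCablingTree> byValue = makeCablingTree();

  // a "dumb" observing pointer is enough where ownership is not needed
  SiPixelFedCablingTree const* observer = byValue.get();
  (void)byRef;
  (void)observer;
}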
Same for fedIds? cablingMap.fedIds() returns an std::vector by value, so we could drop the const& there as well, and let the compiler move it or even optimise it away.
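For reference, a sketch of why dropping the const& is free here (getFedIds() is a hypothetical stand-in for cablingMap.fedIds()):

#include <vector>

// stand-in for cablingMap.fedIds(), returning by value
std::vector<unsigned int> getFedIds() {
  std::vector<unsigned int> ids = {1200, 1201, 1202};
  return ids;  // eligible for NRVO; at worst the vector is moved, not copied
}

void example() {
  // the return value is constructed directly into (or moved into) fedIds,
  // so taking it by value costs no more than binding a const&
  std::vector<unsigned int> fedIds = getFedIds();
  (void)fedIds;
}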
std::vector<unsigned int, CUDAHostAllocator<unsigned int>> RawId;
std::vector<unsigned int, CUDAHostAllocator<unsigned int>> rocInDet;
std::vector<unsigned int, CUDAHostAllocator<unsigned int>> moduleId;
std::vector<unsigned char, CUDAHostAllocator<unsigned char>> badRocs;
std::vector<unsigned char, CUDAHostAllocator<unsigned char>> modToUnpDefault;
modToUnpDefault_?
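As an aside, a minimal sketch of what a pinned-memory allocator like CUDAHostAllocator can look like (an illustration under assumptions, not the actual CMSSW implementation):

#include <cuda_runtime.h>
#include <cstddef>
#include <new>

template <typename T>
struct PinnedHostAllocator {
  using value_type = T;

  PinnedHostAllocator() = default;
  template <typename U>
  PinnedHostAllocator(PinnedHostAllocator<U> const&) noexcept {}

  T* allocate(std::size_t n) {
    void* ptr = nullptr;
    // cudaMallocHost returns page-locked memory, so later
    // cudaMemcpyAsync calls from these vectors can overlap with kernels
    if (cudaMallocHost(&ptr, n * sizeof(T)) != cudaSuccess)
      throw std::bad_alloc();
    return static_cast<T*>(ptr);
  }

  void deallocate(T* ptr, std::size_t) { cudaFreeHost(ptr); }
};

template <typename T, typename U>
bool operator==(PinnedHostAllocator<T> const&, PinnedHostAllocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(PinnedHostAllocator<T> const&, PinnedHostAllocator<U> const&) { return false; }

// usage, mirroring the members above:
// std::vector<unsigned int, PinnedHostAllocator<unsigned int>> RawId;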
unsigned int size;
size_?
struct GPUData {
  ~GPUData();
  SiPixelFedCablingMapGPU *cablingMapHost = nullptr;    // internal pointers are to GPU, struct itself is on CPU
-  SiPixelFedCablingMapGPU *cablingMapDevice = nullptr; // same internal pointers as above, struct itself is on GPU
+  SiPixelFedCablingMapGPU *cablingMapDevice = nullptr; // pointer to struct in GPU
Last comment, not relevant to this PR but rather to CUDAESProduct in general :-)

The pattern seems to be:

- start with a class/struct Payload for the actual data on the GPU, and keep a pointer to the host copy:

Payload *cablingMapHost = nullptr; // pointer to struct in CPU

- define a wrapper:

struct PayloadWrapper {
  ~PayloadWrapper();
  Payload *payload = nullptr; // pointer to struct in GPU
};

- add a CUDAESProduct data member:

CUDAESProduct<PayloadWrapper> payload_;

- produce it for the GPU like this:

Payload const* getGPUProductAsync(cuda::stream_t<>& cudaStream) const {
  const auto& data = payload_.dataForCurrentDeviceAsync(cudaStream, [this](PayloadWrapper& data, cuda::stream_t<>& stream) {
    // allocate
    cudaCheck(cudaMalloc(&data.payload, sizeof(Payload)));
    // transfer
    cudaCheck(cudaMemcpyAsync(data.payload, this->cablingMapHost, sizeof(Payload), cudaMemcpyDefault, stream.id()));
  });
  return data.payload;
}

Would it make sense to encapsulate more of the common part into CUDAESProduct?
And/or to drop the PayloadWrapper in favour of a unique_ptr, possibly with a custom destructor?
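A sketch of that unique_ptr variant (Payload is the placeholder name from this comment, and cudaCheck is the same helper used in the snippets above):

#include <cuda_runtime.h>
#include <memory>

// custom deleter releasing device memory
struct CudaDeviceDeleter {
  void operator()(void* ptr) const { cudaFree(ptr); }
};

template <typename T>
using device_unique_ptr = std::unique_ptr<T, CudaDeviceDeleter>;

// would replace PayloadWrapper and its explicit destructor
template <typename T>
device_unique_ptr<T> make_device_unique() {
  void* ptr = nullptr;
  cudaCheck(cudaMalloc(&ptr, sizeof(T)));
  return device_unique_ptr<T>(static_cast<T*>(ptr));
}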
I'm all for encapsulating patterns, but I need to think about this for a while. I made an issue #336 of it as a reminder.
I have not seen a measurable¹ impact on the throughput.

V100 on JetHT data (average of 10 jobs running with 10 threads on a single GPU):
- reference:
- #301:

T4 on TTbar MC (average of 10 jobs running with 8 threads on a single GPU):
- reference:
- #301:

¹ the measurements on the V100 seem to fluctuate a lot ...
@@ -41,21 +41,21 @@ SiPixelFedCablingMapGPUWrapper::SiPixelFedCablingMapGPUWrapper(SiPixelFedCabling
for (unsigned int roc = 1; roc <= pixelgpudetails::MAX_ROC; roc++) {
  path = {fed, link, roc};
Is there a reason why at line 39 we use

for (unsigned int fed = startFed; fed <= endFed; fed++) {

instead of

for (unsigned int fed: fedIds) {

?
I don't know if it can happen, but if the fedIds vector does not contain all the values between fedIds.front() and fedIds.back(), the result of those two is different.

Sure - I guess I was wondering if having all the values in between is a requirement or not.
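A toy example of that difference (the FED id values are made up for illustration):

#include <vector>

void example() {
  // suppose one FED id is missing from the list
  std::vector<unsigned int> fedIds = {1200, 1202};
  unsigned int startFed = fedIds.front();
  unsigned int endFed = fedIds.back();

  for (unsigned int fed = startFed; fed <= endFed; fed++) {
    // visits 1200, 1201, 1202 - including the missing FED 1201
    (void)fed;
  }

  for (unsigned int fed : fedIds) {
    // visits 1200 and 1202 only
    (void)fed;
  }
}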
…ead of separately allocated pointers (#301)
I wanted to dump the SiPixelFedCablingMapGPU to a file for some standalone testing (we can talk about that next week), and the easiest way was to try out the suggestion in #272 (comment), since the arrays were allocated to the compile-time maximum size anyway.
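For the record, a sketch of the kind of dump this enables once the struct holds only compile-time-sized arrays (dumpCablingMap and the file name are made up for illustration, not part of the PR):

#include <fstream>

void dumpCablingMap(SiPixelFedCablingMapGPU const& map) {
  // with fixed-size arrays and no internal pointers the struct is
  // trivially copyable, so one raw write captures the whole SoA
  std::ofstream out("cablingMap.bin", std::ios::binary);
  out.write(reinterpret_cast<char const*>(&map), sizeof(SiPixelFedCablingMapGPU));
}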