
use unsigned int instead of size_t for seed finding #82

Merged · 3 commits into acts-project:main · Sep 6, 2021

Conversation

@beomki-yeo (Contributor)

size_t in CUDA device code is pretty expensive, and I don't think it is necessary to use it here: the number of spacepoints, doublets, and triplets, and the bin sizes, never exceed the maximum value of unsigned int.

There is a meaningful speedup when we use unsigned int instead.

@krasznaa (Member) commented Sep 2, 2021

I didn't know that size_t would affect performance so much even when you're not using atomic operations on it. 😕 Because yes, unsigned int is definitely a much safer type if you want to use any CUDA-provided functions.

There is a bit of "controversy" around this in vecmem as well (acts-project/vecmem#96), so in general I'm on your side: unsigned int is generally a good choice in device code. I just didn't know that in these cases there would be an actual performance difference. But I'll trust you on that one...
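
To make the point about CUDA-provided functions concrete, a minimal sketch (not code from this PR): atomicAdd has overloads for unsigned int and unsigned long long, but none for unsigned long, which is what std::size_t usually aliases on 64-bit Linux, so size_t counters need a cast.

```cuda
#include <cstddef>

__global__ void bump_u32(unsigned int* counter) {
    atomicAdd(counter, 1u);  // matches the unsigned int overload directly
}

__global__ void bump_size_t(std::size_t* counter) {
    // No unsigned long overload of atomicAdd exists, so a cast is needed.
    static_assert(sizeof(std::size_t) == sizeof(unsigned long long), "");
    atomicAdd(reinterpret_cast<unsigned long long*>(counter), 1ull);
}
```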

@stephenswat (Member)

CUDA devices don't have any 64-bit integer hardware as far as I know, so they emulate it using their 32-bit integer silicon. The amount of time that takes is very dependent on what operations you are trying to run. Addition and subtraction are trivial (just perform two 32-bit operations and carry), multiplication is a little more expensive, stuff like division can be quite pricey.

So how much slower 64-bit integers are compared to 32-bit ones depends on your workload. I would normally say that for the usual floating-point-heavy workloads it doesn't matter at all, but for irregular workloads like seed finding it could feasibly make a difference.
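
One way to see this for yourself (an illustrative sketch, not code from this PR): compile two otherwise identical kernels and diff the output of `nvcc --ptx`; the 64-bit version emits extra instruction pairs for the emulated index arithmetic.

```cuda
#include <cstddef>

// Same body, different index type.
__global__ void scale_u32(float* out, const float* in, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.f * in[i];
}

__global__ void scale_u64(float* out, const float* in, std::size_t n) {
    std::size_t i =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.f * in[i];
}
```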

@beomki-yeo (Contributor, Author)

I think size_t is slower than unsigned int when it comes to global memory writes, due to its larger size.
As you can see, the changes I made in this PR are only for the indices, which are written into global memory during the seed finding. It is also true that global memory write time is significant in CUDA seed finding. I don't think the changes affect the computation meaningfully.

@stephenswat (Member)

Are you sure? Every change I see in this PR looks like it would affect the register file, not the global memory. But of course I am not intimately familiar with your seeding implementation. So if you say this is faster then I'll take your word for it. 🙂

@beomki-yeo (Contributor, Author)

sp_location, which holds the indices and is a member of doublet and triplet, is written into global memory; that's why.
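
For illustration, a hypothetical sketch of the shapes involved (the member names here are made up; the real definitions live in the traccc EDM headers):

```cuda
#include <cstddef>

struct sp_location {
    unsigned int bin_idx;  // was std::size_t: 8 bytes, now 4
    unsigned int sp_idx;   // was std::size_t: 8 bytes, now 4
};

struct doublet {
    sp_location sp1;  // the two spacepoints forming the doublet
    sp_location sp2;
};

// 16 bytes per doublet instead of 32: every doublet the seed finding
// writes out now moves half as many bytes through global memory.
static_assert(sizeof(doublet) == 16, "four 32-bit indices");
```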

@cgleggett

Interesting question: what determines the size of size_t on a CUDA device? What happens if the host is 32-bit? Is it the size of size_t on the host that sets it, or is it something defined by the architecture of the device itself?

@krasznaa (Member) commented Sep 2, 2021

> Interesting question: what determines the size of size_t on a CUDA device? What happens if the host is 32-bit? Is it the size of size_t on the host that sets it, or is it something defined by the architecture of the device itself?

It is nvcc that needs to decide what std::size_t is. (Though it does need to be compatible with the host compiler along the way...) I think CUDA stopped supporting 32-bit hosts a while ago. (Didn't it...?) All in all, I'm pretty sure that std::size_t is 64 bits wide in any reasonable situation these days.
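
If one wanted to make that assumption explicit (a sketch, not something this PR adds), a compile-time check is evaluated by both the host and device passes of nvcc:

```cuda
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::size_t) == sizeof(std::uint64_t),
              "this code assumes a 64-bit std::size_t throughout");
```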

@stephenswat (Member)

I'm pretty sure that nvcc still supports 32-bit architectures (both on the host side and the device side). As far as I know, both compilers will import their own implementation of the standard library headers, and then decide on the size depending on whether the compilation is 32-bit or 64-bit. I strongly suspect that this is decided by the host side, although that allows you to come up with some very esoteric scenarios where your host is 32-bit and your device is 64-bit. Since nvcc is a compiler driver and not a compiler itself, it may in that case decide to use 64-bit integers for its host code and its device code, while the actual host compiler might default to 32 bits.

In reality 64-bit computing is so ubiquitous now that I don't think this is ever a problem, though.
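
An easy empirical check (illustrative; assumes any recent 64-bit CUDA toolchain) is to print the width as each compilation side sees it:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

__global__ void device_width() {
    printf("device sizeof(std::size_t) = %u\n",
           static_cast<unsigned>(sizeof(std::size_t)));
}

int main() {
    std::printf("host   sizeof(std::size_t) = %u\n",
                static_cast<unsigned>(sizeof(std::size_t)));
    device_width<<<1, 1>>>();  // expect 8 on both sides with a 64-bit build
    cudaDeviceSynchronize();
    return 0;
}
```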

@krasznaa (Member) commented Sep 2, 2021

I'm 99.99% sure that 32-bit OS support from CUDA was dropped a long-long time ago...

[screenshot attached]

I'd have to double-check, but probably around CUDA 6 or 7...

@stephenswat (Member) left a comment

As interesting as this discussion has been, I think we're almost good to go ahead. I have one quick question about one of the container changes. Once that's been resolved I am happy to merge.

Review thread on core/include/edm/container.hpp (resolved)
@stephenswat added the "refactor" label on Sep 6, 2021
@stephenswat merged commit 27d66b1 into acts-project:main on Sep 6, 2021