use unsigned int instead of size_t for seed finding #82
Conversation
I didn't know that. There is a bit of "controversy" around this in vecmem as well (acts-project/vecmem#96), so in general I'm on your side.
CUDA devices don't have any 64-bit integer hardware as far as I know, so they emulate it using their 32-bit integer silicon. The amount of time that takes is very dependent on what operations you are trying to run. Addition and subtraction are trivial (just perform two 32-bit operations and carry), multiplication is a little more expensive, stuff like division can be quite pricey. So it depends on your workload how much slower it is to use 64-bit integers compared to 32-bit ones. I would normally say that for the usual floating point-heavy workloads it doesn't matter at all, but for irregular workloads like seed finding it could feasibly make a difference.
I think
Are you sure? Every change I see in this PR looks like it would affect the register file, not the global memory. But of course I am not intimately familiar with your seeding implementation. So if you say this is faster then I'll take your word for it. 🙂
interesting question - what determines the size of size_t?
It is
I'm pretty sure that nvcc still supports 32-bit architectures (both on the host side and the device side). As far as I know both compilers will import their own implementation of the standard library headers, and then decide on the size depending on whether the compilation is 32-bit or 64-bit. I strongly suspect that this is decided by the host side, although that allows you to come up with some very esoteric scenarios where your host is 32-bit and your device is 64-bit. Since nvcc is a compiler driver and not a compiler itself, it may in that case decide to use 64-bit integers for its host code and its device code, while the actual host compiler might default to 32 bits. In reality 64-bit computing is so ubiquitous now that I don't think this is ever a problem, though.
As interesting as this discussion has been, I think we're almost good to go ahead. I have one quick question about one of the container changes; once that's been solved I am happy to merge.
size_t in CUDA device code is pretty expensive, and I don't think it is necessary to use size_t, because the number of spacepoints/doublets/triplets or the bin size never exceeds the maximum value of unsigned int. There is a meaningful difference in speedup when we use unsigned int.