-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that CCL output is contiguous on modules #666
base: main
Are you sure you want to change the base?
Ensure that CCL output is contiguous on modules #666
Conversation
Sorting small arrays is a relatively common problem in GPGPU programming. Many useful algorithms exist, and some are provided by libraries like CUB. An algorithm close to my heart is odd-even sort because it is exceedingly simply, relatively efficient for small arrays and, importantly, it uses O(1) space. This commit adds new implementations of this sorting algorithm for block-wide odd-even sort in a portable way.
The current CCL kernels have so many parameters that it's a real pain in the rear to maintain them and to make changes to them. This commit reduces the number of parameters a little bit by taking all statically-known shared memory data and unifying it into a single struct which can be passed around more easily.
Right now we are using a sorting algorithm to ensure that the entire array of measurements is contiguous in global memory, but this isn't strictly necessary. This commit alters the CCL algorithm slightly to guarantee that the output is always contiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the actual CCL code I'll go with your judgement here. Nothing jumps out to me as code that I wouldn't like, but I also didn't try to understand exactly what is being done. 😦
It doesn't have to be this PR, but I guess we'll want to remove the sorting algorithm(s) from the example applications then. I'm fine with that happening in a separate PR, though you should definitely test at least locally that track finding/fitting would take this sorting well. 🤔
#ifndef NDEBUG | ||
TRACCC_CUDA_ERROR_CHECK(cudaStreamSynchronize(stream)); | ||
assert(is_contiguous_on(measurement_module_projection(), m_mr.main, m_copy, | ||
m_stream, measurements)); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really not an important point, but would it not be possible to express this with something like:
assert([&]() -> bool {
TRACCC_CUDA_ERROR_CHECK(cudaStreamSynchronize(stream));
return is_contiguous_on(measurement_module_projection(), m_mr.main, m_copy,
m_stream, measurements); }() == true);
? I'd just prefer to use NDEBUG
directly if I can help it...
Indeed, we should disable the sorting application. I was thinking of adding some boolean flag to the output to indicate whether or not the output is contiguous, but I am not sure if it's worth the hassle. |
No, there's no need to represent this in the EDM. As long as the documentation on the algorithms is clear, it's fine to just know whether extra sorting is needed on something or not. |
Once this works. could you also remove the |
I would not want to do that! Yes, we will need to remove that / those algorithm(s) from the standard algorithm chain. But having an algorithm in the repository that is not actively used at a specific time (but is otherwise functional), is not a bad thing. I want to keep those algorithms around. We may make use of them in some other way later on, who knows. 🤔 |
OK that makes sense |
Right now we are using a sorting algorithm to ensure that the entire array of measurements is contiguous in global memory, but this isn't strictly necessary. This commit alters the CCL algorithm slightly to guarantee that the output is always contiguous.