Ensure that CCL output is contiguous on modules #666

stephenswat · 2024-08-02T14:06:31Z

Right now we are using a sorting algorithm to ensure that the entire array of measurements is contiguous in global memory, but this isn't strictly necessary. This commit alters the CCL algorithm slightly to guarantee that the output is always contiguous.
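Assuming "contiguous on modules" means that all measurements of a given module occupy one unbroken run of the output array, with the runs themselves in any order, sorting by module is sufficient but stronger than necessary. The host-side sketch below checks that weaker property for an arbitrary projection. The `measurement` struct and `is_contiguous_on_sketch` are hypothetical illustrations, not traccc's EDM or its actual `is_contiguous_on`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <unordered_set>
#include <vector>

// Hypothetical measurement type; only the module identifier matters here.
struct measurement {
    unsigned int module_id;
    float local0;
    float local1;
};

// Returns true if, for every key produced by `proj`, all elements with that
// key form one unbroken run. The runs may appear in any order, so this is
// weaker than requiring the array to be sorted by key.
template <typename T, typename Proj>
bool is_contiguous_on_sketch(const std::vector<T>& v, Proj proj) {
    using key_type = std::decay_t<std::invoke_result_t<Proj&, const T&>>;
    std::unordered_set<key_type> seen_keys;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const key_type key = proj(v[i]);
        // A run starts at the first element or wherever the key changes.
        if (i == 0 || proj(v[i - 1]) != key) {
            // Seeing the start of a second run for the same key means the
            // elements with that key are split across the array.
            if (!seen_keys.insert(key).second) {
                return false;
            }
        }
    }
    return true;
}
```

For example, module IDs `3, 3, 1, 2, 2` are contiguous on modules even though they are not sorted, whereas `3, 1, 3` are not.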

Sorting small arrays is a relatively common problem in GPGPU programming. Many useful algorithms exist, and some are provided by libraries like CUB. An algorithm close to my heart is odd-even sort because it is exceedingly simple, relatively efficient for small arrays and, importantly, uses O(1) space. This commit adds a portable implementation of block-wide odd-even sort.
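The idea behind a block-wide odd-even (transposition) sort can be sketched on the host by simulating the threads of a block with a loop. This is an illustration under assumptions, not the implementation added in this commit: `block_odd_even_sort_sketch` and its `block_size` parameter are hypothetical, and the comment marks where a CUDA version would need a `__syncthreads()` barrier. Within one phase the compared pairs are disjoint, so parallel threads never race, and apart from the swap no extra storage is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential simulation of a block-wide odd-even transposition sort. On a
// GPU, the loop over `tid` would run in parallel across the threads of a
// block; here the strict ordering of phases models the barrier between them.
template <typename T, typename Comp>
void block_odd_even_sort_sketch(std::vector<T>& data, std::size_t block_size,
                                Comp comp) {
    assert(block_size > 0);
    const std::size_t n = data.size();
    // n phases are sufficient for odd-even transposition sort on n elements.
    for (std::size_t phase = 0; phase < n; ++phase) {
        // Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        const std::size_t offset = phase % 2;
        for (std::size_t tid = 0; tid < block_size; ++tid) {
            // Each "thread" strides over the pairs so arrays larger than the
            // block are still covered.
            for (std::size_t i = offset + 2 * tid; i + 1 < n;
                 i += 2 * block_size) {
                if (comp(data[i + 1], data[i])) {
                    std::swap(data[i], data[i + 1]);
                }
            }
        }
        // A CUDA version would place a __syncthreads() barrier here.
    }
}
```

Because only strictly out-of-order neighbours are swapped, equal keys keep their relative order. Running a fixed `n` phases is the simplest correct bound; a GPU version could instead stop early once two consecutive phases make no swaps, detected for example with a block-wide vote such as CUDA's `__syncthreads_or`.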

The current CCL kernels have so many parameters that it's a real pain in the rear to maintain them and to make changes to them. This commit reduces the number of parameters a little bit by taking all statically-known shared memory data and unifying it into a single struct which can be passed around more easily.
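The parameter-reduction idea can be illustrated with a small sketch: statically-sized scratch arrays and counters are gathered into one struct, and helpers take a single reference to it instead of a long parameter list. The struct name, its fields and `count_roots_sketch` are hypothetical and chosen only for illustration; in a CUDA kernel such a struct would be declared once as a `__shared__` variable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bundle of statically-sized scratch data; the fields do not
// mirror traccc's actual struct.
template <std::size_t Capacity>
struct shared_parcel_sketch {
    unsigned short parent[Capacity];  // union-find parent index of each cell
    unsigned short cell_count;        // number of valid entries in `parent`
    unsigned int cluster_count;       // number of cluster roots found
};

// Instead of receiving `parent`, `cell_count` and `cluster_count` as separate
// parameters, the helper receives one reference to the parcel.
template <std::size_t Capacity>
void count_roots_sketch(shared_parcel_sketch<Capacity>& smem) {
    assert(smem.cell_count <= Capacity);
    smem.cluster_count = 0;
    for (unsigned short i = 0; i < smem.cell_count; ++i) {
        // A cell is a cluster root if it is its own parent.
        if (smem.parent[i] == i) {
            ++smem.cluster_count;
        }
    }
}
```

Adding a new piece of shared state then only changes the struct definition rather than every kernel and helper signature.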


stephenswat · 2024-08-02T14:07:00Z

Depends on #552, #665. I will do some performance testing on this.

krasznaa

With the actual CCL code I'll go with your judgement here. Nothing jumps out to me as code that I wouldn't like, but I also didn't try to understand exactly what is being done. 😦

It doesn't have to be this PR, but I guess we'll want to remove the sorting algorithm(s) from the example applications then. I'm fine with that happening in a separate PR, though you should definitely test, at least locally, that track finding/fitting works well with this ordering. 🤔

krasznaa · 2024-08-02T14:15:08Z

device/cuda/src/clusterization/clusterization_algorithm.cu

+#ifndef NDEBUG
+    TRACCC_CUDA_ERROR_CHECK(cudaStreamSynchronize(stream));
+    assert(is_contiguous_on(measurement_module_projection(), m_mr.main, m_copy,
+                            m_stream, measurements));
+#endif


This is really not an important point, but would it not be possible to express this with something like the following?

assert([&]() -> bool {
    TRACCC_CUDA_ERROR_CHECK(cudaStreamSynchronize(stream));
    return is_contiguous_on(measurement_module_projection(), m_mr.main,
                            m_copy, m_stream, measurements);
}() == true);

I'd just prefer not to use NDEBUG directly if I can help it...
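The suggestion relies on an immediately-invoked lambda inside `assert`, so that both the synchronisation and the check are compiled out when `NDEBUG` is defined, without an explicit `#ifndef NDEBUG` block. A plain C++ sketch of the pattern follows; `synchronize_sketch` and `expensive_check_sketch` are hypothetical placeholders standing in for `cudaStreamSynchronize` and `is_contiguous_on`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical counter used to observe whether the synchronisation ran.
inline int g_sync_calls = 0;

// Hypothetical placeholder for cudaStreamSynchronize(stream).
inline void synchronize_sketch() { ++g_sync_calls; }

// Hypothetical placeholder for an expensive, debug-only consistency check.
inline bool expensive_check_sketch(const std::vector<int>& v) {
    return std::is_sorted(v.begin(), v.end());
}

inline void run_algorithm_sketch(const std::vector<int>& out) {
    // The lambda is only invoked when assertions are enabled; with NDEBUG
    // defined, assert() expands to nothing and neither call is made.
    assert([&]() -> bool {
        synchronize_sketch();
        return expensive_check_sketch(out);
    }());
}
```

One caveat: `assert` is a macro, and the preprocessor groups arguments only by parentheses, not braces, so a top-level comma inside the lambda body (for instance in a braced initializer) would split the macro argument. Wrapping the lambda call in an extra pair of parentheses avoids that.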

stephenswat · 2024-08-02T14:21:10Z

> It doesn't have to be this PR, but I guess we'll want to remove the sorting algorithm(s) from the example applications then. I'm fine with that happening in a separate PR, though you should definitely test, at least locally, that track finding/fitting works well with this ordering. 🤔

Indeed, we should disable the sorting in the example applications. I was thinking of adding a boolean flag to the output to indicate whether or not it is contiguous, but I am not sure it's worth the hassle.

krasznaa · 2024-08-02T14:25:23Z

No, there's no need to represent this in the EDM. As long as the documentation of the algorithms is clear, it's fine to just know from it whether extra sorting is needed or not.

beomki-yeo · 2024-08-09T00:34:10Z

Once this works, could you also remove the measurement_sorting_algorithm?

krasznaa · 2024-08-09T07:39:10Z

> Once this works, could you also remove the measurement_sorting_algorithm?

I would not want to do that! Yes, we will need to remove that / those algorithm(s) from the standard algorithm chain. But having an algorithm in the repository that is not actively used at a given time (but is otherwise functional) is not a bad thing. I want to keep those algorithms around. We may make use of them in some other way later on, who knows. 🤔

beomki-yeo · 2024-08-09T16:03:56Z

OK, that makes sense.

stephenswat added 4 commits August 1, 2024 17:01

Merge branch 'refactor/ccl_smem_parcel' into workcont (3b7ce40)

stephenswat added the "improvement" (Improve an existing feature) and "shared" (Changes related to shared code) labels Aug 2, 2024

stephenswat requested review from krasznaa and beomki-yeo August 2, 2024 14:06

krasznaa reviewed Aug 2, 2024
