Firstly, it's worth pointing out that being able to fit an entire chunk into GPU memory is not that common, so this issue is not very relevant to processing big data; it was discovered when running on the small test data, which easily fits into GPU memory.
On commit 930e0da in #446, running `tomo_standard.nxs` with the `pipeline_gpu1.yaml` pipeline, one can see that the small test data gets split into two blocks (whereas, if padding is switched off for `remove_outlier` in the associated methods database YAML file, there is only one block):
```
(base) root@492a2a3538d1:/httomo# python -m httomo run tests/test_data/tomo_standard.nxs tests/samples/pipeline_template_examples/pipeline_gpu1.yaml output_dir/
Pipeline has been separated into 2 sections
See the full log file at: output_dir/19-09-2024_09_44_59_output/user.log
Running loader (pattern=projection): standard_tomo...
Finished loader: standard_tomo (httomo) Took 37.68ms
Section 0 (pattern=projection) with the following methods:
data_reducer (httomolib)
find_center_vo (httomolibgpu)
remove_outlier (httomolibgpu)
normalize (httomolibgpu)
0%| | 0/2 [00:00<?, ?block/s]
50%|##### | 1/2 [00:00<00:00, 1.29block/s]
--->The center of rotation is 79.5
Finished processing last block
Section 1 (pattern=sinogram) with the following methods:
remove_stripe_based_sorting (httomolibgpu)
FBP (httomolibgpu)
save_intermediate_data (httomo)
save_to_images (httomolib)
0%| | 0/1 [00:00<?, ?block/s]
Finished processing last block
Pipeline finished. Took 1.937s
```
This is due to how the max slices value is calculated. The absolute maximum that max slices can be (at the start of the function, before being whittled down by the different methods in a section) is based on the `chunk_shape` of the data source (see `httomo/runner/task_runner.py`, lines 317 to 318 at commit 930e0da):
The `chunk_shape` property on any implementor of `DataSetSource` does not include padding, so the absolute max slices value that the function starts with is the unpadded length of the chunk shape's slicing dim. Therefore, even if the GPU could fit:
- all slices
- plus the necessary padding slices
the `determine_max_slices()` method can only report that max slices is "all slices without padding slices".
In the context of the test data, the calculated max slices is 180 (all projections). But execution of the first section needs 2 padding slices added, so there are 182 slices to process. Because max slices is only 180 and not 182 (even though the GPU can fit 182 slices in memory), the chunk is forced to be split into 2 blocks.
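To make the capping behaviour concrete, here is a minimal sketch of the current situation; it is illustrative only and not httomo's actual code, and the names `starting_max_slices`, `gpu_free_bytes`, and `bytes_per_slice` (and their values) are made up for the example:

```python
# Illustrative sketch of why the chunk gets split (not httomo's actual implementation).

def starting_max_slices(chunk_shape, slicing_dim, gpu_free_bytes, bytes_per_slice):
    """Current behaviour: the upper bound comes from the *unpadded* chunk shape."""
    memory_bound = gpu_free_bytes // bytes_per_slice
    return min(chunk_shape[slicing_dim], memory_bound)

chunk_shape = (180, 128, 160)                  # unpadded chunk, 180 projections
padding = (1, 1)                               # one extra slice needed on each side
slices_needed = chunk_shape[0] + sum(padding)  # 182 slices actually have to be processed

max_slices = starting_max_slices(
    chunk_shape, slicing_dim=0, gpu_free_bytes=8 * 2**30, bytes_per_slice=80_000
)
print(max_slices)                   # 180, even though 182 slices would also fit in GPU memory
print(slices_needed > max_slices)   # True -> the chunk is split into 2 blocks
```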
In order to fix this, I think the `determine_max_slices()` logic needs to account for the required padding slices, to handle the case when max slices + padding slices could also fit into GPU memory.
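As a rough sketch of what that could look like (an assumption, not a committed design, and again using made-up helper names rather than httomo's API):

```python
# Hypothetical adjustment: include the section's padding requirement when computing
# the starting upper bound, so "all slices + padding" is allowed whenever it fits.

def starting_max_slices_padded(chunk_shape, slicing_dim, padding,
                               gpu_free_bytes, bytes_per_slice):
    unpadded = chunk_shape[slicing_dim]
    padded = unpadded + sum(padding)
    memory_bound = gpu_free_bytes // bytes_per_slice
    if padded <= memory_bound:
        # The whole padded chunk fits, so the unpadded chunk can be one block.
        return unpadded
    # Otherwise leave headroom for the padding slices within the memory bound.
    return max(memory_bound - sum(padding), 1)

print(starting_max_slices_padded((180, 128, 160), slicing_dim=0, padding=(1, 1),
                                 gpu_free_bytes=8 * 2**30, bytes_per_slice=80_000))
# 180 -> the test data would be processed as a single block
```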