Firstly, it's worth pointing out that being able to fit an entire chunk into GPU memory is not that common, so this issue is not very relevant to processing big data; it was discovered when running on the small test data, which easily fits into GPU memory.
On commit 930e0da in #446, running `tomo_standard.nxs` with the `pipeline_gpu1.yaml` pipeline, one can see that the small test data gets split into two blocks (whereas, if padding is switched off for `remove_outlier` in the associated methods database YAML file, there is only one block):
```
(base) root@492a2a3538d1:/httomo# python -m httomo run tests/test_data/tomo_standard.nxs tests/samples/pipeline_template_examples/pipeline_gpu1.yaml output_dir/
Pipeline has been separated into 2 sections
See the full log file at: output_dir/19-09-2024_09_44_59_output/user.log
Running loader (pattern=projection): standard_tomo...
Finished loader: standard_tomo (httomo) Took 37.68ms
Section 0 (pattern=projection) with the following methods:
data_reducer (httomolib)
find_center_vo (httomolibgpu)
remove_outlier (httomolibgpu)
normalize (httomolibgpu)
0%| | 0/2 [00:00<?, ?block/s]
50%|##### | 1/2 [00:00<00:00, 1.29block/s]
--->The center of rotation is 79.5
Finished processing last block
Section 1 (pattern=sinogram) with the following methods:
remove_stripe_based_sorting (httomolibgpu)
FBP (httomolibgpu)
save_intermediate_data (httomo)
save_to_images (httomolib)
0%| | 0/1 [00:00<?, ?block/s]
Finished processing last block
Pipeline finished. Took 1.937s
```
This is due to how the max slices value is calculated. The absolute maximum that max slices can be (at the start of the function, before being whittled down by the different methods in a section) is based on the `chunk_shape` of the data source (see `httomo/runner/task_runner.py`, lines 317 to 318 at commit 930e0da):
The `chunk_shape` property on any implementor of `DataSetSource` does not include padding, so the absolute max slices value that the function starts with is the unpadded length of the chunk shape's slicing dim. Therefore, even if the GPU could fit:
- all slices
- plus the necessary padding slices
the `determine_max_slices()` method can only report that max slices is "all slices without padding slices".
In the context of the test data, the calculated max slices is 180 (all projections). But execution of the first section needs 2 padding slices added, so there are 182 slices to process. Because max slices is only 180 and not 182 (even though the GPU can fit 182 slices in memory), the chunk is forced to be split into 2 blocks.
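To make the capping behaviour concrete, here is a minimal sketch of the current situation; it is illustrative only and not httomo's actual code, and the names `starting_max_slices`, `gpu_free_bytes`, and `bytes_per_slice` (and their values) are made up for the example:

```python
# Illustrative sketch of why the chunk gets split (not httomo's actual implementation).

def starting_max_slices(chunk_shape, slicing_dim, gpu_free_bytes, bytes_per_slice):
    """Current behaviour: the upper bound comes from the *unpadded* chunk shape."""
    memory_bound = gpu_free_bytes // bytes_per_slice
    return min(chunk_shape[slicing_dim], memory_bound)

chunk_shape = (180, 128, 160)                  # unpadded chunk, 180 projections
padding = (1, 1)                               # one extra slice needed on each side
slices_needed = chunk_shape[0] + sum(padding)  # 182 slices actually have to be processed

max_slices = starting_max_slices(
    chunk_shape, slicing_dim=0, gpu_free_bytes=8 * 2**30, bytes_per_slice=80_000
)
print(max_slices)                   # 180, even though 182 slices would also fit in GPU memory
print(slices_needed > max_slices)   # True -> the chunk is split into 2 blocks
```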
In order to fix this, I think the `determine_max_slices()` logic needs to account for the required padding slices, to handle the case when max slices + padding slices could also fit into GPU memory.
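As a rough sketch of what that could look like (an assumption, not a committed design, and again using made-up helper names rather than httomo's API):

```python
# Hypothetical adjustment: include the section's padding requirement when computing
# the starting upper bound, so "all slices + padding" is allowed whenever it fits.

def starting_max_slices_padded(chunk_shape, slicing_dim, padding,
                               gpu_free_bytes, bytes_per_slice):
    unpadded = chunk_shape[slicing_dim]
    padded = unpadded + sum(padding)
    memory_bound = gpu_free_bytes // bytes_per_slice
    if padded <= memory_bound:
        # The whole padded chunk fits, so the unpadded chunk can be one block.
        return unpadded
    # Otherwise leave headroom for the padding slices within the memory bound.
    return max(memory_bound - sum(padding), 1)

print(starting_max_slices_padded((180, 128, 160), slicing_dim=0, padding=(1, 1),
                                 gpu_free_bytes=8 * 2**30, bytes_per_slice=80_000))
# 180 -> the test data would be processed as a single block
```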