FFT plan in both paganin filter methods sometimes has a negligible size #456

yousefmoazzam · 2024-09-24T11:49:06Z

This was discovered during memory allocation exploration for #454, and the changes made in b32a2c8 allowed the paganin memory hook tests to pass in the IRIS CI test jobs.

I don't know if this belongs in the httomo repo (because it's related to a method's memory estimator), or the httomolibgpu repo (because it's related to FFTs being performed in a specific method), or somewhere else (because I don't know the root cause of the issue). Nevertheless, I've put it here for now, just to have it documented somewhere.

Original observation

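When running the memory hook tests inside a container on my local workstation and outside a container on the same workstation, I saw a difference in the size of the FFT plan being allocated for the 2D FFT performed in the paganin filter methods: outside the container the plan was non-negligible (61.52MB in the savu example below), whereas inside the container it was negligible (1.00KB).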

To be clear, I don't know if this is a container-related issue, or if it's simply a coincidence that, for both methods, the FFT plan size was negligible when running inside a container. For example, a different version of the cupy python package or of the cufft CUDA package could cause this.

I did check the cupy version in the conda env inside the container and outside the container: both were v12.3.0. However, inside the container the cupy package came from a conda channel, whereas in the conda env outside the container it came from PyPI.
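A minimal sketch of how the package origin could be checked from Python in each environment, using only the standard library. The candidate distribution names are an assumption (PyPI wheels for cupy are typically published under names such as `cupy-cuda12x`, while the conda package is `cupy`), and the `INSTALLER` metadata file is optional, so a missing value is reported as "unknown":

```python
from importlib import metadata


def describe_install(dist_names=("cupy", "cupy-cuda11x", "cupy-cuda12x")):
    """Return {distribution name: (version, installer)} for installed candidates."""
    found = {}
    for name in dist_names:
        try:
            dist = metadata.distribution(name)
        except metadata.PackageNotFoundError:
            continue  # this distribution is not present in the current environment
        # INSTALLER is an optional metadata file naming the tool that installed it
        installer = (dist.read_text("INSTALLER") or "unknown").strip()
        found[name] = (dist.version, installer)
    return found


print(describe_install())
```

Running this inside and outside the container should show whether the two v12.3.0 installs were put there by different tools. `cupy.show_config()` additionally prints the CUDA and cuFFT versions that cupy sees, which may be the more relevant difference.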

The way I was running the two paganin methods to see this behaviour

I chose one specific parametrisation of the memory hook tests for both the methods. In the examples in the section below, I was running the following memory hook test parametrisation for the savu paganin filter:

test_httomolibgpu.py::test_paganin_filter_savu_memoryhook[135-320-128]

Investigation details (ie, how I found that the FFT plan size was negligible)

Using the LineProfileHook in cupy, I was able to see the size of all allocations being done by the methods, and in particular, the FFT plan generated for the 2D FFT.
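For reference, a minimal sketch of wrapping a 2D FFT in the LineProfileHook. This is not the memory hook test itself: the function name and the array shape are made up for illustration, and it needs a machine with a GPU and cupy installed to actually run.

```python
import io


def profile_fft2(shape=(8, 256, 256)):
    """Run a 2D FFT under cupy's LineProfileHook and return the report text."""
    # Imported lazily so this module can still be loaded without a GPU.
    import cupy as cp
    import cupyx.scipy.fft as cpx_fft
    from cupy.cuda.memory_hooks import LineProfileHook

    data = cp.zeros(shape, dtype=cp.complex64)
    hook = LineProfileHook()
    with hook:
        # Allocations made through the cupy memory pool in this block are recorded.
        cpx_fft.fft2(data, axes=(-2, -1))

    buffer = io.StringIO()
    hook.print_report(file=buffer)
    return buffer.getvalue()
```

Calling `print(profile_fft2())` should give a tree like the ones below, including a `_get_cufft_plan_nd` entry when a plan has to be created.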

Outside a container on my workstation, the size of the FFT plan allocated was non-negligible for both methods. Here's truncated output for the savu paganin filter running:

/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py:215:test_paganin_filter_savu_memoryhook (196.96MB, 175.14MB)
  /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (21.09MB, 21.09MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:129:paganin_filter_savu (31.25MB, 31.00MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (30.76MB, 30.76MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (30.76MB, 30.76MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (30.76MB, 30.76MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (495.00KB, 247.50KB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (247.50KB, 247.50KB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (247.50KB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:163:paganin_filter_savu (61.52MB, 61.52MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (61.52MB, 61.52MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:164:paganin_filter_savu (61.52MB, 61.52MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (61.52MB, 61.52MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (61.52MB, 61.52MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (61.52MB, 61.52MB)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (61.52MB, 61.52MB)
            /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (61.52MB, 61.52MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:167:paganin_filter_savu (492.50KB, 0.00B)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (492.50KB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:197:paganin_filter_savu (21.09MB, 0.00B)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (21.09MB, 0.00B)

Inside a container on my workstation, the size of the FFT plan allocated was tiny/negligible for both methods. Here's truncated output for the savu paganin filter running inside the container:

/httomo/tests/test_backends/test_httomolibgpu.py:215:test_paganin_filter_savu_memoryhook (135.44MB, 113.62MB)
  /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (21.09MB, 21.09MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:129:paganin_filter_savu (31.25MB, 31.00MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (30.76MB, 30.76MB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (30.76MB, 30.76MB)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (30.76MB, 30.76MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (495.00KB, 247.50KB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (247.50KB, 247.50KB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (247.50KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:163:paganin_filter_savu (61.52MB, 61.52MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (61.52MB, 61.52MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:164:paganin_filter_savu (1.00KB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (1.00KB, 0.00B)
      /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (1.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (1.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (1.00KB, 0.00B)
            /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (1.00KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:167:paganin_filter_savu (492.50KB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (492.50KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:197:paganin_filter_savu (21.09MB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (21.09MB, 0.00B)
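To pick the FFT plan entry out of reports like the two above without reading them by eye, a small parser along these lines could be used. It is only a sketch: the regular expression is written against the line format shown in this issue, and treating KB/MB as powers of 1024 is an assumption.

```python
import re

# Matches "<file>:<line>:<function> (<used>, <acquired>)" as in the reports above.
ENTRY = re.compile(
    r"^\s*(?P<file>.+):(?P<line>\d+):(?P<func>\w+) "
    r"\((?P<used>[\d.]+)(?P<used_unit>[KMG]?B), "
    r"(?P<acq>[\d.]+)(?P<acq_unit>[KMG]?B)\)$"
)
UNIT = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}  # assumed binary units


def fft_plan_bytes(report):
    """Return (used, acquired) in bytes for the first _get_cufft_plan_nd entry."""
    for line in report.splitlines():
        m = ENTRY.match(line)
        if m and m["func"] == "_get_cufft_plan_nd":
            used = float(m["used"]) * UNIT[m["used_unit"]]
            acquired = float(m["acq"]) * UNIT[m["acq_unit"]]
            return used, acquired
    return None  # no FFT plan entry in this report


outside = (
    "            /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/"
    "site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (61.52MB, 61.52MB)"
)
inside = (
    "            /opt/conda/lib/python3.10/site-packages/"
    "cupy/fft/_fft.py:459:_get_cufft_plan_nd (1.00KB, 0.00B)"
)
print(fft_plan_bytes(outside), fft_plan_bytes(inside))
```

A check such as "plan size below some threshold" could then be applied to the returned value to tell the two cases apart automatically.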

Ok, but how is this relevant to the memory hook tests failing in IRIS CI?

Good question: I'm not 100% sure that this is what is happening when the paganin memory hook tests run in the IRIS workflow. What I can say is that the Savu paganin memory hook tests were failing without correcting for a potential FFT plan with negligible size, but when I added the changes to correct for a potential negligible FFT plan (in b32a2c8), the tests passed.

I have not attempted to print out the LineProfileHook results when running the tests via the IRIS CI workflow, mainly due to not wanting to bother fiddling with the workflow file to get print() output displayed during running the tests. If we feel this should be investigated further, then this would probably be one of the steps to take.

What exactly is this "correction" for a potential FFT plan with negligible size?

Drawing graphs by hand that track the main memory allocations and deallocations was how I consolidated this conclusion. What it boils down to is that:

if the FFT plan size is non-negligible, then peak GPU memory usage occurs at the very last major allocation in the method
if the FFT plan size is negligible, then peak GPU memory usage occurs in the middle of the method (more precisely, right before the 2D FFT)

The peak GPU memory usage is what the MaxMemoryHook in the httomo memory hook tests relies on for checking the max memory used. Thus, if the peak depends on whether the FFT plan size is negligible, so can the result of the memory hook tests.

The if/else branching added in b32a2c8 encodes the logic for handling the point in the method at which peak GPU memory usage occurs.
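The two cases can be illustrated with a toy allocation timeline. This is a sketch and not the code in b32a2c8: the sizes are the savu figures from the reports above, but the order of frees (the padded array being released after the complex cast, and the FFT plan staying allocated until the end) is an assumption, and the small KB-sized allocations are ignored.

```python
def peak_usage(events):
    """Return (peak_mb, label) for a list of (label, delta_mb) allocation events."""
    current = peak = 0.0
    peak_label = None
    for label, delta in events:
        current += delta
        if current > peak:  # strict, so the earliest point reaching the peak wins
            peak, peak_label = current, label
    return peak, peak_label


def savu_timeline(plan_mb):
    # Sizes in MB from the savu reports; the ordering of frees is assumed.
    return [
        ("input copy", 21.09),
        ("padded array", 30.76),
        ("complex cast before FFT", 61.52),
        ("free padded array", -30.76),  # assumption
        ("FFT plan", plan_mb),          # assumed to stay allocated until the end
        ("final output", 21.09),
    ]


for plan_mb in (61.52, 0.001):
    peak, where = peak_usage(savu_timeline(plan_mb))
    print(f"plan={plan_mb}MB -> peak {peak:.2f}MB at '{where}'")
```

Under these assumptions the 61.52MB plan puts the peak at the final output allocation, while the roughly 1KB plan leaves the peak at the complex cast right before the FFT, which is the distinction an estimator has to branch on.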


dkazanc · 2024-09-24T16:26:25Z

Thanks @yousefmoazzam, interesting stuff. I wonder what happens with the Paganin filter of TomoPy? Does memory estimation show the same behaviour? I sense that Savu's method is not the one that the majority of people will be using; the implementation of the method raises lots of questions.

We haven't discussed this in detail but I think that all memory hook tests will be moved to httomo-backends in the end? As all the libraries and supporting functions will be there.

yousefmoazzam · 2024-09-25T09:28:55Z

Yep, as the title suggests, the tomopy paganin filter has the same behaviour. I showed only the savu paganin filter LineProfileHook results purely because it was the last one I was fixing so I had the numbers at hand, and because the output for the tomopy paganin is basically the same information.

I can provide the LineProfileHook output for tomopy paganin too for completeness. This was for the following memory hook test:

test_httomolibgpu.py::test_paganin_filter_tomopy_memoryhook[340-320-128]

Here is truncated output for running outside a container (non-negligible FFT plan size):

/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py:171:test_paganin_filter_tomopy_memoryhook (807.22MB, 693.54MB)
  /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (53.12MB, 53.12MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:305:paganin_filter_tomopy (128.83MB, 128.42MB)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:385:_pad_projections_to_second_power (128.83MB, 128.42MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (128.00MB, 128.00MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (128.00MB, 128.00MB)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (128.00MB, 128.00MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (852.00KB, 426.00KB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (426.00KB, 426.00KB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (426.00KB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:310:paganin_filter_tomopy (256.00MB, 256.00MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (256.00MB, 256.00MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:311:paganin_filter_tomopy (256.00MB, 256.00MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (256.00MB, 256.00MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (256.00MB, 256.00MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (256.00MB, 256.00MB)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (256.00MB, 256.00MB)
            /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (256.00MB, 256.00MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:314:paganin_filter_tomopy (1.01MB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:231:_reciprocal_grid (2.00KB, 0.00B)
      /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:232:_reciprocal_grid (2.00KB, 0.00B)
      /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:233:_reciprocal_grid (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:234:_reciprocal_grid (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:236:_reciprocal_grid (1.00MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:317:paganin_filter_tomopy (5.00MB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:392:_paganin_filter_factor2 (4.00MB, 0.00B)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:1082:fftshift (1.00MB, 0.00B)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_manipulation/rearrange.py:140:roll (1.00MB, 0.00B)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:88:empty_like (1.00MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:318:paganin_filter_tomopy (1.00MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:334:paganin_filter_tomopy (53.12MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:344:paganin_filter_tomopy (53.12MB, 0.00B)

and then for running inside a container (negligible FFT plan size):

/httomo/tests/test_backends/test_httomolibgpu.py:171:test_paganin_filter_tomopy_memoryhook (551.22MB, 437.54MB)
  /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (53.12MB, 53.12MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:305:paganin_filter_tomopy (128.83MB, 128.42MB)
    /httomolibgpu/httomolibgpu/prep/phase.py:385:_pad_projections_to_second_power (128.83MB, 128.42MB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (128.00MB, 128.00MB)
        /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (128.00MB, 128.00MB)
          /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (128.00MB, 128.00MB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (852.00KB, 426.00KB)
        /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (426.00KB, 426.00KB)
        /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (426.00KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:310:paganin_filter_tomopy (256.00MB, 256.00MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (256.00MB, 256.00MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:311:paganin_filter_tomopy (1.00KB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (1.00KB, 0.00B)
      /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (1.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (1.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (1.00KB, 0.00B)
            /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (1.00KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:314:paganin_filter_tomopy (1.01MB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:231:_reciprocal_grid (2.00KB, 0.00B)
      /httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:232:_reciprocal_grid (2.00KB, 0.00B)
      /httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:233:_reciprocal_grid (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:234:_reciprocal_grid (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:236:_reciprocal_grid (1.00MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:317:paganin_filter_tomopy (5.00MB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:392:_paganin_filter_factor2 (4.00MB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:1082:fftshift (1.00MB, 0.00B)
      /opt/conda/lib/python3.10/site-packages/cupy/_manipulation/rearrange.py:140:roll (1.00MB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:88:empty_like (1.00MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:318:paganin_filter_tomopy (1.00MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:334:paganin_filter_tomopy (53.12MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:344:paganin_filter_tomopy (53.12MB, 0.00B)
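As a sanity check on the four reports in this thread, the top-level totals can be compared against the FFT plan size. The snippet below is just arithmetic on the numbers already shown; reading the two figures in each pair as "used" and "acquired by the memory pool" follows my understanding of the LineProfileHook output and should be treated as an assumption.

```python
# (used, acquired) totals in MB from the top line of each report in this thread.
totals = {
    "savu": {"outside": (196.96, 175.14), "inside": (135.44, 113.62), "plan": 61.52},
    "tomopy": {"outside": (807.22, 693.54), "inside": (551.22, 437.54), "plan": 256.00},
}


def gap_minus_plan(entry):
    """Outside/inside gap minus the FFT plan size, for each of the two columns."""
    return tuple(
        round(out - ins - entry["plan"], 2)
        for out, ins in zip(entry["outside"], entry["inside"])
    )


for method, entry in totals.items():
    print(method, gap_minus_plan(entry))
```

For both methods the gap between the outside-container and inside-container totals matches the FFT plan size to two decimal places, which suggests the plan is the only allocation that differs between the two environments.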

yousefmoazzam added the question (Further information is requested) and memory-estimation (GPU memory estimator related) labels Sep 24, 2024

yousefmoazzam changed the title ~~FFT plan in both paganin filter methods somtimes has a negligible size~~ FFT plan in both paganin filter methods sometimes has a negligible size Sep 24, 2024
