FFT plan in both paganin filter methods sometimes has a negligible size #456

Open
yousefmoazzam opened this issue Sep 24, 2024 · 2 comments
Labels
memory-estimation GPU memory estimator related question Further information is requested

Comments

@yousefmoazzam
Collaborator

yousefmoazzam commented Sep 24, 2024

This was discovered during memory allocation exploration for #454, and the changes made in b32a2c8 allowed the paganin memory hook tests to pass in the IRIS CI test jobs.

I don't know if this belongs in the httomo repo (because it's related to a method's memory estimator), or the httomolibgpu repo (because it's related to FFTs being performed in a specific method), or somewhere else (because I don't know the root cause of the issue). Nevertheless, I've put it here for now, just to have it documented somewhere.

Original observation

When running the memory hook tests on my local workstation, both inside and outside a container, I saw a difference in the size of the FFT plan allocated for the 2D FFT performed in the paganin filter methods: the plan was non-negligible outside the container and negligible inside it.

To be clear, I don't know if this is a container-related issue, or if it's simply that both times the FFT plan size was negligible, I happened to be running inside a container. For example, a difference in the version of the cupy Python package, or of the cuFFT CUDA library, could cause this.

I did check the cupy version in the conda env inside the container and outside the container, both were v12.3.0, but inside the container the cupy package came from a conda channel whereas in the conda env outside the container the cupy package came from PyPI.
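
One way to record where a cupy install came from in each environment is to read its package metadata. The sketch below uses only the standard library; the helper names and the list of candidate distribution names are assumptions for illustration (PyPI wheels are published under names such as `cupy-cuda12x`, whereas the conda package is named `cupy`), and the `INSTALLER` file is not guaranteed to be present for every install.

```python
from importlib import metadata

# Candidate distribution names (an assumption; extend as needed).
CANDIDATES = ("cupy", "cupy-cuda11x", "cupy-cuda12x")


def describe_install(name):
    """Return name, version and installer for a distribution, or None."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    # INSTALLER usually holds "pip" or "conda"; it may be missing.
    installer = (dist.read_text("INSTALLER") or "unknown").strip()
    return {"name": name, "version": dist.version, "installer": installer}


def find_cupy_installs(candidates=CANDIDATES):
    """Describe every candidate distribution found in this environment."""
    found = (describe_install(name) for name in candidates)
    return [info for info in found if info is not None]


for info in find_cupy_installs():
    print(info)
```

Running this in both environments gives a quick side-by-side of version and installer. `cupy.show_config()` can additionally be used to compare the CUDA and cuFFT versions that cupy reports.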

The way I was running the two paganin methods to see this behaviour

I chose one specific parametrisation of the memory hook tests for both the methods. In the examples in the section below, I was running the following memory hook test parametrisation for the savu paganin filter:

test_httomolibgpu.py::test_paganin_filter_savu_memoryhook[135-320-128]

Investigation details (i.e., how I found that the FFT plan size was negligible)

Using the LineProfileHook in cupy, I was able to see the size of all allocations being done by the methods, and in particular, the FFT plan generated for the 2D FFT.
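
For reference, a minimal sketch of how such a report can be produced with cupy's LineProfileHook. The helper name `profile_fft2`, the array shape and the dtype are illustrative assumptions, not code from the httomo tests, and calling the helper requires a CUDA-capable GPU.

```python
def profile_fft2(shape=(8, 64, 64)):
    """Run one 2D FFT under LineProfileHook and return the hook.

    Hypothetical helper: shape and dtype are placeholders, not the
    values used by the paganin filter methods.
    """
    # Imported lazily so that defining the helper does not need a GPU.
    import cupy as cp
    import cupyx.scipy.fft as cufft
    from cupy.cuda import memory_hooks

    hook = memory_hooks.LineProfileHook()
    with hook:
        data = cp.ones(shape, dtype=cp.complex64)
        # Allocations made through cupy's memory pool during this call,
        # including any FFT plan work area, are attributed to it.
        cufft.fft2(data, axes=(-2, -1))
    return hook
```

Calling `profile_fft2().print_report()` prints a tree of call sites, each followed by a pair of sizes (bytes used from the pool, bytes newly acquired from the GPU), in the same layout as the outputs below.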

Outside a container on my workstation, the size of the FFT plan allocated was non-negligible for both methods. Here's truncated output for the savu paganin filter running:

/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py:215:test_paganin_filter_savu_memoryhook (196.96MB, 175.14MB)
  /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (21.09MB, 21.09MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:129:paganin_filter_savu (31.25MB, 31.00MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (30.76MB, 30.76MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (30.76MB, 30.76MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (30.76MB, 30.76MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (495.00KB, 247.50KB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (247.50KB, 247.50KB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (247.50KB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:163:paganin_filter_savu (61.52MB, 61.52MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (61.52MB, 61.52MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:164:paganin_filter_savu (61.52MB, 61.52MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (61.52MB, 61.52MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (61.52MB, 61.52MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (61.52MB, 61.52MB)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (61.52MB, 61.52MB)
            /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (61.52MB, 61.52MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:167:paganin_filter_savu (492.50KB, 0.00B)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (492.50KB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:197:paganin_filter_savu (21.09MB, 0.00B)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (21.09MB, 0.00B)

Inside a container on my workstation, the size of the FFT plan allocated was tiny/negligible for both methods. Here's truncated output for the savu paganin filter running inside the container:

/httomo/tests/test_backends/test_httomolibgpu.py:215:test_paganin_filter_savu_memoryhook (135.44MB, 113.62MB)
  /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (21.09MB, 21.09MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:129:paganin_filter_savu (31.25MB, 31.00MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (30.76MB, 30.76MB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (30.76MB, 30.76MB)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (30.76MB, 30.76MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (495.00KB, 247.50KB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (247.50KB, 247.50KB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (247.50KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:163:paganin_filter_savu (61.52MB, 61.52MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (61.52MB, 61.52MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:164:paganin_filter_savu (1.00KB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (1.00KB, 0.00B)
      /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (1.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (1.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (1.00KB, 0.00B)
            /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (1.00KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:167:paganin_filter_savu (492.50KB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (492.50KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:197:paganin_filter_savu (21.09MB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (21.09MB, 0.00B)
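
As a quick arithmetic check on the two savu outputs above: the top-level entries sum to the reported totals (up to rounding of the printed values), and the totals differ by the size of the FFT plan entry alone. The variable names are illustrative.

```python
KB_IN_MB = 1 / 1024

# Top-level "used" sizes in MB that are identical in both savu outputs:
# copy, pad, asarray, empty (phase.py:167), empty (phase.py:197).
common = [21.09, 31.25, 61.52, 492.50 * KB_IN_MB, 21.09]

fft_outside = 61.52            # fft2 entry outside the container
fft_inside = 1.00 * KB_IN_MB   # fft2 entry inside the container

total_outside = sum(common) + fft_outside   # reported as 196.96MB
total_inside = sum(common) + fft_inside     # reported as 135.44MB
difference = total_outside - total_inside

print(f"outside: {total_outside:.2f} MB")
print(f"inside:  {total_inside:.2f} MB")
print(f"difference: {difference:.2f} MB")
```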

Ok, but how is this relevant to the memory hook tests failing in IRIS CI?

Good question: I'm not 100% sure that this is what's happening when the paganin memory hook tests run in the IRIS workflow. What I can say is that the Savu paganin memory hook tests were failing without a correction for a potentially negligible FFT plan size, and that they passed once I added that correction (in b32a2c8).

I have not attempted to print out the LineProfileHook results when running the tests via the IRIS CI workflow, mainly due to not wanting to bother fiddling with the workflow file to get print() output displayed during running the tests. If we feel this should be investigated further, then this would probably be one of the steps to take.

What exactly is this "correction" for a potential FFT plan with negligible size?

Drawing graphs by hand that track the main memory allocations and deallocations was how I consolidated this conclusion. What it boils down to is that:

  • if the FFT plan size is non-negligible, then peak GPU memory usage occurs at the very last major allocation in the method
  • if the FFT plan size is negligible, then peak GPU memory usage occurs in the middle of the method (more precisely, right before the 2D FFT)

The peak GPU memory usage is what the MaxMemoryHook in the httomo memory hook tests relies on to check the max memory used. So if the FFT plan size being negligible or not changes the peak, it can change the result of the memory hook tests.

The if/else branching added in b32a2c8 encodes the logic for handling the point in the method at which peak GPU memory usage occurs.
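
The effect of the plan size on where the peak occurs can be illustrated with a small simulation. This is a sketch under stated assumptions, not the estimator code in b32a2c8: the sizes are taken from the savu output above, but the order of events and the point at which the padded array is freed are hypothetical, chosen only to show how the peak can move.

```python
from itertools import accumulate


def timeline(plan_mb):
    """Hypothetical allocation/free events as (label, delta in MB)."""
    return [
        ("input copy", 21.09),
        ("padded array", 30.76),
        ("complex copy before FFT", 61.52),
        ("free padded array", -30.76),   # assumed lifetime, not measured
        ("FFT plan", plan_mb),
        ("final output", 21.09),
    ]


def peak(events):
    """Return (label, MB in use) at the point of highest memory usage."""
    in_use = list(accumulate(delta for _, delta in events))
    idx = max(range(len(in_use)), key=in_use.__getitem__)
    return events[idx][0], in_use[idx]


for plan_mb in (61.52, 1.00 / 1024):
    label, mb = peak(timeline(plan_mb))
    print(f"plan {plan_mb:.3f} MB -> peak at '{label}' ({mb:.2f} MB)")
```

Under these assumptions, the large plan puts the peak at the final output allocation, while the negligible plan leaves the peak at the complex copy made just before the FFT, matching the two cases in the list above.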

@yousefmoazzam yousefmoazzam added question Further information is requested memory-estimation GPU memory estimator related labels Sep 24, 2024
@yousefmoazzam yousefmoazzam changed the title from "FFT plan in both paganin filter methods somtimes has a negligible size" to "FFT plan in both paganin filter methods sometimes has a negligible size" Sep 24, 2024
@dkazanc
Collaborator

dkazanc commented Sep 24, 2024

Thanks @yousefmoazzam, interesting stuff. I wonder what happens with the Paganin filter of TomoPy: does the memory estimation show the same behaviour? I sense that Savu's method is not the one that the majority of people will be using; the implementation of the method raises lots of questions.

We haven't discussed this in detail but I think that all memory hook tests will be moved to httomo-backends in the end? As all the libraries and supporting functions will be there.

@yousefmoazzam
Collaborator Author

yousefmoazzam commented Sep 25, 2024

Yep, as the title suggests, the tomopy paganin filter has the same behaviour. I showed only the savu paganin filter LineProfileHook results purely because it was the last one I was fixing, so I had the numbers at hand, and because the output for the tomopy paganin filter conveys basically the same information.

I can provide the LineProfileHook output for tomopy paganin too for completeness. This was for the following memory hook test:

test_httomolibgpu.py::test_paganin_filter_tomopy_memoryhook[340-320-128]

Here is truncated output for running outside a container (non-negligible FFT plan size):

/dls/science/users/twi18192/httomo/tests/test_backends/test_httomolibgpu.py:171:test_paganin_filter_tomopy_memoryhook (807.22MB, 693.54MB)
  /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (53.12MB, 53.12MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:305:paganin_filter_tomopy (128.83MB, 128.42MB)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:385:_pad_projections_to_second_power (128.83MB, 128.42MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (128.00MB, 128.00MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (128.00MB, 128.00MB)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (128.00MB, 128.00MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (852.00KB, 426.00KB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (426.00KB, 426.00KB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (426.00KB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:310:paganin_filter_tomopy (256.00MB, 256.00MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (256.00MB, 256.00MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:311:paganin_filter_tomopy (256.00MB, 256.00MB)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (256.00MB, 256.00MB)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (256.00MB, 256.00MB)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (256.00MB, 256.00MB)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (256.00MB, 256.00MB)
            /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (256.00MB, 256.00MB)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:314:paganin_filter_tomopy (1.01MB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:231:_reciprocal_grid (2.00KB, 0.00B)
      /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:232:_reciprocal_grid (2.00KB, 0.00B)
      /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:233:_reciprocal_grid (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:234:_reciprocal_grid (2.00KB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:236:_reciprocal_grid (1.00MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:317:paganin_filter_tomopy (5.00MB, 0.00B)
    /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:392:_paganin_filter_factor2 (4.00MB, 0.00B)
    /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/fft/_fft.py:1082:fftshift (1.00MB, 0.00B)
      /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_manipulation/rearrange.py:140:roll (1.00MB, 0.00B)
        /dls/science/users/twi18192/conda-envs/httomo/lib/python3.10/site-packages/cupy/_creation/basic.py:88:empty_like (1.00MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:318:paganin_filter_tomopy (1.00MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:334:paganin_filter_tomopy (53.12MB, 0.00B)
  /dls/science/users/twi18192/httomolibgpu/httomolibgpu/prep/phase.py:344:paganin_filter_tomopy (53.12MB, 0.00B)

and then for running inside a container (negligible FFT plan size):

/httomo/tests/test_backends/test_httomolibgpu.py:171:test_paganin_filter_tomopy_memoryhook (551.22MB, 437.54MB)
  /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:142:copy (53.12MB, 53.12MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:305:paganin_filter_tomopy (128.83MB, 128.42MB)
    /httomolibgpu/httomolibgpu/prep/phase.py:385:_pad_projections_to_second_power (128.83MB, 128.42MB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:669:pad (128.00MB, 128.00MB)
        /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:71:_pad_simple (128.00MB, 128.00MB)
          /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (128.00MB, 128.00MB)
      /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:700:pad (852.00KB, 426.00KB)
        /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:90:_set_pad_area (426.00KB, 426.00KB)
        /opt/conda/lib/python3.10/site-packages/cupy/_padding/pad.py:95:_set_pad_area (426.00KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:310:paganin_filter_tomopy (256.00MB, 256.00MB)
    /opt/conda/lib/python3.10/site-packages/cupy/_creation/from_data.py:75:asarray (256.00MB, 256.00MB)
  /httomolibgpu/httomolibgpu/prep/phase.py:311:paganin_filter_tomopy (1.00KB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:175:fft2 (1.00KB, 0.00B)
      /opt/conda/lib/python3.10/site-packages/cupyx/scipy/fft/_fft.py:243:fftn (1.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:617:_fftn (1.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:517:_exec_fftn (1.00KB, 0.00B)
            /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:459:_get_cufft_plan_nd (1.00KB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:314:paganin_filter_tomopy (1.01MB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:231:_reciprocal_grid (2.00KB, 0.00B)
      /httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:232:_reciprocal_grid (2.00KB, 0.00B)
      /httomolibgpu/httomolibgpu/prep/phase.py:257:_reciprocal_coord (2.00KB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/ranges.py:58:arange (2.00KB, 0.00B)
          /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:22:empty (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:233:_reciprocal_grid (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:234:_reciprocal_grid (2.00KB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:236:_reciprocal_grid (1.00MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:317:paganin_filter_tomopy (5.00MB, 0.00B)
    /httomolibgpu/httomolibgpu/prep/phase.py:392:_paganin_filter_factor2 (4.00MB, 0.00B)
    /opt/conda/lib/python3.10/site-packages/cupy/fft/_fft.py:1082:fftshift (1.00MB, 0.00B)
      /opt/conda/lib/python3.10/site-packages/cupy/_manipulation/rearrange.py:140:roll (1.00MB, 0.00B)
        /opt/conda/lib/python3.10/site-packages/cupy/_creation/basic.py:88:empty_like (1.00MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:318:paganin_filter_tomopy (1.00MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:334:paganin_filter_tomopy (53.12MB, 0.00B)
  /httomolibgpu/httomolibgpu/prep/phase.py:344:paganin_filter_tomopy (53.12MB, 0.00B)
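
The main array sizes in the tomopy outputs can be reproduced from the test parametrisation [340-320-128]. The sketch below assumes float32 input data, padding of the 340 and 320 dimensions up to the next power of two, and a complex64 copy before the FFT. These assumptions are inferred from the reported numbers and the name `_pad_projections_to_second_power`, not read from phase.py, and the helper names are illustrative.

```python
MIB = 1024 ** 2


def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def size_mib(n_elements, itemsize):
    """Size in MiB of an array with the given element count and itemsize."""
    return n_elements * itemsize / MIB


a, b, c = 340, 320, 128  # test parametrisation [340-320-128]

# float32 input: matches the 53.12MB copy entry.
input_mib = size_mib(a * b * c, 4)

# Assumed padding of the 340 and 320 dimensions to 512 each.
padded_elements = next_power_of_two(a) * next_power_of_two(b) * c
padded_mib = size_mib(padded_elements, 4)    # matches the 128.00MB pad entry
complex_mib = size_mib(padded_elements, 8)   # matches the 256.00MB asarray entry

print(input_mib, padded_mib, complex_mib)
```

On these numbers, the non-negligible FFT plan (256.00MB) is the same size as the padded complex64 array.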
