Tiling does not work on Windows11 #77

Open
mflamand opened this issue Nov 8, 2024 · 1 comment

mflamand commented Nov 8, 2024

Hi,

First, let me say that I have been quite happy with the performance of deconwolf. I have tested it on sets of RNA FISH images with great results. I think it works better (and faster) than the blind deconvolution algorithms I was using before. Congrats!

I believe I may have found a bug. When using the latest release (0.4.3) on Windows 11, I am unable to use the tiling option to process images that are too large for my GPU memory. For example, if I launch a run (mock run with 3 iterations, --verbose 2), I get the following:

dw --iter 3 --tilesize 1024 --prefix tiling --gpu --verbose 2 .\CamK2a_AAV15_06_CY3.tif .\PSF.tif
outFile: .\tiling_CamK2a_AAV15_06_CY3.tif, outFolder: .\
Settings:
image: .\CamK2a_AAV15_06_CY3.tif
psf: .\PSF.tif
output: .\tiling_CamK2a_AAV15_06_CY3.tif
log file: .\tiling_CamK2a_AAV15_06_CY3.tif.log.txt
nIter: 3
nThreads for FFT: 16
nThreads for OMP: 16
verbosity: 2
background level: auto
method: Scaled Heavy Ball + OpenCL (SHBCL2)
metric: Idiv
Stopping after 3 iterations
overwrite: NO
tiling, maxSize: 1024
tiling, padding: 20
XY crop factor: 0.001000
Offset: 5.000000
Output Format: 16 bit integer
Scaling: Automatic
Border Quality: 2 Minimal boundary artifacts
FFT lookahead: 0
FFTW3 plan: FFTW_MEASURE
Initial guess: Flat average

deconwolf: '0.4.3'

BUILD_DATE: 'Jun 22 2024'
TIFF Backend: 'LIBTIFF, Version 4.6.0
Copyright (c) 1988-1996 Sam Leffler
Copyright (c) 1991-1996 Silicon Graphics, Inc.'
OpenMP: YES
OpenCL: YES
VkFFT: YES
sizeof(int) = 4
sizeof(float) = 4
sizeof(double) = 8
sizeof(size_t) = 8

Image dimensions: 2048 x 2048 x 39

Reading .\PSF.tif
PSF Z-crop [181 x 181 x 265] -> [181 x 181 x 77]
PSF XY-crop [181 x 181 x 77] -> [161 x 161 x 77]
Output: .\tiling_CamK2a_AAV15_06_CY3.tif(.log.txt)
-> Divided the [2048 x 2048 x 39] image into 4 tiles
Initializing .\tiling_CamK2a_AAV15_06_CY3.tif.raw to 0
Dumping .\CamK2a_AAV15_06_CY3.tif to .\CamK2a_AAV15_06_CY3.tif.raw (for quicker io)

-> Processing tile 1 / 4
PSF X-crop: Not cropping
Deconvolving using shbcl2 (using inplace)
Setting the background level to 0.010000
image: [1044x1044x39], psf: [161x161x77], job: [1204x1204x115]
Found 2 CL platforms
Found 1 CL devices
Will use device 0 (first = 0)
CL device #0
CL_DEVICE_TYPE=CL_DEVICE_TYPE_GPU
CL_DEVICE_GLOBAL_MEM_SIZE = 17175150592 (17175 MiB)
CL_DEVICE_NAME = NVIDIA RTX 2000 Ada Generation
CL_DEVICE_VENDOR = NVIDIA Corporation
CL_DRIVER_VERSION = 553.24
CL_DEVICE_EXTENSIONS = cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_khr_gl_event cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_nv_kernel_attribute cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_win32 cl_khr_external_memory_win32
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS=4099
Using VkFFT version 10304
Preparing for convolutions of size 1204 x 1204 x 115
Warning: Will write the VkFFT configuration in the current folder.
Reason: Can not determine a suitable folder under Windows.
vkFFT cache file: VkFFT_kernelCache_1204x1204x115.binary
Initializing VkFFT for size 1204 x 1204 x 115
fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
.Creating weight map for boundary handling
fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
Downloading real data 1204 x 1204 x 115 (166705840 floats)
Start guess: FLAT
fimcl_copy
Iterating .fimcl_copy
fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
...fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
Iteration 1/ 3, Idiv=0.000e+00 .fimcl_copy
fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
...fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
Iteration 2/ 3, Idiv=0.000e+00 .fimcl_copy
fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
...fimcl_fft_inplace
VkFFTAppend (for in-place forward transform)
fimcl_convolve
fimcl_copy
fimcl_ifft_inplace
Iteration 3/ 3, Idiv=0.000e+00
Downloading real data 1204 x 1204 x 115 (166705840 floats)
Closing the OpenCL environment

The same happens when processing on the CPU:

dw --iter 3 --tilesize 1024 --prefix tiling_cpu --verbose 2 .\CamK2a_AAV15_06_CY3.tif .\PSF.tif
outFile: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif, outFolder: .\
Settings:
image: .\CamK2a_AAV15_06_CY3.tif
psf: .\PSF.tif
output: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif
log file: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif.log.txt
nIter: 3
nThreads for FFT: 16
nThreads for OMP: 16
verbosity: 2
background level: auto
method: Scaled Heavy Ball (SHB)
metric: Idiv
Stopping after 3 iterations
overwrite: NO
tiling, maxSize: 1024
tiling, padding: 20
XY crop factor: 0.001000
Offset: 5.000000
Output Format: 16 bit integer
Scaling: Automatic
Border Quality: 2 Minimal boundary artifacts
FFT lookahead: 0
FFTW3 plan: FFTW_MEASURE
Initial guess: Flat average

deconwolf: '0.4.3'
BUILD_DATE: 'Jun 22 2024'
TIFF Backend: 'LIBTIFF, Version 4.6.0
Copyright (c) 1988-1996 Sam Leffler
Copyright (c) 1991-1996 Silicon Graphics, Inc.'
OpenMP: YES
OpenCL: YES
VkFFT: YES
sizeof(int) = 4
sizeof(float) = 4
sizeof(double) = 8
sizeof(size_t) = 8

Image dimensions: 2048 x 2048 x 39
Reading .\PSF.tif
PSF Z-crop [181 x 181 x 265] -> [181 x 181 x 77]
PSF XY-crop [181 x 181 x 77] -> [161 x 161 x 77]
Output: .\tiling_cpu_CamK2a_AAV15_06_CY3.tif(.log.txt)
-> Divided the [2048 x 2048 x 39] image into 4 tiles
Initializing .\tiling_cpu_CamK2a_AAV15_06_CY3.tif.raw to 0
Dumping .\CamK2a_AAV15_06_CY3.tif to .\CamK2a_AAV15_06_CY3.tif.raw (for quicker io)

-> Processing tile 1 / 4
PSF X-crop: Not cropping
Deconvolving
Setting the background level to 0.010000
image: [1044x1044x39], psf: [161x161x77], job: [1204x1204x115]
Estimated peak memory usage: 5.8 GB
creating fftw3 plans ...
c2r plan ...
c2r inplace plan ...
r2c plan ...
r2c inplace plan ...
Exported fftw wisdom to fftw_wisdom_float_inplace_threads_16.dat
Iteration 3/ 3, Idiv=0.000e+00
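
The sizes reported in both logs can be cross-checked with a little arithmetic. The sketch below is only an illustration: the formulas (tile = tile size plus padding on the inner side, job = tile + PSF - 1 per axis) are assumptions that happen to reproduce the logged numbers, not code taken from deconwolf.

```python
# Values copied from the logs above.
image = (2048, 2048, 39)   # "Image dimensions"
psf = (161, 161, 77)       # PSF after Z-crop and XY-crop
tile_max = 1024            # --tilesize
padding = 20               # "tiling, padding"


def tile_extent(full, tile_max, padding):
    """Extent of a corner tile along one axis.

    Assumption: an axis longer than tile_max is split, and a corner
    tile is padded on its inner side only.
    """
    if full <= tile_max:
        return full
    return min(full, tile_max + padding)


# Tiling is applied in x and y only; z is kept whole.
tile = (
    tile_extent(image[0], tile_max, padding),
    tile_extent(image[1], tile_max, padding),
    image[2],
)

# Assumption: the job size is the size of a full linear convolution.
job = tuple(t + p - 1 for t, p in zip(tile, psf))

n_floats = job[0] * job[1] * job[2]
bytes_per_buffer = n_floats * 4  # sizeof(float) = 4 according to the log

print("tile:", tile)
print("job:", job)
print("floats per buffer:", n_floats)
print("GB per float buffer:", round(bytes_per_buffer / 1e9, 2))
```

Under these assumptions the result matches the log lines "image: [1044x1044x39]", "job: [1204x1204x115]" and "166705840 floats", so the tile geometry itself looks consistent on Windows.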

It seems that the program always exits after the first tile is processed, and the Idiv value stays at 0.000e+00 throughout (no background signal?). So my guess is that it fails to read the image data properly.
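
One way to test this guess would be to inspect the intermediate .raw dump that the log mentions. The sketch below is hypothetical: it assumes the .raw file is a headerless array of 32-bit floats in native byte order, which is not confirmed anywhere in this thread, and the helper name `raw_stats` is made up for illustration. The demo at the end runs on a small synthetic file, not on real data.

```python
import os
import tempfile
from array import array


def raw_stats(path, chunk_bytes=4 * 1024 * 1024):
    """Return (n_values, n_nonzero, max_value) for a raw float32 file.

    Assumption: the file has no header and stores native-endian
    32-bit floats back to back.
    """
    n_values = 0
    n_nonzero = 0
    max_value = float("-inf")
    # Binary mode, so no newline translation happens on Windows.
    with open(path, "rb") as fh:
        while True:
            buf = fh.read(chunk_bytes)
            if not buf:
                break
            usable = len(buf) - (len(buf) % 4)  # drop a trailing partial value
            values = array("f")
            values.frombytes(buf[:usable])
            n_values += len(values)
            n_nonzero += sum(1 for v in values if v != 0.0)
            if values:
                max_value = max(max_value, max(values))
    return n_values, n_nonzero, max_value


# Demo on a tiny synthetic file with two zero and two nonzero values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.raw")
    with open(demo_path, "wb") as fh:
        array("f", [0.0, 0.0, 1.5, 3.0]).tofile(fh)
    stats = raw_stats(demo_path)

print(stats)
```

If the float32 assumption holds, the dump of a 2048 x 2048 x 39 image would contain 163577856 values (654311424 bytes). A file that is shorter than that, or that contains only zeros, would support the idea that the image is not read or dumped correctly on Windows.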

I get the same issue on two systems (#1: Intel 14900K, RTX 2000 Ada 16 GB, 64 GB RAM; #2: AMD 5900X, RTX 3080 10 GB, 64 GB RAM). Tiling works on both systems with Ubuntu 24.04 (in CPU or GPU mode), but not with Windows 11. Tiling also works under WSL-Ubuntu and macOS 15.1 (Apple M3 Pro, 18 GB) in CPU mode. GPU mode on macOS does not work for me (it hangs at "fimcl_convolve"), but I wasn't looking to use GPU mode on my MacBook anyway.

By the way, related to issue #75, I am able to use GPU mode under Windows 11 without any problem when the image is cropped.

I have no problem using dw under Ubuntu for now. For convenience (the workstation also runs Windows-exclusive software), it would be great if the issue could be looked at and fixed in the future. I am happy to do some testing if needed.

Best,
Mathieu

elgw added the win11 (Windows 11 specific) label Nov 11, 2024
elgw self-assigned this Nov 11, 2024

elgw (Owner) commented Nov 11, 2024

Hi!

I'm glad that you find the software useful :)

Thank you for taking the time to report these issues and findings.

At the moment I can't say when I will have time to look at the Windows-specific issues, but they won't be forgotten.

Unfortunately, it is less likely that I will get deconwolf to run smoothly on macOS in the near future (I have no access to the hardware, and OpenCL is not the best backend). I may revisit that when/if deconwolf switches to, or adds, a Vulkan backend for the GPU computations.

Cheers,
Erik
