Running iterative alg stuck in Ubuntu system #552

stefenmax · 2024-05-29T16:07:25Z

Hi, your code runs smoothly for me on Windows. After moving to Linux and compiling, the forward projection and backprojection still work on my data. But every time I run OS-SART-TV as below, it hangs with no response. On Windows it returns within a few seconds.
algs.ossart_tv(proj, self.geo, angles, niter=1, init = init)
Thanks for your help

Specifications

Python version: 3.10
OS: Linux
CUDA version: 12.2
conda list

AnderBiguri · 2024-05-29T17:09:17Z

There are a couple of rare issues that may be causing this, but it's been hard to debug because I can't reproduce it.

One thing to try: in the following function, a new geometry is created from the input one.

TIGRE/Python/tigre/algorithms/iterative_recon_alg.py

Line 217 in 2b18d9b

    
           geox.sVoxel[1:] = geox.sVoxel[1:] * 1.1  # a bit larger to avoid zeros in projections

Can you try changing the code locally so it doesn't modify the geometry like this? Keep just the copy.

stefenmax · 2024-05-29T20:00:36Z

Do you mean commenting out this line? I tried that and it still failed. But some Krylov subspace algorithms like CGLS and LSQR do work, which is weird. OS-SART-TV gives the best results, though...

AnderBiguri · 2024-05-29T20:35:26Z

@stefenmax not just that line, but the few after it as well.
Apologies, I am on a trip so I can't help much, but the idea is to pass an unmodified geo to Atb.

stefenmax · 2024-05-29T21:04:41Z

Thanks for your help, but it still doesn't work. Maybe I should just run it on Windows. I also found that it is faster there than on Linux, lol.

AnderBiguri · 2024-05-29T21:12:29Z

Hmm... then I don't really know why.
As I cannot reproduce it, I would need to know which function hangs. Is there any way you can try to figure that out?
I have used TIGRE extensively on Linux, so this must come from a specific combination of geometry, CUDA version, number of GPUs, OS, Python version or something similar, but it's hard for me to figure out simply because I can't see it happen.

I'll keep the issue open. If you do happen to pinpoint what exactly hangs (it has to be some Ax() or Atb() call somewhere), do let me know. I suspect it's set_w or set_v that hangs...
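
One possible way to find the hanging call, sketched here under the assumption that the hang sits inside a blocking call into compiled code: Python's standard `faulthandler` module can dump the traceback of every thread after a timeout, even if the call never returns. `watch_for_hang` is a hypothetical helper name, not part of TIGRE.

```python
import faulthandler
import sys


def watch_for_hang(call, timeout_s, log_file=sys.stderr):
    """Run call(); if it is still running after timeout_s seconds,
    dump the traceback of all threads to log_file (the call keeps running)."""
    # log_file must be a real file (it needs fileno()) and must stay open
    # until the dump happens or the watchdog is cancelled.
    faulthandler.dump_traceback_later(timeout_s, file=log_file)
    try:
        return call()
    finally:
        # Disarm the watchdog once the call has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

For example, `proj = watch_for_hang(lambda: tigre.Ax(head, geo, angles, gpuids=gpuids), 30)` would print the Python stack after 30 seconds if `Ax` has not returned. The topmost frame shows which Python line made the blocking call; it does not show where the CUDA code itself is stuck.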

stefenmax · 2024-05-30T01:01:20Z

I found that I can run the ossart algorithm from example.py on my Linux system. So I tried using my geometry with the head phantom and found that it hangs in tigre.Ax. That is weird, because previously I could run Ax and FDK on my own data. Here is the example code; I don't know if you can reproduce this.

from __future__ import division
from __future__ import print_function

import numpy as np
import tigre
import tigre.algorithms as algs
from tigre.utilities import sample_loader
from tigre.utilities.Measure_Quality import Measure_Quality
import tigre.utilities.gpu as gpu
import matplotlib.pyplot as plt
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
### This is just a basic example of very little of TIGRE's functionality.
# We highly recommend checking the Demos folder, where most if not all features of TIGRE are demoed.

listGpuNames = gpu.getGpuNames()
if len(listGpuNames) == 0:
    print("Error: No gpu found")
else:
    for id in range(len(listGpuNames)):
        print("{}: {}".format(id, listGpuNames[id]))

gpuids = gpu.getGpuIds(listGpuNames[0])
print(gpuids)

# Geometry
# geo1 = tigre.geometry(mode='cone', high_resolution=False, default=True)
img_size = 256
geo = tigre.geometry(mode="cone")
geo.DSD = 950
geo.DSO = 540
geo.nDetector = np.array([1, 835]) 
geo.dDetector = np.array([1, 0.9643345*950 / 835])
geo.sDetector = geo.dDetector * geo.nDetector
geo.nVoxel = np.array([1, img_size, img_size])
geo.sVoxel = geo.nVoxel
geo.dVoxel = geo.sVoxel / geo.nVoxel 
geo.accuracy=0.5  
angles = np.linspace(0, np.pi/2, 180, dtype=np.float32)
# Prepare projection data
head = sample_loader.load_head_phantom(geo.nVoxel)
breakpoint()
proj = tigre.Ax(head, geo, angles, gpuids=gpuids)
test = tigre.Atb(proj,geo,angles,backprojection_type="matched",gpuids=gpuids)
# Reconstruct
niter = 20
fdkout = algs.fdk(proj, geo, angles, gpuids=gpuids)
breakpoint()
ossart = algs.ossart(proj, geo, angles, niter, blocksize=20, gpuids=gpuids)

# Measure Quality
# 'RMSE', 'MSSIM', 'SSD', 'UQI'
print("RMSE fdk:")
print(Measure_Quality(fdkout, head, ["nRMSE"]))
print("RMSE ossart")
print(Measure_Quality(ossart, head, ["nRMSE"]))

# Plot
fig, axes = plt.subplots(3, 2)
axes[0, 0].set_title("FDK")
axes[0, 0].imshow(fdkout[geo.nVoxel[0] // 2])
axes[1, 0].imshow(fdkout[:, geo.nVoxel[1] // 2, :])
axes[2, 0].imshow(fdkout[:, :, geo.nVoxel[2] // 2])
axes[0, 1].set_title("OS-SART")
axes[0, 1].imshow(ossart[geo.nVoxel[0] // 2])
axes[1, 1].imshow(ossart[:, geo.nVoxel[1] // 2, :])
axes[2, 1].imshow(ossart[:, :, geo.nVoxel[2] // 2])
plt.show()
# tigre.plotProj(proj)
# tigre.plotImg(fdkout)

AnderBiguri · 2024-05-30T01:16:41Z

So it hangs in the Ax in this code?
What if you make a different amount of GPUs visible? Are they all the same GPU?
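
A minimal sketch of how the "same GPU" question could be checked from the device names. `is_homogeneous` is a hypothetical helper; the only assumption is that the names come from a list of strings such as the one `gpu.getGpuNames()` returns in the script above.

```python
from collections import Counter


def is_homogeneous(names):
    """True only if at least one device is listed and all names are identical."""
    return len(set(names)) == 1


def count_by_name(names):
    """Map each device name to how many times it appears."""
    return dict(Counter(names))
```

With `names = gpu.getGpuNames()`, `count_by_name(names)` shows at a glance whether the visible set is mixed.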

stefenmax · 2024-05-30T02:16:27Z

Yeah, it hangs in the Ax.
No, they are not the same GPU. But on another server of mine there are two identical GPUs, and it hangs at the same position.

AnderBiguri · 2024-05-30T02:34:50Z

Certainly with different GPUs behaviour is undefined, so that would be an issue.

I'll try your specific geometry. But out of curiosity, if you change the nvoxel/ndetector a bit, does it still hang?

stefenmax · 2024-05-30T02:42:16Z

Do you have any recommendation on how to change the nvoxel/ndetector?

AnderBiguri · 2024-05-30T02:54:39Z

Just give them different values, to see if it's the specific values that cause the issue.
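
When trying several values, a hang in one run blocks all the others. A sketch of one way around that, assuming each configuration can be run as a standalone script: start it in a fresh interpreter with a timeout. `classify_run` and `TEMPLATE` are hypothetical names, and the template body is a placeholder for the repro script from this thread.

```python
import subprocess
import sys


def classify_run(code, timeout_s):
    """Run `code` in a fresh interpreter; report 'ok', 'error', or 'hang'."""
    try:
        done = subprocess.run([sys.executable, "-c", code], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"  # subprocess.run kills the child once the timeout expires
    return "ok" if done.returncode == 0 else "error"


# Placeholder: substitute the repro script, with the sizes left as format fields.
TEMPLATE = "img_size = {img_size}; n_det = {n_det}  # build geo, call tigre.Ax here"

if __name__ == "__main__":
    for img_size, n_det in [(256, 835), (250, 800), (128, 512)]:
        code = TEMPLATE.format(img_size=img_size, n_det=n_det)
        print(img_size, n_det, classify_run(code, timeout_s=60))
```

Running each case in its own process also means every run starts with a fresh CUDA context. A process that is stuck inside the GPU driver may take a while to go away after being killed.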

stefenmax · 2024-05-30T20:24:45Z

Yes, after changing them a bit, it still hangs.

AnderBiguri · 2024-06-10T11:41:33Z

Apologies, I don't seem to be able to reproduce this in any way. If you can pinpoint where the error is, do let me know.

timcogan · 2024-07-26T03:39:08Z

I have the same issue on Ubuntu. The code hangs here on my machine (I haven't stepped through the CUDA yet):

TIGRE/Python/tigre/utilities/cuda_interface/_Ax.pyx

Line 80 in b8e2e95

    
           cuda_raise_errors(siddon_ray_projection(c_img, c_geometry[0], c_projections, c_angles, total_projections, c_gpuids[0]))

If interpolation_projection is used instead of siddon_ray_projection, the rest of the code seems to run OK:

W = Ax(
    # np.ones(geox.nVoxel, dtype=np.float32), geox, self.angles, "Siddon", gpuids=self.gpuids
    np.ones(geox.nVoxel, dtype=np.float32), geox, self.angles, "interpolated", gpuids=self.gpuids
)

timcogan · 2024-07-26T13:14:32Z

This is where the code hangs inside Siddon_projection.cu:

TIGRE/Common/CUDA/Siddon_projection.cu

Line 519 in b8e2e95

cudaStreamSynchronize(stream[dev*2]);

AnderBiguri · 2024-07-26T13:21:43Z

Thanks @timcogan! That's strange: it means that some of the code before that point gets into an infinite loop. It's hard to debug because it's parallel code that I can't stop, but this information actually helps a lot.

AnderBiguri · 2024-07-26T13:22:52Z

What if you set the code to use only 1 GPU? Does it still hang?

timcogan · 2024-07-26T14:58:51Z

Yes, it hangs when using only 1 GPU.

stefenmax closed this as completed May 29, 2024

AnderBiguri reopened this May 29, 2024

AnderBiguri mentioned this issue Jun 20, 2024

Get struck when change parallel to cone in single-GPU #560

Closed
