Blush uses CPU in Refine3D #1223

Open
dzyla opened this issue Dec 21, 2024 · 2 comments

dzyla commented Dec 21, 2024

Describe your problem


Running 3D refinement in relion --tomo using this command:

`which relion_refine_mpi` --continue Refine3D/job006/run_it007_optimiser.star --o Refine3D/job006/run --blush --dont_combine_weights_via_disc --pool 3 --pad 2 --particle_diameter 240 --solvent_mask MaskCreate/job004/mask.mrc --solvent_correct_fsc --j 6 --gpu "" --pipeline_control Refine3D/job006/

This results in very long Blush runs (from 1.5 to 8 hours), as shown here:

2024-12-20 15:53:27.907 | INFO     | relion_blush.command_line:main:271 - ARGUMENTS: Namespace(star_file='Refine3D/job006/run_it008_half1_class001_external_reconstruct.star', model_name='v1.0', strides=20, batch_size=1, gpu=',', device_timeout=1200, debug=False, skip_spectral_trailing=False)
2024-12-20 15:53:28.100 | INFO     | relion_blush.command_line:main:283 - Loading model time 0.19 s
2024-12-20 17:26:10.000 | INFO     | relion_blush.command_line:main:315 - Selected device: cpu
2024-12-20 17:26:10.887 | INFO     | relion_blush.command_line:refine3d:44 - Resample time 0.89 s
2024-12-20 17:26:11.257 | INFO     | relion_blush.command_line:refine3d:57 - Volume preprocess rescale time 0.37 s
2024-12-20 17:26:11.369 | INFO     | relion_blush.command_line:refine3d:74 - Radial masks time 0.11 s
2024-12-20 17:28:14.290 | INFO     | relion_blush.command_line:refine3d:87 - Running model time 122.92 s
2024-12-20 17:28:14.611 | INFO     | relion_blush.command_line:refine3d:97 - Post-processing rescale time 0.32 s
2024-12-20 17:28:14.611 | INFO     | relion_blush.command_line:refine3d:131 - Applying spectral trailing
2024-12-20 17:28:14.622 | INFO     | relion_blush.command_line:refine3d:135 - Max denoised spectral index: 39
2024-12-20 17:28:14.622 | INFO     | relion_blush.command_line:refine3d:138 - Max denoised resolution: 9.85
2024-12-20 17:28:14.686 | INFO     | relion_blush.command_line:refine3d:155 - Ouput to file Refine3D/job006/run_it008_half1_class001_external_reconstruct.mrc

The problem appears to be the log line from 17:26:10, Selected device: cpu. However, when the Python environment is loaded and the following is executed:

import torch

# Check if GPU is available
if torch.cuda.is_available():
    print("GPU is available!")
    device = torch.device("cuda") 
else:
    print("GPU is not available, using CPU")
    device = torch.device("cpu") 

# Print the current device
print(f"Using device: {device}")

# Get information about the GPU (if available)
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}") 
    print(f"Number of GPUs: {torch.cuda.device_count()}")

It correctly reports that CUDA is available and shows the number of GPUs detected. Is there an easy way to make Blush run faster?

Environment:

  • OS: Rocky Linux release 8.6 (Green Obsidian)
  • MPI runtime: mpirun (Open MPI) 4.1.1
  • RELION version 5.0.0
  • Memory: Cluster / 128/256
  • GPU: 2080ti/A40
@biochem-fan
Member

@dkimanius Can you look at this problem?
The problem seems to be in the --gpu argument to Blush.
Currently it is ",", which doesn't make sense.

relion/src/ml_optimiser.cpp

Lines 2034 to 2040 in 4e57a4d

if (do_gpu)
{
    blush_args += " --gpu ";
    for (auto &d : gpuDevices)
        blush_args += gpu_ids + ",";
    blush_args += " ";
}

Perhaps you meant blush_args += d + ",";?
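For reference, here is a minimal, self-contained sketch of how the suggested correction would build the argument string. The variables do_gpu, gpuDevices, and blush_args are hypothetical stand-ins for the ones in ml_optimiser.cpp, just to illustrate the effect of the change:

#include <iostream>
#include <string>
#include <vector>

int main()
{
    // Hypothetical stand-ins for the variables used in ml_optimiser.cpp.
    bool do_gpu = true;
    std::vector<std::string> gpuDevices = {"0", "1"};
    std::string blush_args;

    if (do_gpu)
    {
        blush_args += " --gpu ";
        // Append each individual device ID (d), not the whole gpu_ids string,
        // so Blush receives "0,1," instead of just ",".
        for (auto &d : gpuDevices)
            blush_args += d + ",";
        blush_args += " ";
    }

    std::cout << blush_args << std::endl;  // prints: " --gpu 0,1, "
    return 0;
}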

@dzyla Meanwhile, you can probably work around this issue by explicitly specifying GPU IDs in the RELION GUI. For example, if you have 2 GPUs and 5 MPI processes, specify "0:1:0:1".
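On the command line this corresponds to passing the explicit ID string to --gpu instead of an empty string. Adapting the command from the original post purely as an illustration (the exact ID string depends on your own GPU count and MPI layout):

`which relion_refine_mpi` --continue Refine3D/job006/run_it007_optimiser.star --o Refine3D/job006/run --blush --dont_combine_weights_via_disc --pool 3 --pad 2 --particle_diameter 240 --solvent_mask MaskCreate/job004/mask.mrc --solvent_correct_fsc --j 6 --gpu "0:1:0:1" --pipeline_control Refine3D/job006/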

@ReikaWatanabe

@biochem-fan,
Thank you very much for getting back to us so quickly. After specifying 0:1:2:3 (4 GPUs), the maximization steps now finish in 18 seconds. It was a great suggestion.
