
Volume too large #13

Open
yuehuang2023 opened this issue Oct 15, 2024 · 7 comments

Comments

@yuehuang2023

yuehuang2023 commented Oct 15, 2024

Hi, I tried to reproduce the results for EMPIAR-10073 on an A6000 GPU, with the parameters set according to the supplementary material. However, RELION reports errors. Any suggestions for solving this error? Thank you.

The run log is:

Initializing the particle dataset
Assigning a diameter of 512 angstrom
Number of particles: 138899
Initialized data loaders for half sets of size 62505  and  62505
consensus updates are done every  0  epochs.
box size: 380 pixel_size: 1.400011 virtual pixel_size: 0.0026246719160104987  dimension of latent space:  10
Number of used gaussians: 30000
Optimizing scale only
volume too large: change size of output volumes. (If you want the original box size for the output volumes use a bigger gpu. The size of tensor a (380) must match the size of tensor b (190) at non-singleton dimension 2
Optimizing scale only
Initializing gaussian positions from reference
100%|##########| 50/50 [00:07<00:00,  6.29it/s]
Final error: 5.322801257534593e-07
Optimizing scale only
Initializing gaussian positions from reference
100%|##########| 50/50 [00:08<00:00,  6.12it/s]
Final error: 5.322801257534593e-07
consensus gaussian models initialized
consensus model  initialization finished
mean distance in graph for half 1: 2.4982950687408447 Angstrom ;This distance is also used to construct the initial graph 
mean distance in graph for half 2: 2.4982950687408447 Angstrom ;This distance is also used to construct the initial graph 
Computing half-set indices
100%|##########| 218/218 [00:14<00:00, 15.24it/s]
setting epoch type
generating graphs
100%|#########9| 217/218 [00:32<00:00,  6.77it/s]
Index tensor must have the same number of dimensions as self tensor

The run error is:

/.conda/envs/relion-5.0/lib/python3.10/site-packages/dynamight/models/decoder.py:235: UserWarning: Using a target size (torch.Size([190, 190, 190])) that is different to the input size (torch.Size([380, 380, 380])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss = torch.nn.functional.mse_loss(
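For reference, the warning and the tensor-size error above can be reproduced with a minimal PyTorch snippet (the shapes are taken from the log; this is not DynaMight code): mse_loss between a full-size 380 px volume and a 2x-downsampled 190 px reference first warns about the size mismatch and then fails, because 380 and 190 cannot be broadcast together.

import torch

# Shapes from the run log above: full-size output vs. 2x-downsampled reference.
volume = torch.zeros(380, 380, 380)
reference = torch.zeros(190, 190, 190)

try:
    torch.nn.functional.mse_loss(volume, reference)
except RuntimeError as err:
    # "The size of tensor a (380) must match the size of tensor b (190) ..."
    print(err)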
@huwjenkins

Yes, a box size of 360 px is hard-coded in multiple places in the code:

if reference_volume.shape[-1] > 360:

if reference_volume.shape[-1] > 360:

if self.box_size > 360:

if self.box_size > 360:

I couldn't find this mentioned in the Nature Methods paper, and as @yuehuang2023 points out, one of the example datasets used a box size of 380 px. @schwabjohannes, @scheres - why is 360 px hard-coded as a limit? The message:

If you want the original box size for the output volumes use a bigger gpu

seems a bit disingenuous when 360 px appears to be a hard-coded limit?

I also encountered the same message when running on one of my datasets with a 384 px box.
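For illustration, a standalone sketch of what the hard-coded check does to a reference volume above 360 px (the 380 px shape is taken from the EMPIAR-10073 log above; this is not a patch, just the downsampling step in isolation):

import torch

reference_volume = torch.zeros(380, 380, 380)  # EMPIAR-10073 box size from the log

if reference_volume.shape[-1] > 360:
    # avg_pool3d with kernel 2 halves every spatial dimension: 380 -> 190
    reference_volume = torch.nn.functional.avg_pool3d(
        reference_volume.unsqueeze(0).unsqueeze(0), 2)
    reference_volume = reference_volume.squeeze()

print(reference_volume.shape)  # torch.Size([190, 190, 190])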

@scheres

scheres commented Nov 5, 2024 via email

@huwjenkins

Yes, you are correct: the message is triggered by running out of GPU memory. Sorry, I should have looked more carefully. I was running on an A40 with 48 GB, which I thought was quite a big GPU!

However, the volume will still be downscaled by a factor of 2 with a 384 px box. Should I crop the particles to 360 px?

@yuehuang2023
Author

I used an A6000 GPU with the same configuration mentioned in the supplementary material, but this error was still raised.
[screenshot of the error]

@huwjenkins

I got DynaMight running on an H100, and with my dataset (384 px box) I got the same errors:

box size: 384 pixel_size: 0.825 virtual pixel_size: 0.0025974025974025974  dimension of latent space:  6
Number of used gaussians: 10000
Optimizing scale only
volume too large: change size of output volumes. (If you want the original box size for the output volumes use a bigger gpu. The size of tensor a (384) must match the size of tensor b (192) at non-singleton dimension 2

and

/xxx/miniforge/envs/relion-5.0/lib/python3.10/site-packages/dynamight/models/decoder.py:235: UserWarning: Using a target size (torch.Size([192, 192, 192])) that is different to the input size (torch.Size([384, 384, 384])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss = torch.nn.functional.mse_loss(

As I don't have access to a bigger GPU, I made the following change:

--- decoder.py.orig	2024-11-06 09:02:03.000000000 +0000
+++ decoder.py	2024-11-06 09:02:26.000000000 +0000
@@ -224,7 +224,7 @@
         print('Optimizing scale only')
         optimizer = torch.optim.Adam(
             [self.image_smoother.A], lr=100*lr)
-        if reference_volume.shape[-1] > 360:
+        if reference_volume.shape[-1] > 384:
             reference_volume = torch.nn.functional.avg_pool3d(
                 reference_volume.unsqueeze(0).unsqueeze(0), 2)
             reference_volume = reference_volume.squeeze()

and the errors went away. I think my earlier apology was premature.

@huwjenkins

The job with the modified dynamight/models/decoder.py is still running and is currently using ~21 GB of the 80 GB on the H100 GPU.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 PCIe               Off | 00000000:21:00.0 Off |                    0 |
| N/A   74C    P0             221W / 310W |  21301MiB / 81559MiB |     79%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

@huwjenkins

So I believe the underlying bug is the failure to update self.vol_box around here:

--- dynamight/models/decoder.py.orig	2024-11-06 09:02:03.000000000 +0000
+++ dynamight/models/decoder.py	2024-11-06 16:35:26.000000000 +0000
@@ -228,6 +228,7 @@
             reference_volume = torch.nn.functional.avg_pool3d(
                 reference_volume.unsqueeze(0).unsqueeze(0), 2)
             reference_volume = reference_volume.squeeze()
+            self.vol_box //= 2

         for i in range(n_epochs):
             optimizer.zero_grad()

which is then used in generate_consensus_volume() here:

def generate_consensus_volume(self):
    scaling_fac = self.box_size/self.vol_box
    self.batch_size = 2
    p2v = PointsToVolumes(self.vol_box, self.n_classes,
                          self.grid_oversampling_factor)
    amplitudes = torch.stack(
        2 * [self.amp*torch.nn.functional.softmax(self.ampvar, dim=0)], dim=0
    )

However, I don't think this is the optimal way to deal with large boxes. If DynaMight has a cliff-edge limit of 360 px, then this should be documented and users advised to crop/downscale their particles appropriately. I could easily trim 12 px from the edges of my particle boxes, and other users with > 360 px boxes might also prefer to downsample to this size rather than accept the automatic 2x downsampling.
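As a hypothetical illustration of the crop-to-360-px alternative (the file names and the use of the mrcfile package are assumptions, not part of DynaMight or RELION), trimming 12 px from each edge of a 384 px stack brings the particles under the 360 px check; the corresponding STAR file metadata would also need updating to the new box size.

import mrcfile
import numpy as np

# Hypothetical example: crop a 384 px particle stack to 360 px by trimming
# 12 px from each edge. File names are placeholders.
with mrcfile.open('particles_384.mrcs') as mrc:
    stack = np.asarray(mrc.data)              # (n_particles, 384, 384)

pad = (stack.shape[-1] - 360) // 2            # 12 px per side
cropped = stack[:, pad:pad + 360, pad:pad + 360]

with mrcfile.new('particles_360.mrcs', overwrite=True) as mrc:
    mrc.set_data(cropped.astype(np.float32))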
