I am trying to train an AutoencoderKL model on RGB images with dimensions (3, 1225, 966). Here is the code that I use (similar to what is in tutorials/generative/2d_ldm/2d_ldm_tutorial.ipynb):
autoencoderkl = AutoencoderKL(
    spatial_dims=2,
    in_channels=3,
    out_channels=3,
    num_channels=(128, 256, 384),
    latent_channels=8,
    num_res_blocks=1,
    attention_levels=(False, False, False),
    with_encoder_nonlocal_attn=False,
    with_decoder_nonlocal_attn=False,
)
autoencoderkl = autoencoderkl.to(device)
The error is reported at line 27 (Train Model, as in the tutorial notebook):
recons_loss = F.l1_loss(reconstruction.float(), images.float())
RuntimeError: The size of tensor a (964) must match the size of tensor b (966) at non-singleton dimension 3
Using the torchinfo package, I was able to print the model summary and spot the discrepancy in the upsampling layer.
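For reference, a minimal sketch of how that summary can be printed (assuming the torchinfo package and a batch size of 1; adjust to your setup):
from torchinfo import summary

# Assumed usage: print per-layer output shapes for a single (3, 1225, 966) input;
# the shape discrepancy shows up in the decoder's upsampling layers.
summary(autoencoderkl, input_size=(1, 3, 1225, 966), depth=4)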
After some debugging I found a workaround: by resizing my images to 1024x720, the input and output shapes of my AutoencoderKL (as reported by the torchinfo summary) are consistent. Nevertheless, I would like to know the reason behind this error.
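A rough sketch of that resize workaround, assuming MONAI's Resize transform and channel-first image arrays (names here are illustrative):
from monai.transforms import Resize

# Resize each image to 1024 x 720; both dimensions are divisible by 4, which
# matches the two downsampling steps of the model configured above.
resize = Resize(spatial_size=(1024, 720))
image_resized = resize(image)  # image: channel-first array of shape (3, 1225, 966)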
I think this happens because the encoder has downsampling layers that halve the spatial dimensions (flooring odd sizes) and the decoder doubles them back up, so unless you play around with the paddings and strides the reconstruction can end up smaller than the input. With three channel levels there are two downsamplings, so each spatial dimension should be divisible by 4: 966 floors to 241 at the bottleneck and is upsampled back to 964, hence the mismatch. I would recommend simply padding your inputs to a size that stays divisible by 2 at every downsampling level (see the sketch below).
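A minimal sketch of that padding suggestion, assuming MONAI's DivisiblePad transform and the two downsampling steps of the model above (so k=4); in general k should be 2 raised to the number of downsamplings:
from monai.transforms import DivisiblePad

# Pad each spatial dim up to the next multiple of 4 (2 downsamplings -> 2**2).
# For an image of shape (3, 1225, 966) this gives (3, 1228, 968).
pad = DivisiblePad(k=4)
image_padded = pad(image)  # image: channel-first array
The reconstruction can then be cropped back to the original size after the forward pass if needed.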