
RuntimeError: The size of tensor a (4096) must match the size of tensor b (500) at non-singleton dimension 2 #67

Open
henanjun opened this issue Jul 12, 2022 · 3 comments


@henanjun

I tried to run inference on a new image of size (2048, 2048), and it raised this error.

HugeBob commented Jul 21, 2022

I am having the same error with images of size 1920x1080, but the message reads "The size of tensor a (1980) must match the size of tensor b (500) at non-singleton dimension 2".

shariqfarooq123 (Owner) commented Oct 25, 2022

This is because there are only 500 learned positional encodings. If you try to infer on an image at a much higher resolution than the default model resolution, the number of tokens in the transformer exceeds 500, and you get the error above.

Proposed resolutions:

  1. (Recommended) Resize your image down to the model resolution (NYU: 640x480, KITTI: 1241x376) and upsample the result (e.g. with bilinear interpolation) back to your resolution of choice.
  2. Interpolate the positional encodings to the required size.
  3. Manually remove the positional encodings from the architecture and check the result. In my experience, positional encodings don't add much to the performance.
  4. If you have a custom high-resolution depth dataset, fine-tune with a larger number of positional encodings (>500; total = H×W/256, where 256 = 16×16 = patch_size × patch_size).
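The recommended option 1 can be sketched in PyTorch. This is a generic sketch, not the repo's API: `infer_at_model_resolution` is a hypothetical helper, and it assumes a model that maps an (N, C, H, W) image tensor to an (N, 1, H', W') depth map.

```python
import torch
import torch.nn.functional as F

def infer_at_model_resolution(model, image, model_size=(480, 640)):
    """Run `model` at its training resolution, then upsample the
    prediction back to the input resolution (option 1 above)."""
    _, _, h, w = image.shape                      # image: (N, C, H, W)
    # Resize the input down to the resolution the model was trained on
    small = F.interpolate(image, size=model_size,
                          mode="bilinear", align_corners=False)
    with torch.no_grad():
        depth = model(small)                      # assumed (N, 1, h', w')
    # Upsample the prediction back to the original resolution
    return F.interpolate(depth, size=(h, w),
                         mode="bilinear", align_corners=False)
```

Bilinear upsampling of the depth map is a reasonable default; for crisper edges you could also try guided filtering, but that is outside the scope of this issue.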

@zydmtaichi


Hi @shariqfarooq123,
could you please share more details about resolution 3? I want to keep the images at their original resolution, but I'm still confused about how to remove the positional encodings from the repo's inference code.
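For keeping the original image resolution, option 2 (interpolating the learned positional embeddings) can be sketched generically as below. The function name and the (seq_len, embed_dim) parameter shape are assumptions for illustration, not the repo's actual API; check the real shape of the positional-encoding parameter in the model before adapting this.

```python
import torch
import torch.nn.functional as F

def resize_positional_embeddings(pos_emb, new_len):
    """Linearly interpolate a learned positional-embedding table from
    its trained sequence length to `new_len` (option 2 above).

    Assumes pos_emb has shape (seq_len, embed_dim) -- an assumption,
    verify against the actual parameter in the repo."""
    # F.interpolate with mode="linear" expects input of shape (N, C, L)
    as_1d = pos_emb.t().unsqueeze(0)              # (1, embed_dim, seq_len)
    resized = F.interpolate(as_1d, size=new_len,
                            mode="linear", align_corners=False)
    return resized.squeeze(0).t()                 # (new_len, embed_dim)
```

For example, a table trained with 500 positions could be stretched to H×W/256 positions for a larger input before loading it back into the model's state dict.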
