
Failed to reproduce the results #6

Open
takerum opened this issue Sep 14, 2023 · 9 comments


takerum commented Sep 14, 2023

Dear authors,

I recently tried to replicate the results presented in the paper by rerunning the repository code myself. However, the results I obtained do not match the numbers in the table presented in your documentation.
I also evaluated the released pre-trained model and noticed a decline in performance compared to the reported values, particularly on the LPIPS and SSIM metrics. The numbers I obtained are in the table below.

I would greatly appreciate any insights or suggestions you might have that could help me identify potential reasons for this discrepancy.

|  | LPIPS | SSIM | PSNR |
| --- | --- | --- | --- |
| The paper's numbers | 0.262 | 0.839 | 21.38 |
| The released model | 0.316 | 0.807 | 21.17 |
| The model trained with this repository | 0.343 | 0.797 | 20.59 |

Regarding the settings and the code:

* I used the same versions of PyTorch and torchvision and ran the code on 4 V100 GPUs, the same GPU configuration noted in the paper.
* I did not make any changes to the original code except in the data-processing section, where an error stopped training. Specifically, I observed that some images did not have the expected (360, 640) resolution and were not reshaped to the appropriate shape because they did not pass this line. To address this, I made the following modifications to reshape the images properly and adjust the intrinsic parameters accordingly:

```python
import cv2

# Some images have a height smaller than 360, so crop and resize them explicitly.
H, W = rgb.shape[0], rgb.shape[1]

rgb = data_util.square_crop_img(rgb)   # center square crop
rgb = cv2.resize(rgb, (256, 256))      # all images are reshaped into (256, 256)

# Rescale the principal point to account for the crop and resize.
intrinsics = unnormalize_intrinsics(cam_params[timestep].intrinsics, 256, W * (256 / H))
xscale = W / min(H, W)
yscale = H / min(H, W)

intrinsics[0, 2] = intrinsics[0, 2] / xscale
intrinsics[1, 2] = intrinsics[1, 2] / yscale
```
yilundu (Owner) commented Sep 14, 2023

Hi,

Could you give a bit more detail on what dataset you are training on? Are you training on the full RealEstate10K dataset from YouTube? For the actual training -- can you also describe when you turn the LPIPS and depth loss coefficients on? In general, we found that training for a long time without those losses (maybe 3-5 days) and then tuning the LPIPS and depth losses can lead to improvements in performance.
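
For concreteness, the kind of schedule being described might look roughly like the sketch below (the function, argument names, and default values here are placeholders, not taken from the repository):

```python
# Hypothetical sketch of the delayed-loss schedule described above:
# train with only the L1 reconstruction loss first, then switch the
# LPIPS and depth terms on for the remainder of training.
def loss_weights(iteration, warmup_iters=100_000,
                 lpips_coeff=0.1, depth_coeff=0.05):
    """Return (lpips_weight, depth_weight) for the current iteration."""
    if iteration < warmup_iters:
        return 0.0, 0.0              # L1-only phase
    return lpips_coeff, depth_coeff  # regularization phase

# Inside the training loop (illustrative):
# lpips_w, depth_w = loss_weights(it)
# loss = l1_loss + lpips_w * lpips_loss + depth_w * depth_loss
```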

takerum (Author) commented Sep 14, 2023

Sure, and thank you for the swift reply!

> Are you training on the full RealEstate10K dataset from YouTube?

Yes, but the total number of examples I used is 66837, a bit less than the number in the paper (67477), because some of the videos were no longer available on YouTube. I do not believe this is the cause of the performance discrepancy, though.

> can you also describe when you turn the LPIPS and depth loss coefficients on? In general, we found that training for a long time without those losses (maybe 3-5 days) and then tuning the LPIPS and depth losses can lead to improvements in performance.

I followed the procedure outlined in the appendix of the paper. The exact steps I took are as follows:

  1. I trained the model with only the L1 loss for the first 30k iterations, with a batch size of 48.
  2. I then switched to the combined loss with the LPIPS and depth terms and continued training for an additional 70k iterations with a batch size of 16.

But from your reply, it sounds like I should have trained the model for more iterations with only the L1 loss?

[image attached]

yilundu (Owner) commented Sep 14, 2023

Hi -- yes, apologies about that. I think some of the additional training details in the appendix may not be fully accurate -- you might get better performance by training with only the L1 loss for around 100k iterations and then adding the regularization losses later.

In terms of the numbers in the paper -- they were the ones I got from a checkpoint before CVPR. I ended up refactoring the entire codebase for the code release, and the released pretrained weights are based off a model I trained with the refactored codebase. I believe that if you train the model for longer, you would likely be able to improve over the numbers of the provided pretrained model. You may also have to tune the LPIPS loss coefficient a bit, as well as the patch size in which it is applied (I did a lot of ad-hoc hacks to train the model around CVPR time to try to improve its performance).

yilundu (Owner) commented Sep 14, 2023

I would primarily worry about reproducing the PSNR / MSE results in the paper. You can improve the SSIM by decreasing the coefficient of the depth loss and you can improve the LPIPS by increasing the coefficient on the LPIPS loss.
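
As a rough illustration of how those coefficients enter the objective (a minimal sketch using the standalone `lpips` package; the function name, `depth_term`, and default values are placeholders, not the repository's implementation):

```python
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual metric network

def total_loss(pred_rgb, gt_rgb, depth_term, lpips_coeff=0.1, depth_coeff=0.05):
    """Weighted objective: raising lpips_coeff tends to improve LPIPS,
    lowering depth_coeff tends to improve SSIM, per the comment above.

    pred_rgb, gt_rgb: (B, 3, H, W) tensors in [0, 1];
    depth_term: a precomputed scalar standing in for whatever depth loss the repo uses.
    """
    l1 = (pred_rgb - gt_rgb).abs().mean()
    perceptual = lpips_fn(pred_rgb * 2 - 1, gt_rgb * 2 - 1).mean()  # LPIPS expects [-1, 1]
    return l1 + lpips_coeff * perceptual + depth_coeff * depth_term
```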

takerum (Author) commented Sep 14, 2023

No problem and thank you very much for the clarification and guidance! Ok, I will train the model for longer and see the results.

takerum closed this as completed Sep 14, 2023
takerum reopened this Sep 17, 2023
takerum (Author) commented Sep 17, 2023

Dear @yilundu

Despite following your advice, I have yet to successfully replicate the results presented in your paper. I extended the training to 150,000 iterations, which led to a PSNR of 21.5, slightly surpassing the number reported in your paper. However, fine-tuning with the LPIPS and depth losses slightly improved LPIPS but deteriorated PSNR and SSIM. I have put the table of each metric for the trained models below.

Regarding the coefficients: I decreased the depth loss coefficient from 0.05 to 0.01 and increased the LPIPS coefficient from 0.1 to 0.2 or 0.5. Both LPIPS coefficients produce similar scores.

Could there be any specifics in the implementation that might be impacting the results? Any guidance or suggestions you could provide would be greatly appreciated.

|  | LPIPS | SSIM | PSNR |
| --- | --- | --- | --- |
| The paper's numbers | 0.262 | 0.839 | 21.38 |
| Trained for 150K iterations with L1 loss | 0.296 | 0.824 | 21.58 |
| Fine-tuned for 50K iters with L1 + LPIPS + depth loss | 0.286 | 0.806 | 21.07 |

yilundu (Owner) commented Sep 18, 2023

Hi,

Sorry about the difficulty in reproducing the results -- the model I used at CVPR had a combination of hacks (I kept tuning different hyperparameters between different parts of training to improve the visual quality).

A couple of things that might be helpful:

* You can try enforcing the LPIPS loss on larger patch windows than the current code uses (see the sketch below).
* You can try different types of depth loss -- I think I might have mixed L1 and L2 losses on the patches.
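
For reference, a patch-windowed LPIPS term with a configurable window size could look roughly like this (an illustrative sketch using the standalone `lpips` package; the random patch sampling and names are assumptions, not the repository's implementation):

```python
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg")

def patch_lpips(pred, gt, patch_size=32, n_patches=4):
    """LPIPS averaged over randomly sampled square patches.

    pred, gt: (B, 3, H, W) tensors in [0, 1]; patch_size could be
    raised from 32 to 48 or 64 to enforce the loss on larger windows.
    """
    _, _, H, W = pred.shape
    losses = []
    for _ in range(n_patches):
        y = torch.randint(0, H - patch_size + 1, (1,)).item()
        x = torch.randint(0, W - patch_size + 1, (1,)).item()
        p = pred[:, :, y:y + patch_size, x:x + patch_size]
        g = gt[:, :, y:y + patch_size, x:x + patch_size]
        losses.append(lpips_fn(p * 2 - 1, g * 2 - 1).mean())  # LPIPS expects [-1, 1]
    return torch.stack(losses).mean()
```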

Since a lot of this isn't easily reproducible -- if you would like to compare with the paper, I think it would be fine to use your current numbers, as the PSNR roughly seems to match (which was the main metric I optimized for anyway, besides visual quality).

takerum (Author) commented Sep 18, 2023

Hi,

Thank you for the reply!

> You can try enforcing the LPIPS loss on larger patch windows than the current code uses

Do you have any suggestions regarding the specific patch window sizes to test? As I understand it, the current size is 32x32. Would it be advisable to experiment with sizes such as 48x48 or 64x64?

yilundu (Owner) commented Sep 20, 2023

Hi,

Sorry about the late reply -- I've gotten a bit busy with the ICLR deadline.

Yes, it could be interesting to try 48x48 and 64x64, or potentially 16x16.

Best,
Yilun
