
Failed to reproduce the results #6

Open
takerum opened this issue Sep 14, 2023 · 9 comments


takerum commented Sep 14, 2023

Dear authors,

I recently tried to replicate the results presented in the paper by rerunning the repository code myself. However, the results I obtained do not match the numbers in the table presented in your documentation.
I also evaluated the released pre-trained model and noticed a decline in performance compared to the reported values, particularly on the LPIPS and SSIM metrics. The numbers I obtained are in the table below.

I would greatly appreciate any insights or suggestions you might have that could help me identify potential reasons for this discrepancy.

|  | LPIPS | SSIM | PSNR |
| --- | --- | --- | --- |
| The paper's numbers | 0.262 | 0.839 | 21.38 |
| The released model | 0.316 | 0.807 | 21.17 |
| The model trained with this repository | 0.343 | 0.797 | 20.59 |

Regarding the settings and the code:

* I used the same versions of PyTorch and torchvision and ran the code on 4 V100 GPUs, the same GPU configuration noted in the paper.
* I did not make any changes to the original code except in the data-processing section, where an error stopped training. Specifically, I observed that some images did not have the expected (360, 640) resolution and were not reshaped to the appropriate shape because they did not pass this line. To address this, I made the following modifications to reshape the images properly and adjust the intrinsic parameters accordingly:

```python
import cv2

# Some images have a height smaller than 360, so crop and resize them explicitly.
H, W = rgb.shape[0], rgb.shape[1]

rgb = data_util.square_crop_img(rgb)   # center square crop
rgb = cv2.resize(rgb, (256, 256))      # all images are reshaped into (256, 256)

# Rescale the principal point to account for the crop and resize.
intrinsics = unnormalize_intrinsics(cam_params[timestep].intrinsics, 256, W * (256 / H))
xscale = W / min(H, W)
yscale = H / min(H, W)

intrinsics[0, 2] = intrinsics[0, 2] / xscale
intrinsics[1, 2] = intrinsics[1, 2] / yscale
```
yilundu (Owner) commented Sep 14, 2023

Hi,

Could you give a bit more detail on what dataset you are training on? Are you training on the full RealEstate10K dataset from YouTube? For the actual training -- can you also describe when you turn the LPIPS and depth loss coefficients on? In general, we found that training for a long time without those losses (maybe 3-5 days) and then tuning the LPIPS and depth losses can lead to improvements in performance.
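
For concreteness, the kind of schedule being described might look roughly like the sketch below (the function, argument names, and default values here are placeholders, not taken from the repository):

```python
# Hypothetical sketch of the delayed-loss schedule described above:
# train with only the L1 reconstruction loss first, then switch the
# LPIPS and depth terms on for the remainder of training.
def loss_weights(iteration, warmup_iters=100_000,
                 lpips_coeff=0.1, depth_coeff=0.05):
    """Return (lpips_weight, depth_weight) for the current iteration."""
    if iteration < warmup_iters:
        return 0.0, 0.0              # L1-only phase
    return lpips_coeff, depth_coeff  # regularization phase

# Inside the training loop (illustrative):
# lpips_w, depth_w = loss_weights(it)
# loss = l1_loss + lpips_w * lpips_loss + depth_w * depth_loss
```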

takerum (Author) commented Sep 14, 2023

Sure, and thank you for the swift reply!

> Are you training on the full RealEstate10K dataset from YouTube?

Yes, but the total number of examples I used is 66837, a bit less than the number in the paper (67477), because some of the videos were no longer available on YouTube. I do not believe this is the cause of the performance discrepancy, though.

> can you also describe when you turn the LPIPS and depth loss coefficients on? In general, we found that training for a long time without those losses (maybe 3-5 days) and then tuning the LPIPS and depth losses can lead to improvements in performance.

I followed the procedure outlined in the appendix of the paper. The exact steps I took are as follows:

  1. I trained the model with only the L1 loss for the first 30k iterations, with a batch size of 48.
  2. I then switched to the combined loss with the LPIPS and depth terms and continued training for an additional 70k iterations with a batch size of 16.

But from your reply, it sounds like I should have trained the model for more iterations with only the L1 loss?

[image attached]

yilundu (Owner) commented Sep 14, 2023

Hi -- yes, apologies about that. I think some of the additional training details in the appendix may not be fully accurate -- you might get better performance by training with only the L1 loss for around 100k iterations and then adding the regularization losses later.

In terms of the numbers in the paper -- they were the ones I got from a checkpoint before CVPR. I ended up refactoring the entire codebase for the code release, and the released pretrained weights are based off a model I trained with the refactored codebase. I believe that if you train the model for longer, you would likely be able to improve over the numbers of the provided pretrained model. You may also have to tune the LPIPS loss coefficient a bit, as well as the patch size in which it is applied (I did a lot of ad-hoc hacks to train the model around CVPR time to try to improve its performance).

yilundu (Owner) commented Sep 14, 2023

I would primarily worry about reproducing the PSNR / MSE results in the paper. You can improve the SSIM by decreasing the coefficient of the depth loss and you can improve the LPIPS by increasing the coefficient on the LPIPS loss.
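
As a rough illustration of how those coefficients enter the objective (a minimal sketch using the standalone `lpips` package; the function name, `depth_term`, and default values are placeholders, not the repository's implementation):

```python
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual metric network

def total_loss(pred_rgb, gt_rgb, depth_term, lpips_coeff=0.1, depth_coeff=0.05):
    """Weighted objective: raising lpips_coeff tends to improve LPIPS,
    lowering depth_coeff tends to improve SSIM, per the comment above.

    pred_rgb, gt_rgb: (B, 3, H, W) tensors in [0, 1];
    depth_term: a precomputed scalar standing in for whatever depth loss the repo uses.
    """
    l1 = (pred_rgb - gt_rgb).abs().mean()
    perceptual = lpips_fn(pred_rgb * 2 - 1, gt_rgb * 2 - 1).mean()  # LPIPS expects [-1, 1]
    return l1 + lpips_coeff * perceptual + depth_coeff * depth_term
```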

takerum (Author) commented Sep 14, 2023

No problem and thank you very much for the clarification and guidance! Ok, I will train the model for longer and see the results.

takerum closed this as completed Sep 14, 2023
takerum reopened this Sep 17, 2023
takerum (Author) commented Sep 17, 2023

Dear @yilundu

Despite following your advice, I have yet to successfully replicate the results presented in your paper. I extended the training to 150,000 iterations, which led to a PSNR of 21.5, slightly surpassing the number reported in your paper. However, fine-tuning with the LPIPS and depth losses slightly improved LPIPS but deteriorated PSNR and SSIM. I have put the table of each metric for the trained models below.

Regarding the coefficients: I decreased the depth loss coefficient from 0.05 to 0.01 and increased the LPIPS coefficient from 0.1 to 0.2 or 0.5. Both LPIPS coefficients produce similar scores.

Could there be any specifics in the implementation that might be impacting the results? Any guidance or suggestions you could provide would be greatly appreciated.

|  | LPIPS | SSIM | PSNR |
| --- | --- | --- | --- |
| The paper's numbers | 0.262 | 0.839 | 21.38 |
| Trained for 150K iterations with L1 loss | 0.296 | 0.824 | 21.58 |
| Fine-tuned for 50K iters with L1 + LPIPS + depth loss | 0.286 | 0.806 | 21.07 |

yilundu (Owner) commented Sep 18, 2023

Hi,

Sorry about the difficulty in reproducing the results -- the model I used at CVPR had a combination of hacks (I kept tuning different hyperparameters between different parts of training to improve the visual quality).

A couple of things that might be helpful:

* You can try enforcing the LPIPS loss on larger patch windows than the current code uses (see the sketch below).
* You can try different types of depth loss -- I think I might have mixed L1 and L2 losses on the patches.
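
For reference, a patch-windowed LPIPS term with a configurable window size could look roughly like this (an illustrative sketch using the standalone `lpips` package; the random patch sampling and names are assumptions, not the repository's implementation):

```python
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg")

def patch_lpips(pred, gt, patch_size=32, n_patches=4):
    """LPIPS averaged over randomly sampled square patches.

    pred, gt: (B, 3, H, W) tensors in [0, 1]; patch_size could be
    raised from 32 to 48 or 64 to enforce the loss on larger windows.
    """
    _, _, H, W = pred.shape
    losses = []
    for _ in range(n_patches):
        y = torch.randint(0, H - patch_size + 1, (1,)).item()
        x = torch.randint(0, W - patch_size + 1, (1,)).item()
        p = pred[:, :, y:y + patch_size, x:x + patch_size]
        g = gt[:, :, y:y + patch_size, x:x + patch_size]
        losses.append(lpips_fn(p * 2 - 1, g * 2 - 1).mean())  # LPIPS expects [-1, 1]
    return torch.stack(losses).mean()
```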

Since a lot of this isn't easily reproducible -- if you would like to compare with the paper, I think it would be fine to use your current numbers, as the PSNR roughly seems to match (which was the main metric I optimized for anyway, besides visual quality).

takerum (Author) commented Sep 18, 2023

Hi,

Thank you for the reply!

> You can try enforcing the LPIPS loss on larger patch windows than the current code uses

Do you have any suggestions regarding the specific patch window sizes to test? As I understand it, the current size is 32x32. Would it be advisable to experiment with sizes such as 48x48 or 64x64?

yilundu (Owner) commented Sep 20, 2023

Hi,

Sorry about the late reply -- I've gotten a bit busy with the ICLR deadline.

Yes, it could be interesting to try 48x48 and 64x64, or potentially 16x16.

Best,
Yilun
