
NoiseSchedules cosine seems wrong and lead to division by 0 #489

Open
Leirbag-gabrieL opened this issue May 15, 2024 · 7 comments
@Leirbag-gabrieL

I wanted to use a DDPMScheduler with cosine scheduling and obtained images filled with NaNs when sampling.

I quickly inspected the code and found that it was caused by a division by 0 in the step function of the DDPMScheduler class, right here:

pred_original_sample_coeff = (alpha_prod_t_prev ** (0.5) * self.betas[timestep]) / beta_prod_t
current_sample_coeff = self.alphas[timestep] ** (0.5) * beta_prod_t_prev / beta_prod_t

beta_prod_t is equal to 0 at step 0 when using the cosine scheduler because it comes from:

alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
beta_prod_t = 1 - alpha_prod_t

alphas_cumprod is calculated like so in this case:

x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()

Thus, alphas_cumprod[0] = 1 and beta_prod_t = 1 - 1 = 0.
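
For reference, here is a minimal standalone sketch (outside the library, assuming the default offset s = 0.008 and num_train_timesteps = 1000) that reproduces the zero denominator:

import torch

# Recompute the cosine schedule exactly as above (assumed defaults: s = 0.008).
num_train_timesteps = 1000
s = 0.008

x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()

alpha_prod_t = alphas_cumprod[0]   # exactly 1.0 after the normalisation above
beta_prod_t = 1 - alpha_prod_t     # 0.0 -> the denominator used later in step()
print(alpha_prod_t.item(), beta_prod_t.item())  # 1.0 0.0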

I saw no issue reporting this, maybe I am using it wrong. 🤷‍♂️
I tried using DDPMScheduler(num_train_timesteps=1000, schedule="cosine") in the 2d_ddpm_compare_schedulers.ipynb notebook and got NaN-filled images as a result.

@marksgraham
Collaborator

Hi there,

In the cosine schedule the alphas/betas are calculated with clipping, so beta_prod_t is not 0 when t=0 as far as I can see:

    x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
    alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod /= alphas_cumprod[0].item()
    alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)
    betas = 1.0 - alphas
    return betas, alphas, alphas_cumprod[:-1]

However, there are documented problems with the cosine scheduler; see the discussion here.

@sRassman reports better results using leading timesteps here. Could you try that and see if it fixes it for you?

@Leirbag-gabrieL
Author

Hi, thanks for your quick response.

Indeed the alphas are clipped in the code snippet you linked, but those are not the values used in the scheduler.
From what I saw, the problem is in the step function of DDPMScheduler.

At the beginning of the step function, some variables are defined:

alpha_prod_t = self.alphas_cumprod[timestep]
alpha_prod_t_prev = self.alphas_cumprod[timestep - 1] if timestep > 0 else self.one
beta_prod_t = 1 - alpha_prod_t
beta_prod_t_prev = 1 - alpha_prod_t_prev

The issue comes from the alpha_prod_t variable, which is equal to 1 at timestep 0 because, with the cosine scheduler enabled, alphas_cumprod is defined like so:

alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()

So alpha_prod_t at timestep 0 is always equal to 1, beta_prod_t = 1 - alpha_prod_t is therefore equal to 0, and later in the step function values are divided by this same beta_prod_t, leading to NaN results.
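
To illustrate (a standalone sketch assuming the default s = 0.008, not the library code itself): the clipping only touches alphas/betas, while the returned alphas_cumprod keeps its first entry at exactly 1, so the step coefficients divide by zero:

import torch

num_train_timesteps = 1000
s = 0.008  # assumed default offset

x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
alphas_cumprod /= alphas_cumprod[0].item()
alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)  # clipped
betas = 1.0 - alphas
alphas_cumprod = alphas_cumprod[:-1]  # returned unclipped, stored as self.alphas_cumprod

# What step() computes at timestep 0 (alpha_prod_t_prev is self.one == 1):
beta_prod_t = 1 - alphas_cumprod[0]                                  # 1 - 1 = 0
pred_original_sample_coeff = (1.0 ** 0.5 * betas[0]) / beta_prod_t   # division by 0 -> inf
print(beta_prod_t.item(), pred_original_sample_coeff.item())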

I will try what @sRassman proposed and give you feedback later 👍

@marksgraham
Collaborator

Ah yes, nice spot - it seems like we should make sure alphas_cumprod is calculated from the clipped alphas before we return it from the cosine scheduler.
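
A sketch of what that could look like (hypothetical helper name, not the committed fix): derive the cumulative product from the clipped alphas, so alphas_cumprod[0] <= 0.9999 and the step denominator never hits zero.

import torch

def cosine_noise_schedule(num_train_timesteps: int, s: float = 0.008):
    # Same cosine schedule as above, but alphas_cumprod is recomputed from
    # the clipped alphas before being returned (sketch only).
    x = torch.linspace(0, num_train_timesteps, num_train_timesteps + 1)
    alphas_cumprod = torch.cos(((x / num_train_timesteps) + s) / (1 + s) * torch.pi * 0.5) ** 2
    alphas_cumprod /= alphas_cumprod[0].item()
    alphas = torch.clip(alphas_cumprod[1:] / alphas_cumprod[:-1], 0.0001, 0.9999)
    betas = 1.0 - alphas
    alphas_cumprod = torch.cumprod(alphas, dim=0)  # first entry <= 0.9999 now
    return betas, alphas, alphas_cumprod

betas, alphas, alphas_cumprod = cosine_noise_schedule(1000)
print(alphas_cumprod[0].item())  # <= 0.9999, so 1 - alphas_cumprod[0] > 0 in step()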

@OdedRotem314

Hi

I came across the same issue of receiving NaNs with the cosine schedule due to division by zero.
I looked to see if there is an open issue about it, and here it is.
Are there immediate plans to fix this? How do you suggest handling it in the meantime?

Thanks
oded

@virginiafdez
Contributor

Dear Oded,

Note that the MONAI Generative Models repository will soon be archived because the code has been integrated into MONAI core (https://github.com/Project-MONAI). Could you check whether using the latest version of the schedulers from MONAI core leads to the same error?

If so, we will look at it immediately.
Otherwise, please use that alternative repository.

Thank you very much!

Virginia

@OdedRotem314

OdedRotem314 commented Sep 23, 2024 via email

@virginiafdez
Contributor

Dear Oded

Thanks. We will look into it.
Could you please open an issue in MONAI core describing the problem so that we can look at it from there and trace it?

Thanks!

Virginia

@virginiafdez virginiafdez added the bug Something isn't working label Oct 25, 2024