
Enforce the zero terminal SNR to schedulers #397

Closed
Warvito opened this issue May 25, 2023 · 13 comments · May be fixed by #404

Comments

@Warvito
Collaborator

Warvito commented May 25, 2023

According to "Common Diffusion Noise Schedules and Sample Steps are Flawed" (https://arxiv.org/pdf/2305.08891.pdf), the implementations of most schedulers do not use t = T in the sampling process. We should include the corrections to enforce a zero terminal SNR.
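
For reference, the correction from the paper (its Algorithm 1) rescales the betas so that the cumulative alpha product reaches exactly zero at the final timestep. A rough transcription, assuming betas is a 1-D torch tensor, could look like:

import torch

def rescale_zero_terminal_snr(betas: torch.Tensor) -> torch.Tensor:
    # sqrt of the cumulative alpha product
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    alphas_bar_sqrt = alphas_cumprod.sqrt()

    # shift so the last value becomes zero, then rescale so the first value is unchanged
    alphas_bar_sqrt_0 = alphas_bar_sqrt[0].clone()
    alphas_bar_sqrt_T = alphas_bar_sqrt[-1].clone()
    alphas_bar_sqrt -= alphas_bar_sqrt_T
    alphas_bar_sqrt *= alphas_bar_sqrt_0 / (alphas_bar_sqrt_0 - alphas_bar_sqrt_T)

    # convert the rescaled sqrt(alpha_bar) back to betas
    alphas_bar = alphas_bar_sqrt**2
    alphas = alphas_bar[1:] / alphas_bar[:-1]
    alphas = torch.cat([alphas_bar[0:1], alphas])
    return 1.0 - alphas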

@ericspod
Member

This could be good to do, but it should be motivated by our own experiments to show there's a meaningful difference.

@marksgraham
Collaborator

Starting to play with this, will let you know what I find out

@marksgraham
Collaborator

So I've found that starting from t=T helps for my application, but the zero terminal SNR hinders performance. However, my use case is a bit niche (segmentation). I'm planning to re-run the FID tutorial with the changes to see if there is any difference.

@Warvito
Collaborator Author

Warvito commented Jun 6, 2023

Hi @marksgraham, this implementation might help with our code too (huggingface/diffusers#3664)

@marksgraham
Collaborator

Thanks Walter. Interesting to see they aren't totally convinced by the updates.

I like the idea of allowing the user to select the method for timestep discretisation (e.g. trailing, leading) as an argument to the scheduler.
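
To make the difference concrete, here is a rough sketch (not code from this repo) of what "leading" versus "trailing" spacing gives for 1000 training steps and 50 inference steps:

import numpy as np

num_train_timesteps, num_inference_steps = 1000, 50
ratio = num_train_timesteps // num_inference_steps

# "leading" (current behaviour): starts sampling at t=980 and never uses the final timestep t=999
leading = (np.arange(num_inference_steps) * ratio)[::-1].astype(np.int64)

# "trailing": counts down from the end, so sampling starts at t=999
trailing = (np.arange(num_train_timesteps, 0, -ratio) - 1).astype(np.int64)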

I think we could just make the noise schedules with snr=0 at t=T available as new options and keep the current ones as default.

@ericspod
Member

ericspod commented Jun 7, 2023

How would we want to implement this feature? In the Scheduler constructor we could add an argument for a rescale function which we'd apply to self.betas, self.alphas, self.alphas_cumprod:

def __init__(self, num_train_timesteps: int = 1000, schedule: str = "linear_beta", rescale_func: Callable | None = None, **schedule_args) -> None:
    super().__init__()
    schedule_args["num_train_timesteps"] = num_train_timesteps
    noise_sched = NoiseSchedules[schedule](**schedule_args)

    # set betas, alphas, alphas_cumprod based off return value from noise function
    if isinstance(noise_sched, tuple):
        self.betas, self.alphas, self.alphas_cumprod = noise_sched
    else:
        self.betas = noise_sched
        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)

    if rescale_func is not None:
        self.betas, self.alphas, self.alphas_cumprod = rescale_func(self.betas, self.alphas, self.alphas_cumprod)
    ...
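
Usage could then look something like this (illustrative only; rescale_func is the proposed argument, and rescale_zero_terminal_snr stands for whatever rescaling the user picks, e.g. Algorithm 1 from the paper):

def rescale_to_zero_terminal_snr(betas, alphas, alphas_cumprod):
    new_betas = rescale_zero_terminal_snr(betas)
    new_alphas = 1.0 - new_betas
    return new_betas, new_alphas, torch.cumprod(new_alphas, dim=0)

scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    schedule="linear_beta",
    rescale_func=rescale_to_zero_terminal_snr,
)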

@marksgraham
Collaborator

Allowing a generic function to be supplied strikes me as a bit too general, given a user can now easily specify custom beta schedules. I think we could provide a rescale_to_snr0 flag, similar to what you've proposed, or just add a new set of beta schedule options which enforce snr=0 at t=T to complement the existing ones (e.g. linear_beta_snr0, scaled_linear_beta_snr0).
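
For the second option, a rough sketch of one such entry, reusing the NoiseSchedules registry from your snippet (the beta defaults and the rescale helper are placeholders, not existing code):

@NoiseSchedules.add_def("linear_beta_snr0", "Linear beta schedule rescaled to zero terminal SNR")
def _linear_beta_snr0(num_train_timesteps: int, beta_start: float = 1e-4, beta_end: float = 2e-2):
    betas = torch.linspace(beta_start, beta_end, num_train_timesteps)
    return rescale_zero_terminal_snr(betas)  # e.g. Algorithm 1 from the paper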

@marksgraham
Collaborator

As it stands, I think it makes sense to allow users to specify how they want to do the timestep spacing. I've found it helps in some of my applications. I tried to get some results for FID scores on MNIST, but the results are unclear. I propose we give users the option and let them decide what is right for their application. There is a suggestion of how to do it in the linked PR; what do you think? If it looks OK, I'll implement it in the other schedulers.

I haven't managed to get models with SNR=0 at t=T to train well at all, so I'm reluctant to implement it. I also note they weren't fully convinced of its utility in the huggingface discussion.

@sRassmann

Hi, I found a related issue with the DDIM scheduler. When using a scheduler with a steep terminal SNR decay, like a cosine scheduler*, my results are super bad:

[image: generated samples using the default (leading) timestep spacing, showing very poor quality]

However, if I change from the Leading to the Linspace method for timestep spacing (notation from Table 2 in the "Common Diffusion Noise Schedules and Sample Steps are Flawed" paper), my results are a lot better:

    def set_timesteps(
        self, num_inference_steps: int, device: str | torch.device | None = None
    ) -> None:
        self.num_inference_steps = num_inference_steps
        step_ratio = self.num_train_timesteps // self.num_inference_steps
        # creates integer timesteps by multiplying by ratio
        # casting to int to avoid issues when num_inference_step is power of 3
        # timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)  # Leading 
        timesteps = (
            np.linspace(0, self.num_train_timesteps - 1, num_inference_steps)
            .round()[::-1]
            .copy()
            .astype(np.int64)
        )  # Linspace
        self.timesteps = torch.from_numpy(timesteps).to(device)
        self.timesteps += self.steps_offset

[image: generated samples using Linspace timestep spacing, showing clearly better quality]

Intuitively, I guess it makes sense that skipping the early, steep part of the SNR curve is an issue for a cosine schedule, while the first part likely does not matter so much for something like a linear schedule.

Anyway, how about letting the user decide which scheme to use, similar to how diffusers handles it?
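
For instance, a sketch of what a diffusers-style option could look like here (the timestep_spacing attribute and the trailing branch are assumptions, not existing code in this repo):

def set_timesteps(self, num_inference_steps: int, device: str | torch.device | None = None) -> None:
    self.num_inference_steps = num_inference_steps
    if self.timestep_spacing == "linspace":
        timesteps = np.linspace(0, self.num_train_timesteps - 1, num_inference_steps).round()[::-1].copy().astype(np.int64)
    elif self.timestep_spacing == "leading":
        step_ratio = self.num_train_timesteps // num_inference_steps
        timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
        timesteps += self.steps_offset
    else:  # "trailing"
        step_ratio = self.num_train_timesteps / num_inference_steps
        timesteps = np.round(np.arange(self.num_train_timesteps, 0, -step_ratio)).astype(np.int64) - 1
    self.timesteps = torch.from_numpy(timesteps).to(device)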

*Note: I am using a custom implementation for the cosine schedule function

import numpy as np
import torch


# adapted from https://github.com/openai/improved-diffusion/blob/783b6740edb79fdb7d063250db2c51cc9545dcd1/improved_diffusion/gaussian_diffusion.py#L45C1-L62C27
def betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):
    betas = []
    for i in range(num_diffusion_timesteps):
        t1 = i / num_diffusion_timesteps
        t2 = (i + 1) / num_diffusion_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return torch.tensor(betas)


@NoiseSchedules.add_def("cosine_poly", "Cosine schedule")
def _cosine_beta(num_train_timesteps: int, s: float = 8e-3, order: float = 2, *args):
    return betas_for_alpha_bar(
        num_train_timesteps,
        lambda t: np.cos((t + s) / (1 + s) * np.pi / 2) ** order,
    )

@marksgraham
Collaborator

Hi, sorry for the slow response.

We've had another report of bad results with the cosine scheduler here too.

I actually started implementing this but closed the PR because I couldn't find any benefit; given these reports, though, it does seem worth implementing. My closed PR is here.

Do you have any interest in resurrecting it and opening a new PR? If not, I will do it, but it won't be a priority for a while as we will be focusing on integration with MONAI core for the moment.

@sRassmann

No worries, and thanks for your comment!

I want to look into it, though I have some deadlines coming up, so I fear I won't have time in the next couple of weeks.

@virginiafdez
Contributor

Can this issue be closed, since there is a new issue #489 open?

@marksgraham
Collaborator

I'd say so :)
