Is it correct to scale & shift random noise for the VAE's factors, as if it were a just-encoded image? #3934

CodeExplode · 2024-07-03T11:10:06Z


In the function inner_sample in samplers, the input latents are shifted and scaled (as long as they're not empty). This appears to happen after the noise has been added to the empty latents: a print statement I added there fired during SD3 inference.

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/samplers.py#L655

The training code I've seen generally applies the VAE scaling/shifting only to the encoded images, not to the random noise. Depending on how a model was trained, this could cause a mismatch with the input it expects. My assumption has been that the VAE scaling/shifting exists to move the encoded images into the distribution that random noise already has.
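To make the concern concrete, here is a minimal sketch of what the encode-side transform would do to unit Gaussian noise if it were applied to it. This is not ComfyUI's code: the helper process_in, the form (x - shift) * scale, and the two constants are assumptions chosen for illustration (the values are the ones I believe are used for SD3, but the real factors come from the model's VAE config).

```python
import random
import statistics

# Assumed, illustrative factors; check the actual VAE config for real values.
SCALE_FACTOR = 1.5305
SHIFT_FACTOR = 0.0609


def process_in(latent):
    """Hypothetical encode-side transform: shift, then scale, each value."""
    return [(x - SHIFT_FACTOR) * SCALE_FACTOR for x in latent]


# Draw unit Gaussian noise, as a sampler would for an empty latent.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

# Apply the transform to the noise as if it were a just-encoded image.
scaled_noise = process_in(noise)

noise_mean = statistics.fmean(noise)
noise_std = statistics.pstdev(noise)
scaled_mean = statistics.fmean(scaled_noise)
scaled_std = statistics.pstdev(scaled_noise)

print(f"raw noise:    mean={noise_mean:+.4f} std={noise_std:.4f}")
print(f"scaled noise: mean={scaled_mean:+.4f} std={scaled_std:.4f}")
```

Under these assumptions the noise's standard deviation is multiplied by the scale factor and its mean is moved by roughly -shift * scale, so it would no longer be unit Gaussian. Whether that actually happens to the noise in inner_sample is exactly what I'm asking.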
