Is it correct to scale & shift random noise for the VAE's factors, as if it were a just-encoded image? #3934
CodeExplode started this conversation in General
In the function inner_sample in comfy/samplers.py, the input latents are shifted and scaled as long as they are not empty. This appears to happen after the noise has been added to the empty latents: I added a print statement at that point and it fired during SD3 inference.
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/samplers.py#L655
Training code I've seen has generally applied the VAE scaling/shifting only to the encoded images, not to the random noise. Depending on how the models were trained, this could make the sampler's input differ from what the model expects. My assumption has been that the VAE scaling/shifting exists to move encoded images into the distribution that random noise already has.
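To make the concern concrete, here is a minimal sketch of what the same shift/scale would do to the two kinds of input. It is not ComfyUI's code. The helper names `process_in` and `summarize` are hypothetical, the constants are the values I believe are used for the SD3 VAE (treat them as assumptions), and the "raw latents" are synthetic Gaussian samples chosen so that the transform normalizes them.

```python
import random
import statistics

# Assumed constants (values I believe are used for the SD3 VAE); verify for your model.
SCALE_FACTOR = 1.5305
SHIFT_FACTOR = 0.0609


def process_in(values, scale=SCALE_FACTOR, shift=SHIFT_FACTOR):
    """Hypothetical helper: shift, then scale, as done for encoded latents."""
    return [(v - shift) * scale for v in values]


def summarize(values):
    """Return (mean, population standard deviation) of a list of floats."""
    return statistics.fmean(values), statistics.pstdev(values)


rng = random.Random(0)
n = 100_000

# Synthetic stand-in for raw encoder output: mean = shift, std = 1 / scale.
# This is an assumption made so that the transform maps it to roughly N(0, 1).
raw_latents = [rng.gauss(SHIFT_FACTOR, 1.0 / SCALE_FACTOR) for _ in range(n)]

# Plain unit Gaussian noise, as a sampler would draw it.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

latent_mean, latent_std = summarize(process_in(raw_latents))
noise_mean, noise_std = summarize(process_in(noise))

print(f"transformed latents: mean={latent_mean:.3f} std={latent_std:.3f}")
print(f"transformed noise:   mean={noise_mean:.3f} std={noise_std:.3f}")
```

Under these assumptions the transformed latents end up close to unit variance, while noise pushed through the same transform ends up with a standard deviation near the scale factor and a slightly negative mean. Whether that mismatch actually occurs depends on whether the tensor being transformed already contains noise, which is exactly the question above.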