SDXL-Lightning inference steps ignored #107
Agreed that this behavior is confusing. FWIW I originally implemented this as a quick hack to support loading a specific N-step checkpoint for SDXL-Lightning on pipeline initialization (since every SDXL-Lightning checkpoint is tied to a specific number of inference steps). LoRA switching at inference time could work (I'm not sure that UNet switching at inference time would be a good idea, as that would probably incur a lot more overhead), but since the general LoRA switching logic is not implemented yet, IMO the low-hanging fruit of establishing clearer docs would be a better place to start.
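For reference, a minimal sketch of what per-request LoRA switching could look like with diffusers' multi-adapter API. The checkpoint file names, `guidance_scale=0`, and the trailing timestep spacing follow the SDXL-Lightning model card, but treat the exact names and setup as assumptions rather than ai-worker code:

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

# Illustrative sketch only -- adapter/file names are assumptions, not ai-worker code.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# SDXL-Lightning expects trailing timestep spacing and no CFG.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)

# Load one LoRA adapter per supported step count up front (no 1-step LoRA exists).
for steps in (2, 4, 8):
    pipe.load_lora_weights(
        "ByteDance/SDXL-Lightning",
        weight_name=f"sdxl_lightning_{steps}step_lora.safetensors",
        adapter_name=f"lightning_{steps}step",
    )

def generate(prompt: str, num_inference_steps: int):
    # Activate the adapter matching the requested step count for this request only.
    pipe.set_adapters([f"lightning_{num_inference_steps}step"])
    return pipe(
        prompt, num_inference_steps=num_inference_steps, guidance_scale=0
    ).images[0]
```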
@stronk-dev, @yondonfu, what are your thoughts on removing the 2/4-step models and exclusively serving the 8-step model, while documenting the behavior of the unused parameters? The 2-step model is only about 1 second faster, and the difference between the 4-step and 8-step models is minimal. I think this would reduce the confusion.
I haven't done testing with the 4-step model, so I can't speak to its inference speed and quality difference. I'd expect there to be a bigger difference in inference time (I think the worker prints the it/s rate? You could calculate the extra time required using that). I'd certainly prefer the simplicity of advertising just one model, but I would be curious to see the quality difference between 4-step and 8-step first.
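As a back-of-the-envelope example (the throughput figure here is purely hypothetical, not a measured number):

```python
# Hypothetical throughput read from the worker logs, e.g. ~10 it/s.
its_per_sec = 10.0
extra_seconds = (8 - 4) / its_per_sec  # extra cost of 8 steps over 4 steps ≈ 0.4 s
```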
IMO the 2/4-step models should continue to be supported, and the 8-step model can just be used as the default if the model ID is set to `ByteDance/SDXL-Lightning`. My reasoning here is that each …
Premise: this is a model which does not accept the `guidance_scale` param and loads a specific set of model weights according to the amount of `num_inference_steps` you want to do (1, 2, 4 or 8 steps).

As apps would request the `ByteDance/SDXL-Lightning` model, the following code would make it default to 2 steps:

`ai-worker/runner/app/pipelines/text_to_image.py`, lines 57 to 71 in 0a26654
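Roughly, the behavior being described amounts to something like the following sketch (a hypothetical illustration, not the actual `text_to_image.py` lines; the checkpoint file name is an assumption):

```python
def resolve_lightning_steps(model_id: str) -> int:
    """Hypothetical sketch of the described suffix parsing; not the actual ai-worker code."""
    if "8step" in model_id:
        return 8
    if "4step" in model_id:
        return 4
    return 2  # a plain "ByteDance/SDXL-Lightning" ID falls back to the 2-step weights


steps = resolve_lightning_steps("ByteDance/SDXL-Lightning")  # -> 2
unet_file = f"sdxl_lightning_{steps}step_unet.safetensors"   # which checkpoint gets loaded
```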
And then when running inference, it would override `num_inference_steps` to 2:

`ai-worker/runner/app/pipelines/text_to_image.py`, lines 188 to 201 in 0a26654
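In effect (again a hypothetical illustration rather than the actual code), whatever the caller passed is replaced by the value baked in at load time:

```python
def apply_lightning_overrides(model_id: str, kwargs: dict) -> dict:
    """Hypothetical illustration of the described override; not the actual ai-worker code."""
    if "SDXL-Lightning" in model_id:
        kwargs["guidance_scale"] = 0
        # Reuses resolve_lightning_steps from the sketch above.
        kwargs["num_inference_steps"] = resolve_lightning_steps(model_id)
    return kwargs


# A request for 8 steps against the plain model ID quietly becomes 2 steps:
apply_lightning_overrides("ByteDance/SDXL-Lightning", {"num_inference_steps": 8})
# -> {"num_inference_steps": 2, "guidance_scale": 0}
```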
Apparently apps need to append `4step` or `8step` to the model ID if they want to do a different amount of `num_inference_steps`. This can be very confusing to app developers, who likely just request `ByteDance/SDXL-Lightning` with a specific number of `num_inference_steps`, which then quietly gets overwritten during inference.

This would also explain why people have reported this model to have bad output, as running it at 8 steps produces vastly different output than at 2 steps.
Proposed solutions could be to switch the UNet/LoRAs during inference, or to make the documentation very clear about how this specific model behaves. Luckily, with models like `RealVisXL_V4.0_Lightning` you're not tied to a specific amount of `inference_steps`.