Design communication protocol for long running requests (i.e. image-to-video) #20
Comments
The diffusers pipeline has a per-step callback, I believe, that can be used to send a notification back to the B. It may also be good to report the hardware being used along with a timestamp so that time per step can be gauged. This could potentially be faked by an O, but that seems like a lot of work and would only net a short-term gain before the B realizes the O is faking.
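For concreteness, a minimal sketch of what the O-side reporting could look like, assuming a recent diffusers version where pipelines expose the `callback_on_step_end` hook (older versions use `callback`/`callback_steps` instead). The endpoint URL, payload fields, and the `make_progress_callback` helper are illustrative placeholders, not part of any existing protocol:

```python
import time

import requests  # assumed available for the HTTP progress POST

# Hypothetical B-side endpoint; the actual transport/schema is what this
# issue is meant to design.
PROGRESS_URL = "http://broadcaster.example/v1/progress"


def make_progress_callback(job_id: str, report_every: int = 1):
    start = time.monotonic()

    def on_step_end(pipe, step, timestep, callback_kwargs):
        # diffusers invokes this after each denoising step; only report
        # every `report_every` steps to limit network chatter.
        if step % report_every == 0:
            requests.post(PROGRESS_URL, json={
                "job_id": job_id,
                "step": step,
                "elapsed_s": round(time.monotonic() - start, 3),
                "timestamp": time.time(),
            }, timeout=2)
        return callback_kwargs  # must return the kwargs dict

    return on_step_end


# Usage with any pipeline that supports the hook, e.g.:
# frames = pipe(image, callback_on_step_end=make_progress_callback("job-123", 10))
```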
@ad-astra-video Good suggestion! It might make sense for B to expect an update from O every N (e.g. 1, 10, etc.) steps. Then, B can track the steps/sec of Os while requests are being executed as a metric for evaluating O performance. A few related possibilities are worth considering as well.
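As a sketch of the B-side bookkeeping this implies (the class and field names are hypothetical, not an existing API):

```python
import time
from collections import defaultdict


class OrchestratorStats:
    """Tracks steps/sec per O from the progress updates B receives."""

    def __init__(self):
        self.last_update = {}                 # (o_addr, job_id) -> (step, monotonic time)
        self.steps_per_sec = defaultdict(list)  # o_addr -> observed rates

    def on_progress(self, o_addr: str, job_id: str, step: int) -> None:
        now = time.monotonic()
        key = (o_addr, job_id)
        if key in self.last_update:
            prev_step, prev_t = self.last_update[key]
            dt = now - prev_t
            if dt > 0 and step > prev_step:
                self.steps_per_sec[o_addr].append((step - prev_step) / dt)
        self.last_update[key] = (step, now)

    def avg_steps_per_sec(self, o_addr: str):
        samples = self.steps_per_sec[o_addr]
        return sum(samples) / len(samples) if samples else None
```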
A more general question I have is how well this step-based framework generalizes to non-diffusion models. For example, does the above make sense for inference with models for upscaling, frame interpolation, etc., or is it diffusion-specific?

[1] https://discuss.huggingface.co/t/how-to-get-intermeidate-output-images/29144
So each model would have a different ppp? Will each model have a fixed workflow? From my limited experience using ComfyUI, I've observed significantly different processing times as the workflow gets more complex, even though the number of steps remained the same. If the workflow per model is static, I think the proposed payment formula will work; if workflows can vary for the same model, then I think we need another approach that takes into account the actual compute used to perform a task.
FWIW, just to clarify: the notes in my previous post were not proposals per se, but rather ideas for iteration and discussion.
In the scenario described, yes, that is how it could work. It is not the most ideal from a UX POV, but it could be a start.
There will need to be parameters that can be adjusted by a user, e.g. resolution, seed, and possibly things like the scheduler; an illustrative request payload is sketched below.
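A hypothetical request payload along those lines (parameter names and values are illustrative, not a spec):

```python
# Example user-adjustable parameters for an image-to-video request.
request_params = {
    "model_id": "stabilityai/stable-video-diffusion-img2vid",  # example model
    "width": 1024,
    "height": 576,
    "seed": 42,
    "num_inference_steps": 25,
    "scheduler": "EulerDiscreteScheduler",  # possibly user-adjustable
}
```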
I generally agree that the pricing should align with the compute used. That said, it could be reasonable to start with an imperfect rough approximation that at least captures the bulk of the compute costs incurred, even if certain parts are not perfectly metered, and to restrict adjustment of certain parameters if needed. Moving this topic into its own thread here: #28.
Tracked internally in https://linear.app/livepeer-ai-spe/issue/LIV-13.
How will a caller (i.e. a B) know whether to switch Os for long-running requests (i.e. > 30s)? Perhaps we can break the job down into smaller pieces to make switching/failover easier (see the sketch below).
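A rough sketch of what segment-based failover could look like on the B side. `submit_segment` and `select_new_o` are hypothetical callables, and resuming mid-run assumes Os can accept intermediate state (e.g. latents) as input:

```python
from typing import Any, Callable, Optional


def run_in_segments(
    total_steps: int,
    segment_size: int,
    o_addr: str,
    submit_segment: Callable[[str, Optional[Any], int], Any],
    select_new_o: Callable[[str], str],
) -> Any:
    """Run a long job as smaller segments so B can fail over between them.

    submit_segment(o_addr, checkpoint, steps) is assumed to run `steps`
    denoising steps on orchestrator `o_addr`, resuming from `checkpoint`
    (e.g. intermediate latents), and to return a new checkpoint.
    """
    checkpoint: Optional[Any] = None
    for start in range(0, total_steps, segment_size):
        steps = min(segment_size, total_steps - start)
        try:
            checkpoint = submit_segment(o_addr, checkpoint, steps)
        except TimeoutError:
            # Segment exceeded its deadline: switch to a different O and
            # retry just this segment rather than restarting the whole job.
            o_addr = select_new_o(o_addr)
            checkpoint = submit_segment(o_addr, checkpoint, steps)
    return checkpoint
```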