
Design communication protocol for long running requests (i.e. image-to-video) #20

Open
yondonfu opened this issue Jan 29, 2024 · 5 comments

Comments

@yondonfu (Member)

How will a caller (i.e. a B) know whether to switch Os for long-running requests (e.g. > 30s)? Perhaps we can break the job down into smaller pieces to make switching/failover easier.

@ad-astra-video (Collaborator)

I believe the diffusers pipeline has a per-step callback that can be used to send a notification back to the B. It may also be useful to include a timestamp in each notification so the B can measure time per step and get a sense of the hardware being used.

This could potentially be faked by an O, but that seems like a lot of work and would only net a short-term gain before the B realizes that the O is faking.
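A minimal sketch of what this might look like, assuming the `callback_on_step_end` hook available in recent diffusers versions; `report_to_b` is a hypothetical transport (e.g. an HTTP POST to a B-provided callback URL):

```python
import time

def make_progress_callback(report_to_b, total_steps):
    """Build a callback_on_step_end hook that reports per-step timing to B.

    `report_to_b` is a hypothetical stand-in for the actual B<->O transport.
    """
    state = {"last": time.monotonic()}

    def on_step_end(pipeline, step, timestep, callback_kwargs):
        now = time.monotonic()
        report_to_b({
            "step": step + 1,
            "total_steps": total_steps,
            "step_duration_s": now - state["last"],  # lets B estimate steps/sec
            "timestamp": time.time(),
        })
        state["last"] = now
        return callback_kwargs

    return on_step_end

# Usage (assumed pipeline and send function):
# pipe(prompt, num_inference_steps=50,
#      callback_on_step_end=make_progress_callback(send_fn, 50))
```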

@yondonfu (Member Author)

yondonfu commented Feb 16, 2024

@ad-astra-video Good suggestion!

It might make sense for B to expect an update from O every N (e.g. 1, 10, etc.) steps. Then, B can track the steps/sec of Os while requests are being executed as a metric to evaluate O performance. A few related possibilities (a rough sketch follows the list):

  • O could expect to receive a payment for every N steps. I think (this should be tested) the computational cost of each step is proportional to the output resolution, so given a specific price per pixel for a model, the total fee for a request could be calculated as something like price per pixel * output height * output width * # steps, and the payment amount for every N steps would be price per pixel * output height * output width * N.
  • The per-step callback could be used to support interrupting an in-progress request if O learns that the result is no longer needed.
  • The per-step callback could be used to send intermediate images to B as part of an update, which could be useful for showing end users preview images before all steps are complete. This would require O to grab the intermediate latents from the pipeline in each callback and decode them into images with the pipeline's VAE ([1][2]). I also wonder whether an intermediate image could be used as the input to a new diffusion request with a new O, as a way to "resume" diffusion on another O?
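A rough sketch of how these ideas might fit together in a single `callback_on_step_end` hook, assuming recent diffusers; `send_update_to_b` and `cancel_requested` are hypothetical stand-ins for the actual B<->O protocol:

```python
import torch

STEPS_PER_UPDATE = 10  # N: update/payment interval in steps

def make_step_callback(send_update_to_b, cancel_requested,
                       price_per_pixel, height, width):
    def on_step_end(pipeline, step, timestep, callback_kwargs):
        # Interruption: stop early if B signals the result is no longer needed.
        # Recent diffusers pipelines check a private `_interrupt` flag in the
        # denoising loop.
        if cancel_requested():
            pipeline._interrupt = True
            return callback_kwargs

        if (step + 1) % STEPS_PER_UPDATE == 0:
            # Decode intermediate latents into a preview image ([1][2]).
            latents = callback_kwargs["latents"]
            with torch.no_grad():
                image = pipeline.vae.decode(
                    latents / pipeline.vae.config.scaling_factor,
                    return_dict=False,
                )[0]
            preview = pipeline.image_processor.postprocess(image)

            # Payment owed for the last N steps:
            # price per pixel * output height * output width * N.
            fee = price_per_pixel * height * width * STEPS_PER_UPDATE
            send_update_to_b(step=step + 1, preview=preview, fee_owed=fee)
        return callback_kwargs

    return on_step_end

# Usage (assumed): the pipeline must be asked to expose latents to the callback:
# pipe(prompt,
#      callback_on_step_end=make_step_callback(...),
#      callback_on_step_end_tensor_inputs=["latents"])
```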

A more general question I have is how well this step-based framework can generalize to non-diffusion models. For example, does the above make sense for inference with upscaling or frame-interpolation models, or is it diffusion-specific?
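Purely hypothetically, one way it could generalize is to treat the reporting unit as a pipeline-defined "unit of progress" rather than a diffusion step, e.g. interpolated frames for frame interpolation or tiles for upscaling:

```python
from typing import Optional, Protocol

class ProgressReporter(Protocol):
    """Pipeline-agnostic progress reporting (hypothetical interface).

    `completed`/`total` count whatever unit the pipeline defines (diffusion
    steps, interpolated frames, upscaled tiles, ...), optionally with an
    intermediate preview for B to forward to end users.
    """

    def report(self, completed: int, total: int,
               preview: Optional[bytes] = None) -> None:
        ...
```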

[1] https://discuss.huggingface.co/t/how-to-get-intermeidate-output-images/29144
[2] https://github.com/huggingface/diffusers/blob/777063e1bfda024e7dfc3a9ba2acb20552aec6bc/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L1090

@papabear99

> • O could expect to receive a payment for every N steps. I think (this should be tested) the computational cost of each step is proportional to the output resolution, so given a specific price per pixel for a model, the total fee for a request could be calculated as something like price per pixel * output height * output width * # steps, and the payment amount for every N steps would be price per pixel * output height * output width * N.

So each model would have a different ppp? Will each model have a fixed workflow? From my limited experience using ComfyUI, I've observed significantly different processing times as workflows get more complex, even though the number of steps remains the same.

If the workflow per model is static, I think the proposed payment formula will work. If workflows can vary for the same model, then I think we need another approach that takes into account the actual compute used to perform a task.

@yondonfu (Member Author)

FWIW, just to clarify: the notes in my previous post were not proposals per se, but rather ideas for iteration and discussion.

> So each model would have a different ppp?

In the scenario described, yes, that is how it could work. Not ideal from a UX POV, but it could be a start.

> Will each model have a fixed workflow?

There will need to be parameters that can be adjusted by a user, e.g. resolution, seed, and possibly things like the scheduler.

> I think we need to think about another approach that takes into account the actual compute used to perform a task.

I generally agree that pricing should align with the compute used. That said, it could be reasonable to start with an imperfect, rough approximation that at least captures the bulk of the compute costs incurred, even if certain parts are not perfectly metered, and even to restrict adjustment of certain parameters if needed.

Moving this topic into its own thread here: #28.

@rickstaa (Member)

rickstaa commented May 8, 2024

Tracked internally in https://linear.app/livepeer-ai-spe/issue/LIV-13.
