Description
First, thank you for open-sourcing HunyuanVideo and for the awesome work! The availability of FP8-quantized weights significantly reduces GPU memory usage, which is impressive. I’ve been experimenting with FP8 inference on a single H100 GPU, but I ran into a couple of issues and would like to ask for clarification:
Single-GPU FP8 Inference Speed:
I noticed that inference with the FP8 weights on a single H100 GPU is slower than loading and running the standard FP16 model. Could you explain why this is the case? Are there specific optimizations required, or limitations of FP8, that might impact performance?
Multi-GPU FP8 Inference:
Does the current implementation support inference with FP8 weights on multiple H100 GPUs? If not, are there plans to enable multi-GPU support for FP8 models in the near future? Any guidance on how to set this up would be greatly appreciated.
Context
Here is the command I used for FP8 inference:
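(Reconstructed sketch, roughly following the FP8 example in the repo README; the checkpoint path, prompt, resolution, and offload flag below are placeholders for my actual settings, so treat any flag that doesn't match your setup as an assumption.)

```bash
# FP8-quantized DiT checkpoint (path is a placeholder)
DIT_CKPT_PATH=ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt

python3 sample_video.py \
    --dit-weight ${DIT_CKPT_PATH} \
    --video-size 720 1280 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "A cat walks on the grass, realistic style." \
    --seed 42 \
    --embedded-cfg-scale 6.0 \
    --flow-shift 7.0 \
    --flow-reverse \
    --use-cpu-offload \
    --use-fp8 \
    --save-path ./results
```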
Any insights or updates on these questions would be immensely helpful for optimizing our workflow.
Thank you!