Description
First, thank you for open-sourcing HunyuanVideo and for the awesome work! The availability of FP8-quantized weights significantly reduces GPU memory usage, which is impressive. I’ve been experimenting with FP8 inference on a single H100 GPU, but I ran into a couple of issues and would like to ask for clarification:
Single-GPU FP8 Inference Speed:
I noticed that inference with the FP8 weights on a single H100 GPU is slower than loading and running the standard FP16 model. Could you explain why this is the case? Are there specific optimizations required, or limitations of FP8, that might impact performance?
Multi-GPU FP8 Inference:
Does the current implementation support inference with FP8 weights on multiple H100 GPUs? If not, are there plans to enable multi-GPU support for FP8 models in the near future? Any guidance on how to set this up would be greatly appreciated.
Context
Here is the command I used for FP8 inference:
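(Reconstructed sketch, roughly following the FP8 example in the repo README; the checkpoint path, prompt, resolution, and offload flag below are placeholders for my actual settings, so treat any flag that doesn't match your setup as an assumption.)

```bash
# FP8-quantized DiT checkpoint (path is a placeholder)
DIT_CKPT_PATH=ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt

python3 sample_video.py \
    --dit-weight ${DIT_CKPT_PATH} \
    --video-size 720 1280 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "A cat walks on the grass, realistic style." \
    --seed 42 \
    --embedded-cfg-scale 6.0 \
    --flow-shift 7.0 \
    --flow-reverse \
    --use-cpu-offload \
    --use-fp8 \
    --save-path ./results
```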
Any insights or updates on these questions would be immensely helpful for optimizing our workflow.
Thank you!