Redrafter fp8 support #2607

Closed

darraghdog opened this issue Dec 22, 2024 · 1 comment

darraghdog commented Dec 22, 2024

I am deploying Qwen/QwQ-32B-Preview in a 4 x L4 (24 GB per card) environment. I have an fp8 quantised model (I used the llmapi to quantise it) which fits in about 32 GB, leaving the remaining memory for the KV cache. I see ReDrafter supports fp8 in the support matrix.

I have an fp32 redrafter which was trained on the bf16 version of the base model. I would like to convert the quantised fp8 base model (modelopt format) and the fp32 redrafter together. However, I see that the convert script only accepts a base model in fp16/fp32/bf16 (link). A bf16 base model would allocate too much memory and leave little remaining for the KV cache.
I am wondering whether it should be possible to make this work with an fp8 base model (already quantised to fp8 as below); I am happy to modify the conversion script as needed, and a rough sketch of the idea is below.
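The kind of change I have in mind, purely as an illustration (this is not the actual convert script; the paths and the assumption that the drafter weights sit in a single safetensors file are made up), is to keep the already-quantised fp8 base checkpoint untouched and only cast the fp32 drafter weights to the base checkpoint's unquantised weight dtype before merging:

    # Hypothetical sketch only - not the real convert script. Assumes the drafter
    # weights are stored in one safetensors file; the fp8 base checkpoint is left as-is.
    import json
    import torch
    from safetensors.torch import load_file, save_file

    base_dir = "/workspace/trtllm/Qwen-QwQ-32B-Preview_FP8_KVFP8_tp4"
    with open(f"{base_dir}/config.json") as f:
        base_cfg = json.load(f)

    # "dtype" in the modelopt config is the dtype of the unquantised tensors
    # (float16 here), which is what the drafter weights should match.
    target_dtype = {"float16": torch.float16, "bfloat16": torch.bfloat16}[base_cfg["dtype"]]

    drafter = load_file("/workspace/trtllm/redrafter_fp32/model.safetensors")
    drafter = {name: tensor.to(target_dtype) for name, tensor in drafter.items()}
    save_file(drafter, "/workspace/trtllm/redrafter_cast/model.safetensors")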

Quantisation params of base model

# modified from https://nvidia.github.io/TensorRT-LLM/llm-api-examples/llm_quantization.html 
python3 scripts/quant_llm_api_dist_01.py \
    --model_in_path "/workspace/trtllm/Qwen-QwQ-32B-Preview/" \
    --model_out_path "/workspace/trtllm/Qwen-QwQ-32B-Preview_FP8_KVFP8_tp4/" \
    --quant_algo "FP8" --tp_size 4 --calib_dataset "demo/qwq_cot" \
    --calib_batches 512 --calib_seq_length 1024 --max_batch_size 8 --fp8_kv_cache 
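For context, the script above is a thin wrapper around the LLM API quantisation example from the linked docs. Roughly, it does the following (QuantConfig/CalibConfig argument names are taken from that example; my wrapper's internals may differ slightly):

    # Rough equivalent of what quant_llm_api_dist_01.py does via the LLM API.
    from tensorrt_llm import LLM
    from tensorrt_llm.llmapi import CalibConfig, QuantAlgo, QuantConfig

    quant_config = QuantConfig(
        quant_algo=QuantAlgo.FP8,
        kv_cache_quant_algo=QuantAlgo.FP8,   # --fp8_kv_cache
    )
    calib_config = CalibConfig(
        calib_dataset="demo/qwq_cot",        # --calib_dataset
        calib_batches=512,                   # --calib_batches
        calib_max_seq_length=1024,           # --calib_seq_length
    )
    llm = LLM(
        model="/workspace/trtllm/Qwen-QwQ-32B-Preview/",
        tensor_parallel_size=4,              # --tp_size 4
        quant_config=quant_config,
        calib_config=calib_config,
    )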

fp8 base model config

{
    "producer": {
        "name": "modelopt",
        "version": "0.19.0"
    },
    "architecture": "QWenForCausalLM",
    "dtype": "float16",
    "logits_dtype": "float16",
    "num_hidden_layers": 64,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "hidden_size": 5120,
    "norm_epsilon": 1e-05,
    "vocab_size": 152064,
    "max_position_embeddings": 32768,
    "hidden_act": "silu",
    "use_parallel_embedding": true,
    "embedding_sharding_dim": 0,
    "quantization": {
        "quant_algo": "FP8",
        "kv_cache_quant_algo": "FP8",
        "exclude_modules": [ ....
darraghdog (Author) commented

Closing this, as it's working OK now.
