Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference #60

iseesaw opened this issue Apr 25, 2024 · 4 comments

iseesaw commented Apr 25, 2024

Hello,

I've successfully finetuned Llama-3 8B with QDoRA and am now looking to perform inference using vLLM. Could you provide guidance or scripts on how to merge the QDoRA adapters with the original base model? Additionally, does this process involve quantization and dequantization of the base model?

Thank you!

iseesaw commented Apr 25, 2024

I modified the merge code in Converting the State Dict.ipynb, replacing lora with dora in the key names.
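Roughly, the change is just swapping the substring that the notebook filters the state-dict keys on. A minimal sketch of what I mean (the key names and the filter shown here are illustrative, not the notebook's actual code):

    from safetensors.torch import load_file

    # Illustrative only: collect the trained adapter weights by key name.
    state_dict = load_file("model_state_dict.safetensors")

    # The original notebook selects LoRA keys:
    # adapter_keys = [k for k in state_dict if "lora" in k]
    # For a QDoRA run the keys contain "dora" instead:
    adapter_keys = [k for k in state_dict if "dora" in k]
    print(adapter_keys[:5])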

Then I merged the converted adapter with the base model:

    import torch
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # PEFT_MODEL is the directory containing the converted adapter
    config = PeftConfig.from_pretrained(PEFT_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        config.base_model_name_or_path,
        return_dict=True,
        # quantization_config=bnb_config,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        # trust_remote_code=True,
    )

    model = PeftModel.from_pretrained(model, PEFT_MODEL)

    # Merge the adapter with the base model
    model = model.merge_and_unload()

    # Save the merged model in the safetensors format
    model.save_pretrained(PEFT_MODEL + "-merged", safe_serialization=True)

    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    tokenizer.save_pretrained(PEFT_MODEL + "-merged")

But I got a repeated response like

\nE. It is a complication of the disease\nF. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\nG. It is a disease of the disease\n

I want to know where the problem occurs: in the fine-tuning or in the weight merge?

pe-hy commented Apr 25, 2024

See also my #57; it is a similar question/request.

lochuynh1412 commented Jun 18, 2024

This kind of works for me. We need to convert the dora names to lora names in the tensor dict. After getting the lora adapter, we can do a normal merge to get the final model (a sketch of that step follows the script below).

import torch
from peft import LoraConfig, TaskType, get_peft_model
from safetensors import safe_open
from transformers import (
    AutoTokenizer,
    BitsAndBytesConfig,
    LlamaForCausalLM,
)

tensors = {}
with safe_open(
    "model_state_dict.safetensors",
    framework="pt",
    device=0,
) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)  # loads the full tensor given a key
        # print(k, tensors[k].dtype, tensors[k].shape) # Uncomment to view

# Rename the DoRA keys to the names PEFT expects for a LoRA adapter:
# prefix with "base_model.model.", drop ".dora_layer", and insert the
# ".default" adapter name before ".weight".
new_tensors = {}
for _k in tensors:
    if "dora" not in _k:
        continue
    k = "base_model.model." + _k
    k = k.replace(".dora_layer", "")
    k = k.replace(".weight", ".default.weight")
    new_tensors[k] = tensors[_k]

tensors = new_tensors

# Make sure the compute type, target modules, rank, alpha etc match!
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = LlamaForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    use_cache=False,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Freeze
for param in model.parameters():
    param.requires_grad = False

# Add LoRA (make sure your rank (r) and alpha (lora_alpha) values match those used in training!)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    # target_modules=['q_proj','k_proj','v_proj','o_proj','gate_proj','up_proj','down_proj','lm_head']
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)

# Check out the first few keys in the state dict:
print(list(model.state_dict().keys())[:10])

# Copy the converted adapter weights into the freshly initialized LoRA slots
new_sd = model.state_dict()
for k in new_sd:
    if "lora" in k:
        new_sd[k] = tensors[k]

model.load_state_dict(new_sd, strict=False)
model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")

williambarberjr commented

@lochuynh1412 how's the quality of the merged model?
