I am trying to load a finetuned Donut model quantized to 4-bit. While save_pretrained works fine, when I try to load the quantized model (at quant_path) as
model = VisionEncoderDecoderModel.from_pretrained(quant_path, load_in_4bit=True), every parameter loads correctly except decoder.lm_head.weight, which is instead re-initialized. I am unable to find the cause of this issue, and it happens both when (1) I load the quantized model and (2) I load the finetuned checkpoint with the load_in_4bit argument.
I have tried the same steps with the 'naver-clova-ix/donut-base' model from Hugging Face and it works fine. Any help would be much appreciated!
Reproduction
from transformers import VisionEncoderDecoderModel

# finetuned_model is the already-finetuned VisionEncoderDecoderModel (Donut).
# Note: safe_serialization=True discards lm_head.weight, so save without it.
finetuned_model.save_pretrained(finetuned_path, safe_serialization=False)
model = VisionEncoderDecoderModel.from_pretrained(finetuned_path, load_in_4bit=True)
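A quick way to see the reset (a minimal sketch, reusing finetuned_model and finetuned_path from the snippet above; I am assuming the output head stays un-quantized, which transformers normally ensures when converting to 4-bit, so a plain tensor comparison is meaningful):

import torch
from transformers import VisionEncoderDecoderModel

# Reload the checkpoint in 4-bit and diff the decoder head against the
# in-memory finetuned model.
reloaded = VisionEncoderDecoderModel.from_pretrained(finetuned_path, load_in_4bit=True)
orig_head = finetuned_model.decoder.lm_head.weight.detach().cpu().float()
new_head = reloaded.decoder.lm_head.weight.detach().cpu().float()
# Prints False on the affected setup: the head comes back re-initialized.
print("lm_head preserved:", torch.allclose(orig_head, new_head))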
Expected behavior
The model is loaded with the decoder.lm_head.weight from the finetuned checkpoint
To add: on checking the model.safetensors file saved by the save_pretrained method (with the default safe_serialization=True), the state dict does not contain "decoder.lm_head.weight".
Somehow, this does not create a problem when calling from_pretrained without the load_in_4bit argument.
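For reference, this is how the saved keys can be inspected (a minimal sketch; it assumes the checkpoint was saved with the default safe_serialization=True so that a single-shard model.safetensors exists at finetuned_path). The missing key would be consistent with lm_head being tied to the decoder embeddings, since safetensors deduplicates shared tensors on save:

import os
from safetensors import safe_open

# List the tensor names stored in the checkpoint shard.
ckpt = os.path.join(finetuned_path, "model.safetensors")
with safe_open(ckpt, framework="pt") as f:
    keys = list(f.keys())

# Prints False here: the tied head was deduplicated away at save time.
print("decoder.lm_head.weight" in keys)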