'token_embd.weight' not found with Ollama #9120
Replies: 2 comments 1 reply
-
It seems the …
-
I found a workaround for this same problem. Instead of using the ADAPTER directive in the Ollama Modelfile, I used this script to merge the layers of the original model with the LoRA fine-tuned one: Then I converted the merged model to GGUF, managed to import it into Ollama, and running it did not trigger the 'token_embd.weight' error. PS: I'm not sure this actually worked for me, because the manual tests I performed with the fine-tuned model did not turn out as I expected.
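The merge script the commenter refers to was not captured in the thread. A common way to do this kind of merge, assuming the adapter was trained with Hugging Face PEFT, is `merge_and_unload()`; the sketch below uses placeholder paths and is illustrative only, not the commenter's actual script:

```python
# Hypothetical sketch: fold a LoRA adapter back into the base model's
# weights so the result is a plain model with no adapter layers.
# Model name and paths are placeholders, not from the thread.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# merge_and_unload() adds the LoRA deltas into the base weights and
# returns a standard transformers model without the PEFT wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")

# Save the tokenizer alongside the merged weights so the GGUF
# converter can find the vocabulary files in the same directory.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
tokenizer.save_pretrained("path/to/merged-model")
```

The merged directory can then be passed to llama.cpp's converter as a normal Hugging Face model, sidestepping the ADAPTER path entirely.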
-
Hello,
I fine-tuned the Meta Llama 3.1 8B model with a Hugging Face SFTTrainer and saved the model locally. Below is the config for the trainer:
The training went well. After training I saved the model and converted it to GGUF format using llama.cpp's (latest version) convert_hf_to_gguf.py script with --outtype q8_0. Then I added the model to Ollama with ollama create <model-name> -f Modelfile; everything went fine here as well. However, when I try to run the model from the Ollama prompt I get the error: So it seems something went wrong during the conversion to GGUF and this tensor goes missing, am I right? Is there anything I can do to fix it, or is it a bug in llama.cpp or Ollama?
Thank you!
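The conversion and import steps described above can be sketched as shell commands; the file names and model name are placeholders, not the poster's actual paths:

```shell
# Hypothetical sketch of the workflow described above.
# 1. Convert the locally saved Hugging Face model to GGUF with Q8_0
#    quantization (script lives in the llama.cpp repository).
python llama.cpp/convert_hf_to_gguf.py path/to/saved-model \
    --outfile model-q8_0.gguf --outtype q8_0

# 2. Write a minimal Modelfile pointing at the GGUF file...
cat > Modelfile <<'EOF'
FROM ./model-q8_0.gguf
EOF

# 3. ...and register it with Ollama. The error in question appears
#    only later, when running the created model.
ollama create my-model -f Modelfile
ollama run my-model
```

If the GGUF file is missing the token_embd.weight tensor, the create step can still succeed while the run step fails, which matches the behavior reported here.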