Error when fine-tuning or running inference on a locally downloaded Pixtral-12B model: unable to load tokenizer #6532

Closed
1 task done
Felixvillas opened this issue Jan 5, 2025 · 1 comment
Labels
solved This problem has been already solved

Felixvillas commented Jan 5, 2025

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-4.18.0-425.10.1.el8_7.x86_64-x86_64-with-glibc2.28
  • Python version: 3.10.15
  • PyTorch version: 2.5.1 (GPU)
  • Transformers version: 4.46.1
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA L40
  • vLLM version: 0.6.6.post1

Reproduction

I downloaded the Pixtral-12B model to a local directory, as shown below:

[Screenshot: the local model directory]

I then fine-tuned the model on DPO data I collected myself, using the following command:

llamafactory-cli train examples/train_lora/pixtral_lora_dpo.yaml

The contents of pixtral_lora_dpo.yaml are as follows:

### model
model_name_or_path: /lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/
trust_remote_code: true

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: spatial_intelligence_dpo
template: pixtral
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: ../workdir/saves/Pixtral-12B/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
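
For context on the size of this run, the step counts implied by the config above can be estimated with a small sketch. The function name `estimate_schedule` and the `world_size` parameter are illustrative assumptions (the issue does not state how many GPUs were used), the sketch assumes the dataset holds at least `max_samples` examples, and the trainer's exact rounding may differ slightly.

```python
import math


def estimate_schedule(max_samples, val_size, per_device_batch, grad_accum,
                      epochs, warmup_ratio, world_size=1):
    """Rough estimate of optimizer steps implied by a training config."""
    # val_size is the fraction of samples held out for evaluation.
    eval_samples = int(max_samples * val_size)
    train_samples = max_samples - eval_samples
    # Samples consumed per optimizer step across all devices.
    effective_batch = per_device_batch * grad_accum * world_size
    steps_per_epoch = math.ceil(train_samples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "train_samples": train_samples,
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values copied from the YAML above; world_size=1 is an assumption.
print(estimate_schedule(1000, 0.1, 1, 8, 3.0, 0.1, world_size=1))
```

With these values, `save_steps: 500` and `eval_steps: 500` would exceed the estimated total step count on a single GPU, so intermediate checkpoints and evaluations may never trigger.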

Running the training command produces the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/model/loader.py", line 71, in load_tokenizer
[rank0]:     tokenizer = AutoTokenizer.from_pretrained(
[rank0]:   File "/lustre/S/tianzikang/rocky/miniconda3/envs/omnigibson/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 939, in from_pretrained
[rank0]:     return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
[rank0]:   File "/lustre/S/tianzikang/rocky/miniconda3/envs/omnigibson/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2197, in from_pretrained
[rank0]:     raise EnvironmentError(
[rank0]: OSError: Can't load tokenizer for '/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
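
The error message asks whether the directory contains "all relevant files" for a Hugging Face tokenizer. A minimal sketch for checking that is shown here. The file list is an assumption: it covers names commonly seen in Hugging Face tokenizer directories, is not exhaustive, and a given tokenizer class needs only some of them. The helper name `report_tokenizer_files` and the path are hypothetical.

```python
from pathlib import Path

# Assumption: file names commonly found next to Hugging Face tokenizers.
HF_TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def report_tokenizer_files(model_dir):
    """Return a {file name: exists} mapping for the given directory."""
    root = Path(model_dir)
    return {name: (root / name).is_file() for name in HF_TOKENIZER_FILES}


# Hypothetical path; replace it with the local model directory.
for name, present in report_tokenizer_files("/path/to/local/model").items():
    print(("found  " if present else "missing"), name)
```

If every entry is reported as missing, `AutoTokenizer.from_pretrained` has nothing in Hugging Face format to load from that directory, which would be consistent with the traceback above.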

Running the command API_PORT=8000 llamafactory-cli api examples/inference/pixtral_vllm.yaml raises the same error. The contents of pixtral_vllm.yaml are as follows:

model_name_or_path: /lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/
template: pixtral
infer_backend: vllm
vllm_enforce_eager: true
trust_remote_code: true

The error output of that command is as follows:

[INFO|configuration_utils.py:746] 2025-01-05 21:56:24,983 >> Model config MistralConfig {
  "_name_or_path": "/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/",
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "transformers_version": "4.46.1",
  "use_cache": true,
  "vocab_size": 32000
}

[INFO|configuration_utils.py:746] 2025-01-05 21:56:24,983 >> Model config MistralConfig {
  "_name_or_path": "/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/",
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "transformers_version": "4.46.1",
  "use_cache": true,
  "vocab_size": 32000
}

[INFO|tokenization_auto.py:706] 2025-01-05 21:56:24,984 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
[INFO|configuration_utils.py:746] 2025-01-05 21:56:24,984 >> Model config MistralConfig {
  "_name_or_path": "/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/",
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "transformers_version": "4.46.1",
  "use_cache": true,
  "vocab_size": 32000
}

Traceback (most recent call last):
  File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/model/loader.py", line 71, in load_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/lustre/S/tianzikang/rocky/miniconda3/envs/omnigibson/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 939, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/lustre/S/tianzikang/rocky/miniconda3/envs/omnigibson/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2197, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for '/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/lustre/S/tianzikang/rocky/miniconda3/envs/omnigibson/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/cli.py", line 79, in main
    run_api()
  File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/api/app.py", line 129, in run_api
    chat_model = ChatModel()
  File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 54, in __init__
    self.engine: "BaseEngine" = VllmEngine(model_args, data_args, finetuning_args, generating_args)
  File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/chat/vllm_engine.py", line 65, in __init__
    tokenizer_module = load_tokenizer(model_args)
  File "/nfs_global/S/tianzikang/rocky/projects/spatial_intelligence/LLaMA-Factory/src/llamafactory/model/loader.py", line 86, in load_tokenizer
    raise OSError("Failed to load tokenizer.") from e
OSError: Failed to load tokenizer.
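
The final `OSError: Failed to load tokenizer.` is only a wrapper: the line "The above exception was the direct cause of the following exception" means it was raised with `raise ... from e`, so the informative error is the first one. A minimal sketch of recovering that root cause programmatically follows. `load_tokenizer_stub` is a hypothetical stand-in that only mimics the chaining pattern seen in the traceback; it is not LLaMA-Factory's code.

```python
def root_cause(exc):
    """Follow __cause__ (set by `raise ... from e`) back to the first exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def load_tokenizer_stub():
    # Hypothetical stand-in: reproduces the two-level chain from the traceback.
    try:
        raise OSError("Can't load tokenizer for '/path/to/model'.")
    except OSError as e:
        raise OSError("Failed to load tokenizer.") from e


try:
    load_tokenizer_stub()
except OSError as err:
    print("outer:", err)
    print("root :", root_cause(err))
```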

However, running inference directly with vLLM, following the example Mistral provides, does not trigger this error. First, deploy Pixtral as an API server:

CUDA_VISIBLE_DEVICES=0 vllm serve /lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/ \
    --tokenizer_mode mistral --limit_mm_per_prompt 'image=6' --max-model-len 32768

Then query it with the following code:

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

completion = client.completions.create(model="/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/",
                                      prompt="San Francisco is a", max_tokens=8192, temperature=0.7)
print("Completion result:", completion.choices[0].text)

This produces the following output:

Completion result:  beautiful city, and it’s no surprise that so many visitors flock to this area every year. There are so many things to do in San Francisco, from exploring the Golden Gate Bridge to walking along the Fisherman’s Wharf. Whether you’re looking for a fun family outing or want to enjoy some time alone, San Francisco is the perfect place for you. This blog post will discuss some of the best activities in San Francisco for families with kids.
(remaining output omitted)

Alternatively, without deploying Pixtral as an API server, inference can be run directly with vLLM using the following code:

import os, sys
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
# from robosuite.environments.manipulation.spatial_intelligence import SpatialIntelligence

from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "/lustre/S/tianzikang/LLMs/mistralai-Pixtral-12B-2409/mistralai-Pixtral-12B-2409/"
max_img_per_msg = 5

sampling_params = SamplingParams(max_tokens=8192, temperature=0.7)

# Lower max_num_seqs or max_model_len on low-VRAM GPUs.
llm = LLM(model=model_name, tokenizer_mode="mistral", limit_mm_per_prompt={"image": max_img_per_msg}, max_model_len=32768)

messages = [
    {
        "role": "user",
        "content": "San Francisco is a",
    }
]

outputs = llm.chat(messages=messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)

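This also produces output, which shows that my locally downloaded Pixtral-12B model is fine. Even so, it cannot be used with LLaMA-Factory for inference, fine-tuning, or training.

My guess is that Pixtral-12B ships its tokenizer as tekken.json (it can be loaded with tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")), so does LLaMA-Factory not yet support this tokenizer format?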
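
One way to act on this guess is to decide from the directory contents which tokenizer path applies. The sketch below returns the value to pass as vLLM's `tokenizer_mode`: `"mistral"` is the value used in the working examples above, and `"auto"` is, to my understanding, vLLM's default. The helper name `guess_tokenizer_mode` and the rule "tekken.json present means Mistral format" are assumptions, not something LLaMA-Factory or vLLM implements this way.

```python
from pathlib import Path


def guess_tokenizer_mode(model_dir):
    """Pick a vLLM tokenizer_mode from the files in a local model directory."""
    root = Path(model_dir)
    # Assumption: a tekken.json file marks a Mistral-format tokenizer.
    if (root / "tekken.json").is_file():
        return "mistral"
    # Otherwise fall back to vLLM's default Hugging Face tokenizer handling.
    return "auto"


# Hypothetical path; replace it with the local model directory.
print(guess_tokenizer_mode("/path/to/local/model"))
```

A directory that contains both tekken.json and Hugging Face tokenizer files would also return `"mistral"` here; whether that is the right preference depends on the checkpoint layout.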

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Jan 5, 2025
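@Felixvillas Felixvillas changed the title from "Unable to load tokenizer when fine-tuning the Pixtral-12B model" to "Unable to load tokenizer when fine-tuning a locally downloaded Pixtral-12B model" Jan 5, 2025
@Felixvillas Felixvillas changed the title from "Unable to load tokenizer when fine-tuning a locally downloaded Pixtral-12B model" to "Unable to load tokenizer when fine-tuning or running inference on a locally downloaded Pixtral-12B model" Jan 5, 2025
@Felixvillas Felixvillas changed the title from "Unable to load tokenizer when fine-tuning or running inference on a locally downloaded Pixtral-12B model" to "Error when fine-tuning or running inference on a locally downloaded Pixtral-12B model: unable to load tokenizer" Jan 5, 2025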
hiyouga (Owner) commented Jan 6, 2025

@hiyouga hiyouga closed this as completed Jan 6, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Jan 6, 2025