The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. The class this function is called from is 'LlamaTokenizer'.
Hi! I'm running the LLM Tuner UI and ran into this issue, which was solved in another issue: https://github.com/huggingface/transformers/issues/22222#issuecomment-1477171703. However, when I simply change the tokenizer class name to LlamaTokenizer in tokenizer_config.json in the Hugging Face cache (~/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf), other errors pop up when running the app.
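For reference, the change I'm making amounts to something like this (a minimal sketch; the snapshot directory name is a content hash that differs per download, hence the glob):

```python
# Sketch: rewrite "tokenizer_class" in the cached tokenizer_config.json
# so it matches the class name that current transformers expects.
import json
from pathlib import Path

cache_dir = Path.home() / ".cache/huggingface/hub/models--decapoda-research--llama-7b-hf"
for config_path in cache_dir.glob("snapshots/*/tokenizer_config.json"):
    config = json.loads(config_path.read_text())
    config["tokenizer_class"] = "LlamaTokenizer"  # was "LLaMATokenizer"
    config_path.write_text(json.dumps(config))
```

With that change in place, running the app gives: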
```
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████| 33/33 [00:13<00:00, 2.52it/s]
Traceback (most recent call last):
  File "llm_tuner/app.py", line 147, in <module>
    fire.Fire(main)
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "llm_tuner/app.py", line 119, in main
    prepare_base_model(Config.default_base_model_name)
  File "/home/gcpuser/sky_workdir/llm_tuner/llama_lora/models.py", line 262, in prepare_base_model
    Global.new_base_model_that_is_ready_to_be_used = get_new_base_model(
  File "/home/gcpuser/sky_workdir/llm_tuner/llama_lora/models.py", line 80, in get_new_base_model
    tokenizer = get_tokenizer(base_model_name)
  File "/home/gcpuser/sky_workdir/llm_tuner/llama_lora/models.py", line 156, in get_tokenizer
    raise e
  File "/home/gcpuser/sky_workdir/llm_tuner/llama_lora/models.py", line 143, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 700, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 1288, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 445, in __init__
    from .utils import sentencepiece_model_pb2 as model_pb2
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
    _descriptor.EnumValueDescriptor(
  File "/opt/conda/envs/llm-tuner/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 796, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
```
Any idea on how to tackle this so that the model and tokenizer will match properly? And any insight on whether it will affect fine-tuning results that I didn't match up the class names earlier?
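Update: the TypeError at the bottom of the traceback appears to come from protobuf 4.x rejecting pre-generated _pb2 modules; protobuf's full error message suggests either downgrading to protobuf 3.20.x or lower, or forcing the pure-Python protobuf implementation before transformers is imported. A minimal sketch of the latter:

```python
# Sketch of the pure-Python protobuf workaround: the environment variable
# must be set before protobuf (pulled in via transformers) is first imported.
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

from transformers import AutoTokenizer  # safe to import after the env var is set

tokenizer = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
```

Per protobuf's own message, the pure-Python implementation is much slower at parsing, so pinning protobuf to 3.20.x is usually preferable if speed matters.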