building GPT2BPETokenizer tokenizer ...
Traceback (most recent call last):
  File "/workspace/megatron/pretrain_gpt.py", line 243, in <module>
    pretrain(
  File "/workspace/megatron/megatron/training/training.py", line 190, in pretrain
    initialize_megatron(extra_args_provider=extra_args_provider,
  File "/workspace/megatron/megatron/training/initialize.py", line 62, in initialize_megatron
    set_global_variables(args)
  File "/workspace/megatron/megatron/training/global_vars.py", line 100, in set_global_variables
    _ = _build_tokenizer(args)
  File "/workspace/megatron/megatron/training/global_vars.py", line 130, in _build_tokenizer
    _GLOBAL_TOKENIZER = build_tokenizer(args)
  File "/workspace/megatron/megatron/training/tokenizer/tokenizer.py", line 36, in build_tokenizer
    tokenizer = _GPT2BPETokenizer(args.vocab_file, args.merge_file)
  File "/workspace/megatron/megatron/training/tokenizer/tokenizer.py", line 262, in __init__
    self.tokenizer = GPT2Tokenizer(vocab_file, merge_file, errors='replace',
  File "/workspace/megatron/megatron/training/tokenizer/gpt2_tokenization.py", line 159, in __init__
    self.encoder = json.load(open(vocab_file))
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This discussion was converted from issue #910 on August 07, 2024 18:24.
My shell script:
`PYTORCH_IMAGE=nvcr.io/nvidia/pytorch:23.12-py3
CHECKPOINT_PATH="/workspace/checkpoints"
VOCAB_FILE="/workspace/dataset/my_vocab.json" #/gpt2-vocab.json
MERGE_FILE="/workspace/dataset/merges.txt" #/gpt2-merges.txt
DATA_PATH="/workspace/dataset" #_text_document
docker run \
  --gpus=all \
  --ipc=host \
  --workdir /workspace/megatron \
  -v /root/data0/Megatron-LM:/workspace/megatron \
  -v /root/data0/Megatron-LM/workspace/dataset:/workspace/dataset \
  -v /root/data0/Megatron-LM/workspace/checkpoints:/workspace/checkpoints \
  $PYTORCH_IMAGE \
  bash examples/gpt3/train_gpt2_857M_distributed.sh $CHECKPOINT_PATH $VOCAB_FILE $MERGE_FILE $DATA_PATH`
The download URL for my_vocab.json is:
https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
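After running this script, training fails while building the tokenizer, with the json.decoder.JSONDecodeError traceback shown at the top of this post.
Can you help me find where the problem lies?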
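One check that may narrow this down: "Expecting value: line 1 column 1 (char 0)" means `json.load` found no JSON value at the very start of the vocab file. That usually points to a file that is empty or is not JSON at all, for example an HTML or XML error page saved by a failed download. A missing file would raise `FileNotFoundError` instead. The following is a minimal sketch for inspecting the file. The helper name `diagnose_vocab` is hypothetical and not part of Megatron-LM.

```python
import json
from pathlib import Path


def diagnose_vocab(path):
    """Return a short description of whether `path` holds a JSON vocab.

    Hypothetical helper for illustration; not part of Megatron-LM.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    raw = p.read_bytes()
    if not raw.strip():
        # An empty file triggers "Expecting value: line 1 column 1 (char 0)".
        return "empty"
    try:
        vocab = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Show the first bytes; an HTML/XML error page starts with b"<".
        return f"not JSON, starts with {raw[:40]!r}"
    if not isinstance(vocab, dict):
        # A GPT-2 vocab file is a JSON object mapping tokens to integer ids.
        return "JSON but not a token-to-id object"
    return f"ok, {len(vocab)} entries"
```

Inside the container, calling `diagnose_vocab("/workspace/dataset/my_vocab.json")` would report which case applies. On the host, the same file sits under `/root/data0/Megatron-LM/workspace/dataset`, according to the `-v` mounts in the script.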