Hi, I found several errors in the pre-training script (run.sh) and the code it calls. I have mentioned one in the pull request. Furthermore, it seems that we should use $PATH_TO_DATA_DICT to specify the variable in the shell script.
After correcting the path and file name, I found another error in the training stage:
=41667/41667=Iterations/Batches
Iteration: 0%| | 0/41667 [00:00<?, ?it/s]Finish Epoch: 0
Iteration: 0%| | 0/41667 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/gpfs/radev/scratch/ying_rex/tl688/dnaberts/DNABERT_S/train/pretrain/main.py", line 85, in <module>
    run(args)
  File "/gpfs/radev/scratch/ying_rex/tl688/dnaberts/DNABERT_S/train/pretrain/main.py", line 44, in run
    trainer.val()
  File "/gpfs/radev/scratch/ying_rex/tl688/dnaberts/DNABERT_S/train/pretrain/training.py", line 189, in val
    self.model.module.dnabert2.load_state_dict(torch.load(load_dir+'/pytorch_model.bin'))
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/torch/serialization.py", line 998, in load
    with _open_file_like(f, 'rb') as opened_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/torch/serialization.py", line 445, in _open_file_like
    return _open_file(name_or_buffer, mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/torch/serialization.py", line 426, in __init__
    super().__init__(open(name, mode))
                     ^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: './results/epoch1.train_2w.csv.lr3e-06.lrscale100.bs48.maxlength2000.tmp0.05.seed1.con_methodsame_species.mixTrue.mix_layer_num-1.curriculumTrue/10000/pytorch_model.bin'
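To narrow this down, it may help to see which checkpoint subdirectories exist under the run directory and whether any of them contain a weight file. The snippet below is only a diagnostic sketch, not part of the repository: `find_checkpoints` is a hypothetical helper, and checking for `model.safetensors` in addition to `pytorch_model.bin` is an assumption (that the weights may have been written under a different file name), not something confirmed by the training code.

```python
from pathlib import Path


def find_checkpoints(run_dir, weight_names=("pytorch_model.bin", "model.safetensors")):
    """Map each subdirectory of run_dir to the weight files found inside it.

    weight_names is an assumption: pytorch_model.bin is the file the traceback
    looks for; model.safetensors is a guess at an alternative file name.
    """
    found = {}
    run_path = Path(run_dir)
    if not run_path.is_dir():
        return found  # the run directory itself does not exist
    for sub in sorted(run_path.iterdir()):
        if sub.is_dir():
            found[sub.name] = [n for n in weight_names if (sub / n).is_file()]
    return found


if __name__ == "__main__":
    import sys

    # Pass the run directory from the error message (the part before "/10000").
    run_dir = sys.argv[1] if len(sys.argv) > 1 else "./results"
    for step, files in find_checkpoints(run_dir).items():
        print(step, files if files else "no weight file found")
```

If the `10000` directory is missing entirely, no checkpoint was saved at that step; if it exists but holds a differently named weight file, the mismatch is between how the checkpoint is saved and how `val` loads it.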
Would you please share your thoughts about how to address it? Thanks.