System Info
CUDA 11.5
Transformers 4.40.2
Python 3.12.2
GPU: single RTX 4090
RAM: 32GB
OS: Windows, WSL2
Who can help?
@Btlmd
Information
Reproduction
Running the following command from lora_finetune.ipynb:
!CUDA_VISIBLE_DEVICES=0 NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" python finetune_hf.py data/AdvertiseGen_fix /home/xiebeichen/chatglm3-6b configs/lora.yaml
Output:
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Loading checkpoint shards: 100%|██████████████████| 7/7 [00:03<00:00, 2.02it/s]
trainable params: 1,949,696 || all params: 6,245,533,696 || trainable%: 0.031217444255383614
--> Model
--> model has 1.949696M params
train_dataset: Dataset({
features: ['input_ids', 'labels'],
num_rows: 114599
})
val_dataset: Dataset({
features: ['input_ids', 'output_ids'],
num_rows: 1070
})
test_dataset: Dataset({
features: ['input_ids', 'output_ids'],
num_rows: 1070
})
--> Sanity check
'[gMASK]': 64790 -> -100
'sop': 64792 -> -100
'<|user|>': 64795 -> -100
...
Total train batch size (w. parallel, distributed & accumulation) = 4
Gradient Accumulation steps = 1
Total optimization steps = 3,000
Number of trainable parameters = 1,949,696
0%| | 0/3000 [00:00<?, ?it/s]
Training hangs right at the start, at step 0/3000, with no error message; running nvidia-smi shows the GPU is not being used at all.
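A quick sanity check (not part of the original report, just a diagnostic sketch) can rule out a CPU-only PyTorch wheel, a common cause of idle GPUs under WSL2. Run it in the same environment before launching finetune_hf.py:

```python
# Diagnostic sketch: verify that this PyTorch build can actually see the GPU.
import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only wheel was installed
print(torch.cuda.is_available())  # False means training will silently run on (or wait for) CPU
if torch.cuda.is_available():
    # Show which device index 0 resolves to (should be the RTX 4090 here).
    print(torch.cuda.get_device_name(0))
```

If `torch.cuda.is_available()` prints False inside WSL2, reinstalling PyTorch with a CUDA-enabled wheel is the usual fix.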
Expected behavior
The GPU is used normally and training proceeds.