
Gemma 2 + unsloth + fa2 full SFT RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #5435

Closed
1 task done
hengdos opened this issue Sep 13, 2024 · 0 comments
Labels
wontfix This will not be worked on

Comments

hengdos commented Sep 13, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

```text
[2024-09-13 18:40:14,881] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
WARNING: BNB_CUDA_VERSION=121 environment variable detected; loading libbitsandbytes_cuda121.so.
This can be used to load a bitsandbytes version that is different from the PyTorch CUDA version.
If this was unintended set the BNB_CUDA_VERSION variable to an empty string: export BNB_CUDA_VERSION=
If you use the manual override make sure the right libcudart.so is in your LD_LIBRARY_PATH
For example by adding the following to your .bashrc: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path_to_cuda_dir/lib64
```

  • llamafactory version: 0.8.4.dev0
  • Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
  • Python version: 3.10.14
  • PyTorch version: 2.4.0 (GPU)
  • Transformers version: 4.44.2
  • Datasets version: 2.21.0
  • Accelerate version: 0.33.0
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A800 80GB PCIe
  • DeepSpeed version: 0.15.1
  • Bitsandbytes version: 0.43.3

Reproduction

Config file:

```yaml
# gemma2_full_sft.yaml
### model
model_name_or_path: /root/.cache/modelscope/hub/LLM-Research/gemma-2-9b-it

### method
stage: sft
do_train: true
finetuning_type: full
use_unsloth: true
flash_attn: fa2

### dataset
dataset: identity
template: gemma
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/gemma-2-9b-it/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
```

Run training:

```bash
llamafactory-cli train examples/train_full/gemma2_full_sft.yaml
```
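A quick sanity check one can run before the full job (a minimal sketch; `count_trainable` and the toy module are illustrative stand-ins, not part of LLaMA-Factory): the number it reports is exactly the figure the Unsloth banner prints in the output below.

```python
from torch import nn

def count_trainable(model: nn.Module) -> int:
    """Count parameters that will actually receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in for the loaded Gemma 2 model: full SFT should report
# billions of trainable parameters, yet the log below shows 0.
toy = nn.Linear(8, 8)
print(count_trainable(toy))  # 72 (8*8 weights + 8 biases)
```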

Output:

```text
[INFO|trainer.py:648] 2024-09-13 18:34:56,525 >> Using auto half precision backend
[WARNING|<string>:213] 2024-09-13 18:34:56,905 >> ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 81 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 2
\        /    Total batch size = 2 | Total steps = 120
 "-____-"     Number of trainable parameters = 0
  0%|                                                                                                                  | 0/120 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/root/miniconda3/envs/unsloth_env/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/root/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/root/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 96, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/root/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "<string>", line 363, in _fast_inner_training_loop
  File "/root/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3349, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/root/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/accelerator.py", line 2159, in backward
    loss.backward(**kwargs)
  File "/root/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/_tensor.py", line 521, in backward
    torch.autograd.backward(
  File "/root/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/__init__.py", line 289, in backward
    _engine_run_backward(
  File "/root/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/graph.py", line 768, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
  0%|                                                                                                                  | 0/120 [00:27<?, ?it/s]
```
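The banner above is the telling line: `Number of trainable parameters = 0`. If every parameter is frozen, the loss is built entirely from tensors with `requires_grad=False`, so it carries no `grad_fn`, and `loss.backward()` raises exactly this error. A minimal standalone reproduction (illustrative only, independent of LLaMA-Factory):

```python
import torch

# Every "parameter" frozen: the loss has no autograd history at all.
w = torch.randn(4)          # requires_grad defaults to False
loss = (w * 2.0).sum()
loss.backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

If that diagnosis is right, the suspect combination is `use_unsloth: true` with `finetuning_type: full`: at the time of this report Unsloth's fast path targeted (Q)LoRA adapters, and under full finetuning it appears to leave nothing trainable. A plausible (unconfirmed here) workaround is `use_unsloth: false` for full SFT, or keeping Unsloth and switching to `finetuning_type: lora`.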

Expected behavior

No response

Others

No response

@github-actions bot added the pending (This problem is yet to be addressed) label Sep 13, 2024
@hiyouga added the wontfix (This will not be worked on) label and removed the pending label Nov 2, 2024
@hiyouga closed this as not planned Nov 2, 2024