Please check that this issue hasn't been reported before.
I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
I'm fine-tuning on a single GPU (no deepspeed configuration). I set max_grad_norm: 1 and would expect the gradient norms to be clipped at 1 and never exceed it.
Current behaviour
Instead, I'm seeing large gradient-norm spikes (well above 1) reported to Weights & Biases.
Steps to reproduce
Train a QLoRA model with axolotl using max_grad_norm: 1 and observe gradient norms greater than 1 in the logged metrics.
Config yaml
# -----------------------------------
# ---- Base Model Configuration -----
# -----------------------------------
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
chat_template: llama3

# -----------------------------------
# ------------ Dataset -------------
# -----------------------------------
datasets:
  # - path: databricks/databricks-dolly-15k
  - path: /home/ubuntu/kindo-base/notebooks/truncated_dolly_15K
    ds_type: json
    type:
      system_prompt: ""
      field_system: system
      field_instruction: instruction
      field_input: context
      field_output: response
      format: "[INST] {instruction} {input} [/INST]"
      no_input_format: "[INST] {instruction} [/INST]"
    train_split: train

dataset_prepared_path: last_run_prepared
# How much to set out across all datasets
val_set_size: .05
output_dir: ./outputs/qlora-out

# -----------------------------------
# ----------- Lora Config -----------
# -----------------------------------
adapter: qlora
lora_model_dir:
lora_r: 128
lora_alpha: 32 # alpha = r/4 is in the qlora paper
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head

# -----------------------------------
# ------- Training parameters -------
# -----------------------------------
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true
gradient_accumulation_steps: 8
micro_batch_size: 8
num_epochs: 2
optimizer: paged_adamw_8bit
max_grad_norm: 1.0

# Learning Rate
lr_scheduler: cosine
learning_rate: 0.0004
warmup_ratio: 0.05

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
evals_per_epoch: 3
eval_batch_size: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|end_of_text|>"
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"

# -----------------------------------
# ------- Liger Integration ---------
# -----------------------------------
plugins:
  - axolotl.integrations.liger.LigerPlugin

liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
Possible solution
I'm wondering whether the gradient-norm metric reported to Weights & Biases is computed before clipping is applied.
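If it helps with triage, here is a minimal sketch (plain PyTorch, not axolotl code) of how torch.nn.utils.clip_grad_norm_ behaves, assuming the trainer uses it for max_grad_norm (I believe the Hugging Face Trainer does, and logs its return value as grad_norm, but I haven't verified that in the axolotl code path). The function scales the gradients in place but returns the norm measured before clipping, so a logged value above 1 could still be consistent with clipping working:

import torch

# Minimal sketch (not axolotl code): clip_grad_norm_ scales the gradients
# in place but RETURNS the total norm computed BEFORE clipping.
model = torch.nn.Linear(16, 16)                          # stand-in for the real model
loss = (model(torch.randn(4, 16)) * 100.0).pow(2).sum()  # exaggerate the gradients
loss.backward()

pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Recompute the norm from the (now clipped) gradients to confirm it is <= 1.
post_clip_norm = torch.norm(
    torch.stack([p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None])
)

print(f"returned (pre-clip) norm:  {pre_clip_norm.item():.2f}")   # can be far above 1
print(f"recomputed post-clip norm: {post_clip_norm.item():.2f}")  # <= 1.0 (up to float error)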
That said, the gradients are really, really high. Is it possible that the way I'm mapping the dataset to the expected format for Llama is wrong? A quick check of the rendered prompt is sketched below.
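To sanity-check the mapping, here is a hand-rolled render (not axolotl's own prompt builder, and the sample record is made up) of what the format / field_* settings from the config above should produce for one dolly-style row:

# Hand-rolled render of the prompt template from the config above,
# just to eyeball the field mapping. The record below is a made-up example.
record = {
    "instruction": "Summarize the paragraph.",
    "context": "Axolotls are neotenic salamanders.",
    "response": "Axolotls keep larval features into adulthood.",
}

fmt = "[INST] {instruction} {input} [/INST]"
no_input_fmt = "[INST] {instruction} [/INST]"

# field_instruction: instruction, field_input: context, field_output: response
instruction = record["instruction"]
inp = record.get("context", "")
output = record["response"]

prompt = (fmt.format(instruction=instruction, input=inp)
          if inp else no_input_fmt.format(instruction=instruction))
print(prompt + " " + output)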
Which Operating Systems are you using?
Linux
macOS
Windows
Python Version
3.10.13
axolotl branch-commit
main
Acknowledgements
My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this bug has not been reported yet.
I am using the latest version of axolotl.
I have provided enough information for the maintainers to reproduce and diagnose the issue.