Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values #58

Open
akjindal53244 opened this issue Jul 23, 2024 · 1 comment


@akjindal53244

Hi GaLore team, congratulations on the interesting work!

I am trying to fine-tune the Llama-3 8B model with GaLore, but I am getting this error:
torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values.

  • Interestingly, the first-batch loss is non-zero, but every subsequent loss is zero before training is terminated by the error.

    Full Error Log

    Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
    model.layers.0.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    model.layers.0.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    model.layers.0.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    model.layers.0.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    [... the same four warnings repeat for model.layers.1 through model.layers.31 ...]
      0%|                                                                                                                                                                            | 0/6719320 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
      0%|                                                                                                                                                            | 1/6719320 [06:15<701609:55:03, 375.90s/it][2024-07-23 07:19:54,094] [INFO] [axolotl.callbacks.on_step_end:128] [PID:148509] [RANK:0] GPU memory usage while training: 17.607GB (+15.215GB cache, +1.482GB misc)
    {'loss': 1.694, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                      
    {'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}                                                                                                                                        
    [... {'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0} repeated at every subsequent logging step up to step 200 ...]
      0%|                                                                                                                                                             | 200/6719320 [08:51<1455:09:36,  1.28it/s]/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py:83: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0 failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at ../aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp:697.)
      U, s, Vh = torch.linalg.svd(matrix, full_matrices = False)
    Traceback (most recent call last):
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 72, in <module>
        fire.Fire(do_cli)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 39, in do_cli
        return do_train(parsed_cfg, parsed_cli_args)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 67, in do_train
        return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/train.py", line 191, in train
        trainer.train(resume_from_checkpoint=resume_from_checkpoint)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
        return inner_training_loop(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
        tr_loss_step = self.training_step(model, inputs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 3324, in training_step
        self.accelerator.backward(loss, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/accelerator.py", line 2151, in backward
        loss.backward(**kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
        torch.autograd.backward(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
        _engine_run_backward(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
        return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 1398, in optimizer_hook
        optimizer_dict[param].step()
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
        return wrapped(*args, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
        out = func(*args, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/adamw8bit.py", line 58, in step
        grad = state["projector"].project(p.grad, state["step"])
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 21, in project
        self.ortho_matrix = self.get_orthogonal_matrix(full_rank_grad, self.rank, type='left')
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 83, in get_orthogonal_matrix
        U, s, Vh = torch.linalg.svd(matrix, full_matrices = False)
    torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 1023).
      0%|                                                                                                                                                             | 200/6719320 [13:19<7465:36:33,  4.00s/it]
    Traceback (most recent call last):
      File "/home/minimalist/miniconda3/envs/comps/bin/accelerate", line 8, in <module>
        sys.exit(main())
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
        args.func(args)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
        simple_launcher(args)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/minimalist/miniconda3/envs/comps/bin/python', '-m', 'axolotl.cli.train', 'examples/llama-3/qlora.yml']' returned non-zero exit status 1.
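
For what it's worth, the crash comes from torch.linalg.svd inside galore_projector.get_orthogonal_matrix, and the log above shows grad_norm stuck at 0.0 from the very first step, so the matrix handed to the SVD may already be degenerate (all-zero or non-finite) by the time the projector refreshes its basis around step 200. Below is a minimal debugging sketch, not part of GaLore or axolotl and with names of my own choosing, that flags all-zero or non-finite gradients as they are computed during backward:

import torch
from torch import nn

def add_grad_sanity_hooks(model: nn.Module) -> None:
    # Register a hook on every trainable parameter. The hook fires when that
    # parameter's gradient is computed during backward, i.e. before the
    # layer-wise GaLore optimizer consumes it, and prints a warning if the
    # gradient is all-zero or contains NaN/Inf values.
    def make_hook(name: str):
        def hook(grad: torch.Tensor) -> torch.Tensor:
            if not torch.isfinite(grad).all():
                print(f"[grad check] non-finite gradient in {name}")
            elif torch.count_nonzero(grad) == 0:
                print(f"[grad check] all-zero gradient in {name}")
            return grad
        return hook

    for name, param in model.named_parameters():
        if param.requires_grad:
            param.register_hook(make_hook(name))

Calling add_grad_sanity_hooks(model) before training should show whether the gradients have already collapsed before the projector's SVD refresh (the crash at step 200 lines up with GaLore's usual update_proj_gap default of 200, though that is an assumption on my part). If the gradients look healthy, another thing sometimes suggested for cusolver convergence failures is switching the CUDA linalg backend, e.g. torch.backends.cuda.preferred_linalg_library("magma"), since the UserWarning right before the crash points at the cusolver driver.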
    

    Hyperparams

base_model: meta-llama/Meta-Llama-3-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: <dataset>
    type: sharegpt
    conversation: llama-3
    field_human: human
    field_model: gpt

dataset_prepared_path:
val_set_size: 0.01
output_dir: ./outputs/galore-out

sequence_len: 2048
sample_packing: false
eval_sample_packing: true
pad_to_sequence_len: true

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
optimizer: galore_adamw_8bit_layerwise
lr_scheduler: cosine
learning_rate: 0.000001

optim_target_modules:
 - self_attn
 - mlp

train_on_inputs: false
group_by_length: false
bf16: true
tf32: false

bfloat16: true

logging_steps: 4
flash_attention: true
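
A side note on the wall of "matched but ignored" warnings: plain patterns like self_attn and mlp also match the container modules (self_attn, rotary_emb, mlp, act_fn), and only the nn.Linear submodules inside them actually receive a GaLore projector, so those warnings should be expected and are probably harmless. For reference, here is a rough plain-Transformers equivalent of the relevant options (a sketch under my assumptions, not the axolotl internals), targeting the Llama projection layers by name to avoid the noise; my_dataset is a placeholder:

import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)

args = TrainingArguments(
    output_dir="./outputs/galore-out",
    per_device_train_batch_size=1,
    num_train_epochs=4,
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    optim="galore_adamw_8bit_layerwise",
    # Naming the linear projections directly avoids matching their non-linear parents:
    optim_target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bf16=True,
    logging_steps=4,
)

# Trainer(model=model, args=args, train_dataset=my_dataset).train()
# (dataset preparation omitted; my_dataset stands in for the sharegpt data above)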
@BaohaoLiao

BaohaoLiao commented Sep 1, 2024

You can always get a normal loss for the first batch, because the SVD is computed from the gradients that the loss produces.

Is it possible for you to share code that reproduces the issue?
