I was trying Adafactor (from torch_optimizer), but I get the following error:
args.scheduler=None
--------------------- META-TRAIN ------------------------
Starting training!
Traceback (most recent call last):
  File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 441, in <module>
    main_resume_from_checkpoint(args)
  File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 403, in main_resume_from_checkpoint
    run_training(args)
  File "/home/miranda9/automl-meta-learning/automl-proj-src/experiments/meta_learning/main_metalearning.py", line 413, in run_training
    meta_train_fixed_iterations(args)
  File "/home/miranda9/automl-meta-learning/automl-proj-src/meta_learning/training/meta_training.py", line 233, in meta_train_fixed_iterations
    args.outer_opt.step()
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch_optimizer/adafactor.py", line 191, in step
    self._approx_sq_grad(
  File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch_optimizer/adafactor.py", line 116, in _approx_sq_grad
    (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1))
RuntimeError: The size of tensor a (3) must match the size of tensor b (64) at non-singleton dimension 1
With PyTorch's default Adam, training runs fine, so why does this one fail?
I looked into this error, which I also hit when training a ResNet-50 model. My error was similar to @brando90's, except that the tensor dimensions were different. Here is how I managed to fix it.
First of all, the exception is raised in _approx_sq_grad (adafactor.py, line 116 in the traceback above), where the tensor exp_avg_sq_row is divided by its mean over the last dimension. In my case, exp_avg_sq_row has size [64, 3, 7]. Taking the mean over the last dimension drops that dimension, so exp_avg_sq_row.mean(dim=-1) has size [64, 3]. Broadcasting aligns shapes starting from the last dimension, so 7 is compared against 3, and the mismatch in this division raises the RuntimeError.
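To make the shape mismatch concrete, here is a minimal pure-Python sketch of the broadcasting rule (shapes aligned from the last dimension; sizes must be equal or 1). The helper name broadcast_shape is hypothetical and is not part of PyTorch or torch_optimizer; it only mimics the shape check, not the arithmetic.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper for illustration; it follows the usual rule that
    dimensions are compared from the last one backwards and must either
    be equal or have one of them equal to 1.
    """
    ndim = max(len(a), len(b))
    result = []
    for i in range(1, ndim + 1):
        # Missing leading dimensions are treated as size 1.
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"size {da} vs {db} at dimension {ndim - i}")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


row = (64, 3, 7)   # shape of exp_avg_sq_row in the ResNet-50 case
mean = row[:-1]    # mean(dim=-1) drops the last dimension: (64, 3)

try:
    broadcast_shape(row, mean)
except ValueError as err:
    print("mismatch:", err)

# With a trailing singleton dimension, the shapes are compatible.
print(broadcast_shape(row, mean + (1,)))
```

Under the same rule, a row shape such as (64, 3, 3) (an assumed shape, not one confirmed in the thread) would mismatch 3 against 64 at dimension 1, which is the pattern reported in the traceback.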
The solution is to unsqueeze the mean tensor: instead of (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)), we should do (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)). The mean then has size [64, 3, 1], which broadcasts against [64, 3, 7].
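A small standalone check of this fix might look like the following. It is a sketch, not the library's actual patch: the tensor row is a made-up stand-in for exp_avg_sq_row, filled with random values. It also shows that mean(dim=-1, keepdim=True) is an equivalent way to keep the trailing dimension.

```python
import torch

# Stand-in for exp_avg_sq_row with the shape from the ResNet-50 case.
# The values are random and purely illustrative.
row = torch.rand(64, 3, 7) + 0.1

# Original expression: the mean has shape [64, 3], which cannot broadcast
# against [64, 3, 7], so the division raises RuntimeError.
try:
    row / row.mean(dim=-1)
    failed = False
except RuntimeError as err:
    failed = True
    print("original expression fails:", err)

# Proposed fix: restore the reduced dimension so the mean has shape [64, 3, 1].
fixed = row / row.mean(dim=-1).unsqueeze(-1)

# Equivalent alternative: keep the reduced dimension in the first place.
same = row / row.mean(dim=-1, keepdim=True)

print(fixed.shape)
print(torch.allclose(fixed, same))
```

Either form leaves the result with the same shape as row, so the rest of the computation sees the shape it would have expected.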