Lookahead - RuntimeError: Expected all tensors to be on the same device #306
Comments
@atonyo11 hi. Could you please share a specific example to reproduce this? It'd be good to fix the code based on your usage. I checked the implementation and tested it with the example below, but I can't reproduce it. It seems like the params of the Adam optimizer are on the GPU, while the params of Lookahead aren't. I may be wrong, but I assume you might be loading your optimizer states on a different device, or something similar.
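(The test snippet referenced above isn't preserved in this thread; the following is a minimal sketch of what such a check might look like, assuming a toy model on CUDA and the pytorch_optimizer Lookahead wrapper. The model, shapes, and hyperparameters are illustrative only.)

```python
import torch
from torch import nn, optim
from pytorch_optimizer import Lookahead

device = torch.device('cuda')
model = nn.Linear(16, 1).to(device)  # toy model living entirely on the GPU
optimizer = Lookahead(optim.Adam(model.parameters(), lr=1e-3), k=5, alpha=0.5)

for _ in range(10):
    optimizer.zero_grad()
    x = torch.randn(8, 16, device=device)
    loss = model(x).mean()
    loss.backward()
    optimizer.step()  # no device-mismatch error in this plain setup
```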
@kozistr Thank you for your quick reply. I am doing this work.
hi. Could you explain in more detail how to reproduce it? I tested various scenarios as far as I could, but I still can't reproduce the device mismatch by loading from a checkpoint or by calling the optimizer on its own. (I might be missing something.) However, I found that this can happen when you continue training after loading the optimizer states (both Adam and Lookahead) through the repo you mentioned: Lookahead's state is still on the CPU because, currently, that state is not saved or loaded, and its device is determined only when the Lookahead optimizer is initialized. In short, I made a modification so that the Lookahead optimizer state can also be saved and loaded; all you need to do is save and load the optimizer state like below.

```python
optimizer = ...
torch.save(optimizer.state_dict(), 'opt.ckpt')
optimizer.load_state_dict(torch.load('opt.ckpt', map_location='cuda'))
```

You can check the modified implementation here. I hope this helps with your issue; please let me know if you still have a problem.
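As a fuller resume sketch (not from the thread; the model, shapes, and filename are placeholders, and it assumes the updated Lookahead that serializes its own state):

```python
import torch
from torch import nn, optim
from pytorch_optimizer import Lookahead

model = nn.Linear(16, 1).cuda()
optimizer = Lookahead(optim.Adam(model.parameters(), lr=1e-3), k=5, alpha=0.5)

# ... train for a while, then checkpoint the wrapped optimizer ...
torch.save(optimizer.state_dict(), 'opt.ckpt')

# later, after rebuilding the model and optimizer on the GPU, restore the
# state directly onto CUDA so the inner Adam state and Lookahead's slow
# weights end up on the same device
optimizer.load_state_dict(torch.load('opt.ckpt', map_location='cuda'))
```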
I just run the program from the start; I don't load any pretrained checkpoint.
Describe the bug
I can run my program fine with optim.Adam. After wrapping the optimizer with Lookahead, the errors below are shown.
To Reproduce
```python
self.optimizer = Lookahead(
    optim.Adam(
        model.parameters(),
        lr=self.optim_dict['base_lr'],
        weight_decay=self.optim_dict['weight_decay'],
    ),
    k=5,
    alpha=0.5,
)
```
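(Not part of the original report: the traceback below goes through torch.cuda.amp.GradScaler, so a fuller reproduction presumably looks roughly like the sketch below; the model, loss, data, and hyperparameters are placeholders.)

```python
import torch
from torch import nn, optim
from pytorch_optimizer import Lookahead

model = nn.Linear(16, 1).cuda()
optimizer = Lookahead(
    optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4),
    k=5, alpha=0.5,
)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    optimizer.zero_grad()
    x = torch.randn(8, 16, device='cuda')
    with torch.cuda.amp.autocast():
        loss = model(x).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # the report's log instead calls scaler.step(optimizer.optimizer)
    scaler.update()
```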
Log
```
scaler.step(optimizer.optimizer)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/pytorch_optimizer/optimizer/lookahead.py", line 137, in step
    self.update(group)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/pytorch_optimizer/optimizer/lookahead.py", line 116, in update
    p.mul_(self.alpha).add_(slow, alpha=1.0 - self.alpha)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
Expected behavior
The Lookahead-wrapped optimizer steps without a device-mismatch error, just as plain optim.Adam does.