
Lookahead - RuntimeError: Expected all tensors to be on the same device #306

Open
atonyo11 opened this issue Dec 6, 2024 · 4 comments
Labels: bug (Something isn't working)

atonyo11 commented Dec 6, 2024

Describe the bug

My program runs fine with optim.Adam. After wrapping the optimizer with Lookahead, the error below is raised.

To Reproduce

  • OS: Linux
  • PyTorch version: 2.0.1
  • Python version: 3.9
  • Reproducible code:

    self.optimizer = Lookahead(
        optim.Adam(
            model.parameters(),
            lr=self.optim_dict['base_lr'],
            weight_decay=self.optim_dict['weight_decay'],
        ),
        k=5,
        alpha=0.5,
    )

Log

    scaler.step(optimizer.optimizer)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/pytorch_optimizer/optimizer/lookahead.py", line 137, in step
    self.update(group)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/private/.conda/envs/project1/lib/python3.9/site-packages/pytorch_optimizer/optimizer/lookahead.py", line 116, in update
    p.mul_(self.alpha).add_(slow, alpha=1.0 - self.alpha)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

atonyo11 added the bug label on Dec 6, 2024
kozistr (Owner) commented Dec 6, 2024

@atonyo11 hi. Could you please share a specific example to reproduce it? It'd be good to fix the code based on your usage. I checked the implementation and tested it with the example below, but I can't reproduce it.

It seems like the params of the Adam optimizer are on the GPU, but the params of Lookahead aren't. I may be wrong, but I assume you might have loaded your optimizer states on a different device or something similar.
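
For illustration, the failing line in lookahead.py is just an in-place update that mixes the fast parameter with its cached slow copy, so it fails whenever the two live on different devices. A minimal sketch of the same error (not your code, just the mechanism, and it assumes a CUDA device is available):

import torch

p = torch.zeros(3, device='cuda')  # fast weights, already moved to the GPU
slow = torch.zeros(3)              # cached slow weights, still on the CPU
p.mul_(0.5).add_(slow, alpha=0.5)  # RuntimeError: Expected all tensors to be on the same device

Here is the full example I tested with: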

import os

import torch
from torch import nn, utils
from torch.optim import Adam

from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor

import lightning.pytorch as pl

from pytorch_optimizer import Lookahead


class LitAutoEncoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return Lookahead(Adam(self.parameters(), lr=1e-3), k=5, alpha=0.5)

train_dataset = MNIST(os.getcwd(), train=True, download=True, transform=ToTensor())
train_loader = utils.data.DataLoader(train_dataset)

autoencoder = LitAutoEncoder()
autoencoder.train()
autoencoder.cuda()

trainer = pl.Trainer(
    limit_train_batches=100,
    max_epochs=1,
    accelerator='auto',
    logger=True,
)

trainer.fit(autoencoder, train_loader)

atonyo11 (Author) commented Dec 6, 2024

@kozistr Thank you for your quick reply.

I am working with this code:
https://github.com/hulianyuyy/CorrNet/blob/main/utils/optimizer.py

kozistr (Owner) commented Dec 8, 2024

Hi. Could you explain in more detail how to reproduce it? I tested various scenarios as far as I could, but I still can't reproduce the device mismatch by loading from a checkpoint or by calling the optimizer on its own. (I might be missing something.)
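
In the meantime, it might help to print the device of every tensor in the optimizer state right before the failing scaler.step() call. A generic sketch (hypothetical, not taken from your repo; it only assumes the optimizer exposes a standard state_dict()):

import torch

def report_devices(obj, prefix='state_dict'):
    # Recursively print the device of every tensor in a (nested) optimizer state dict.
    if torch.is_tensor(obj):
        print(prefix, obj.device)
    elif isinstance(obj, dict):
        for k, v in obj.items():
            report_devices(v, f'{prefix}.{k}')
    elif isinstance(obj, (list, tuple)):
        for i, v in enumerate(obj):
            report_devices(v, f'{prefix}[{i}]')

report_devices(optimizer.state_dict())  # any entry printed as 'cpu' is the culprit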

However, I found that this could happen when you resume training after loading the optimizer states (both Adam and Lookahead) through the repo you mentioned: Lookahead's state would still be on the CPU, because that state is currently not saved and loaded, and its device is determined only when the Lookahead optimizer is initialized.

In short, I just made a modification so that the Lookahead optimizer state is also saved and loaded; all you need to do is save and load the optimizer state like below.

optimizer = ...

torch.save(optimizer.state_dict(), 'opt.ckpt')
optimizer.load_state_dict(torch.load('opt.ckpt', map_location='cuda'))

You can check the modified implementation here.

Hope this helps with your issue; please let me know if you still have a problem.
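
As a side note, since the slow weights are created on whatever device the model parameters are on when Lookahead is constructed, it is also worth checking the ordering in your training script: if the mismatch comes from wrapping the optimizer before moving the model to the GPU, building the optimizer afterwards should avoid the error as well. A sketch under that assumption (the hyperparameter values are just placeholders):

model = model.cuda()  # move the model first, so the slow weights are cloned from GPU tensors
optimizer = Lookahead(
    optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4),
    k=5,
    alpha=0.5,
)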

atonyo11 (Author) commented Dec 8, 2024

I just run the program from scratch, with no pretrained checkpoint loaded:
python main.py --config ./config/baseline.yaml --device 0
