PearsonCorrcoef always zero when validation batch size is one. #286
-
🐛 Bug

Hi, rather than a bug this is a question about how the `PearsonCorrcoef` metric is meant to be used: with a validation batch size of one, the logged correlation is always zero.

To Reproduce

```python
import torch
from pytorch_lightning import LightningModule, LightningDataModule, Trainer
from pytorch_lightning.loggers import TensorBoardLogger
from torch.utils.data import Dataset
from torchmetrics import PearsonCorrcoef


class RandomDataset(Dataset):
    def __init__(self, length: int):
        self.length = length
        self.data = torch.linspace(0, 1, length, dtype=torch.float)

    def __getitem__(self, index):
        value = self.data[index].reshape(-1)
        return value + torch.rand(1), value

    def __len__(self):
        return self.length


class RangeDataModule(LightningDataModule):
    def setup(self, stage=None):
        pass

    def train_dataloader(self):
        return torch.utils.data.DataLoader(RandomDataset(100), batch_size=10, shuffle=True)

    def val_dataloader(self):
        return torch.utils.data.DataLoader(RandomDataset(50), batch_size=1)

    def test_dataloader(self):
        return torch.utils.data.DataLoader(RandomDataset(100), batch_size=1)


class SuperBoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1, 1, bias=False)
        self.layer.weight.data.fill_(0.5)
        self.train_correlation = PearsonCorrcoef()
        self.val_correlation = PearsonCorrcoef()

    def forward(self, x):
        return self.layer(x)

    @staticmethod
    def loss(y_pred, y_true):
        return torch.nn.functional.mse_loss(y_pred, y_true)

    def training_step(self, batch, batch_idx):
        y_pred, y_true, loss = self.common_step(batch, batch_idx)
        self.log_dict({"train_correlation": self.train_correlation(y_pred, y_true)})
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        y_pred, y_true, loss = self.common_step(batch, batch_idx)
        self.log_dict({"val_correlation": self.val_correlation(y_pred, y_true)})
        return {"x": loss}

    def common_step(self, batch, batch_idx):
        x, y_true = batch
        y_pred = self.layer(x)
        loss = self.loss(y_pred, y_true)
        return y_pred, y_true, loss

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.0001)
        return optimizer


if __name__ == '__main__':
    model = SuperBoringModel()
    datamodule = RangeDataModule()
    trainer = Trainer(logger=TensorBoardLogger("."))
    trainer.fit(model, datamodule=datamodule)
```

Steps to reproduce the behavior: run the script above and watch the logged `val_correlation`, which stays at zero for every validation batch.
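For reference, the degenerate case shows up even outside Lightning. A minimal sketch (the zero result is what this issue reports for single-element batches; the exact value may depend on the installed torchmetrics version):

```python
import torch
from torchmetrics import PearsonCorrcoef

pearson = PearsonCorrcoef()
# A single-element batch has no variance, so there is nothing to
# correlate; as reported in this issue, the result comes out as zero.
print(pearson(torch.tensor([0.7]), torch.tensor([0.5])))
```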
-
When you call

```python
self.val_correlation(y_pred, y_true)
```

you are calculating the correlation on that batch alone. If the batch size is only 1, the correlation will always be 0. What you of course want is to calculate the correlation over the whole validation set, since Pearson's is a global metric. You should therefore do something like this:

```python
def validation_step(self, batch, batch_idx):
    y_pred, y_true, loss = self.common_step(batch, batch_idx)
    # Only accumulate state here; the global value is computed once per epoch.
    self.val_correlation.update(y_pred, y_true)

def validation_epoch_end(self, outputs):
    corr = self.val_correlation.compute()
    self.log_dict({"val_correlation": corr})
```
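One caveat with the manual `compute()` above: the accumulated state is not cleared automatically, so the metric should also be `reset()` between epochs. An equivalent pattern (assuming a reasonably recent torchmetrics/Lightning combination) is to log the metric object itself and let Lightning handle `compute()` and `reset()` at epoch end:

```python
def validation_step(self, batch, batch_idx):
    y_pred, y_true, loss = self.common_step(batch, batch_idx)
    self.val_correlation.update(y_pred, y_true)
    # Logging the Metric object (rather than a tensor) tells Lightning
    # to compute() over the full epoch and reset() afterwards.
    self.log("val_correlation", self.val_correlation, on_step=False, on_epoch=True)
```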
-
Thanks @SkafteNicki! Could I help by contributing a running version of the algorithm?
-
Hi there, not sure what you mean? 🐰
-
Hi @Borda! Something like what is suggested here: we would keep track of certain variables per batch and output the correlation at the end of the loop, instead of appending everything to a list. A sketch follows below.
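For illustration, a running version could accumulate the five sufficient statistics and derive the correlation in `compute()`. A minimal sketch (the class name and details are illustrative, not the implementation that was eventually merged, and the summed-squares formula is less numerically stable than what a production metric would use):

```python
import torch
from torchmetrics import Metric


class RunningPearson(Metric):
    """Illustrative streaming Pearson correlation.

    Instead of storing every prediction/target pair, accumulate
    n, Σx, Σy, Σx², Σy², Σxy and derive r from them in compute().
    """

    def __init__(self):
        super().__init__()
        for name in ("n", "sum_x", "sum_y", "sum_xx", "sum_yy", "sum_xy"):
            self.add_state(name, default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, preds: torch.Tensor, target: torch.Tensor):
        preds, target = preds.flatten().float(), target.flatten().float()
        self.n += preds.numel()
        self.sum_x += preds.sum()
        self.sum_y += target.sum()
        self.sum_xx += (preds * preds).sum()
        self.sum_yy += (target * target).sum()
        self.sum_xy += (preds * target).sum()

    def compute(self):
        # r = (n·Σxy − Σx·Σy) / sqrt((n·Σx² − (Σx)²) · (n·Σy² − (Σy)²))
        cov = self.n * self.sum_xy - self.sum_x * self.sum_y
        var_x = self.n * self.sum_xx - self.sum_x**2
        var_y = self.n * self.sum_yy - self.sum_y**2
        return cov / (var_x * var_y).sqrt()
```

An instance of this would then drop in wherever `PearsonCorrcoef` is used above, with the same `update()`/`compute()` calling pattern.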
-
I see, YES, that would be a very welcome contribution :]