How to get step-wise validation loss curve over all epochs

Hi everybody,

is there a way to get the step-wise validation loss curve over all epochs? When setting on_step=True in self.log() inside validation_step() I get a separate step-wise loss curve for each epoch. This becomes very messy for training runs spanning thousands of epochs.

From the equally named SO thread (https://stackoverflow.com/questions/66290662/how-to-get-step-wise-validation-loss-over-all-epochs-in-pytorch-lightning):

When logging my validation loss inside validation_step() in PyTorch Lightning like this:

from typing import Tuple

from torch import Tensor

def validation_step(self, batch: Tuple[Tensor, Tensor], _batch_index: int) -> None:
    inputs_batch, labels_batch = batch

    outputs_batch = self(inputs_batch)
    loss = self.criterion(outputs_batch, labels_batch)

    self.log('loss (valid)', loss.item())

Then, I get an epoch-wise loss curve:

~ image of the epoch-wise loss curve removed, since new users can embed only a single image ~

If I want the step-wise loss curve I can set on_step=True:

def validation_step(self, batch: Tuple[Tensor, Tensor], _batch_index: int) -> None:
    inputs_batch, labels_batch = batch

    outputs_batch = self(inputs_batch)
    loss = self.criterion(outputs_batch, labels_batch)

    self.log('loss', loss.item(), on_step=True)

This results in step-wise loss curves for each epoch:

~ image of the per-epoch step-wise loss curves ~

How can I get a single graph over all epochs instead? When running my training for thousands of epochs this gets messy.

Thanks in advance!

Solved on StackOverflow [1]. The solution is to bypass self.log() and write to the underlying logger backend via self.logger.experiment, using a manually maintained global step so the x-axis does not restart every epoch.

[1] https://stackoverflow.com/questions/66290662/how-to-get-step-wise-validation-loss-over-all-epochs-in-pytorch-lightning
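The mechanics of that fix can be sketched without a full Lightning setup. Below, `FakeWriter` stands in for `self.logger.experiment` (which is a TensorBoard `SummaryWriter` when using `TensorBoardLogger`), and the counter name `val_step` and tag `'loss (valid)'` are illustrative, not part of any Lightning API:

```python
class FakeWriter:
    """Records (tag, value, step) tuples, mimicking SummaryWriter.add_scalar."""
    def __init__(self):
        self.points = []

    def add_scalar(self, tag, value, step):
        self.points.append((tag, value, step))


class ValidationLogger:
    """Holds the global step counter, as the LightningModule would."""
    def __init__(self, writer):
        self.writer = writer
        self.val_step = 0  # global counter, never reset between epochs

    def log_validation_loss(self, loss):
        # Instead of self.log('loss', loss, on_step=True), which restarts
        # the x-axis every epoch, write with a monotonically increasing step.
        self.writer.add_scalar('loss (valid)', loss, self.val_step)
        self.val_step += 1


writer = FakeWriter()
logger = ValidationLogger(writer)
for epoch in range(3):          # three epochs ...
    for loss in (0.9, 0.8):     # ... of two validation batches each
        logger.log_validation_loss(loss)

steps = [step for _, _, step in writer.points]
print(steps)  # → [0, 1, 2, 3, 4, 5]: one continuous x-axis across all epochs
```

In an actual LightningModule you would initialize the counter in `__init__` and call `self.logger.experiment.add_scalar(...)` inside `validation_step()` with it, which produces a single continuous graph in TensorBoard.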
