How to get step-wise validation loss curve over all epochs

Hi everybody,

is there a way to get the step-wise validation loss curve over all epochs? When setting on_step=True in self.log() inside validation_step() I get a separate step-wise loss curve for each epoch. This becomes very messy for training runs spanning thousands of epochs.

From the equally named SO thread (https://stackoverflow.com/questions/66290662/how-to-get-step-wise-validation-loss-over-all-epochs-in-pytorch-lightning):

When logging my validation loss inside validation_step() in PyTorch Lightning like this:

from typing import Tuple

from torch import Tensor

def validation_step(self, batch: Tuple[Tensor, Tensor], _batch_index: int) -> None:
    inputs_batch, labels_batch = batch

    outputs_batch = self(inputs_batch)
    loss = self.criterion(outputs_batch, labels_batch)

    self.log('loss (valid)', loss.item())

Then, I get an epoch-wise loss curve:

~ image of the epoch-wise loss curve removed, since new users can embed only a single image ~

If I want the step-wise loss curve I can set on_step=True:

def validation_step(self, batch: Tuple[Tensor, Tensor], _batch_index: int) -> None:
    inputs_batch, labels_batch = batch

    outputs_batch = self(inputs_batch)
    loss = self.criterion(outputs_batch, labels_batch)

    self.log('loss', loss.item(), on_step=True)

This results in step-wise loss curves for each epoch:

~ image of the per-epoch step-wise loss curves ~

How can I get a single graph over all epochs instead? When running my training for thousands of epochs this gets messy.

Thanks in advance!

Solved on StackOverflow [1]. The solution is to bypass self.log() and write to the underlying logger backend via self.logger.experiment, using a manually maintained global step so the x-axis does not restart every epoch.

[1] https://stackoverflow.com/questions/66290662/how-to-get-step-wise-validation-loss-over-all-epochs-in-pytorch-lightning
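The mechanics of that fix can be sketched without a full Lightning setup. Below, `FakeWriter` stands in for `self.logger.experiment` (which is a TensorBoard `SummaryWriter` when using `TensorBoardLogger`), and the counter name `val_step` and tag `'loss (valid)'` are illustrative, not part of any Lightning API:

```python
class FakeWriter:
    """Records (tag, value, step) tuples, mimicking SummaryWriter.add_scalar."""
    def __init__(self):
        self.points = []

    def add_scalar(self, tag, value, step):
        self.points.append((tag, value, step))


class ValidationLogger:
    """Holds the global step counter, as the LightningModule would."""
    def __init__(self, writer):
        self.writer = writer
        self.val_step = 0  # global counter, never reset between epochs

    def log_validation_loss(self, loss):
        # Instead of self.log('loss', loss, on_step=True), which restarts
        # the x-axis every epoch, write with a monotonically increasing step.
        self.writer.add_scalar('loss (valid)', loss, self.val_step)
        self.val_step += 1


writer = FakeWriter()
logger = ValidationLogger(writer)
for epoch in range(3):          # three epochs ...
    for loss in (0.9, 0.8):     # ... of two validation batches each
        logger.log_validation_loss(loss)

steps = [step for _, _, step in writer.points]
print(steps)  # → [0, 1, 2, 3, 4, 5]: one continuous x-axis across all epochs
```

In an actual LightningModule you would initialize the counter in `__init__` and call `self.logger.experiment.add_scalar(...)` inside `validation_step()` with it, which produces a single continuous graph in TensorBoard.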
