Understanding logging and validation_step, validation_epoch_end

I'm having a hard time understanding how to use the return values of validation_step and validation_epoch_end (this also applies to the training and test steps).

First of all, when do I want to use validation_epoch_end? I have seen examples that don't use it at all.

Second, I do not understand how the logging works and how to use it, e.g.

def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # the old-style 'log' key expects a dict of metric names to values
    return {'loss': loss, 'log': {'loss': loss}}

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    return {'val_loss': loss}

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    log = {'val_loss': avg_loss}
    return {'val_loss': avg_loss, 'log': log}

Where does 'log' go? I understand the returned 'loss', but I don't understand where 'log' goes and how to use it.

Third, from what I understand there is a new way to log by calling self.log, and I get warnings when I don't use it. So what is the difference?

The new self.log functionality works similarly to how it did when it was in the returned dictionary, except that we now automatically aggregate the values you log each step and, if you ask for it, log their mean at the end of each epoch. For example, the code you wrote above can be rewritten as:

def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("loss", loss)        
    return loss

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # on_epoch=True is the default in `validation_step`,
    # so it is not necessary to specify it
    self.log("val_loss", loss, on_epoch=True)

This eliminates the need for validation_epoch_end. If for some reason you still wanted to do this aggregation yourself, you could also do:

def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("loss", loss)        
    return loss

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    return loss

def validation_epoch_end(self, outs):
    # outs is a list of whatever you returned in `validation_step`
    loss = torch.stack(outs).mean()
    self.log("val_loss", loss)

This works equivalently. Hope this clears things up! :slight_smile:

What about
checkpoint_callback = ModelCheckpoint(monitor='val_loss')
?
Does this pull val_loss from the call to self.log("val_loss")? How does on_epoch=True impact ModelCheckpoint?
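
For context, this is roughly how I am wiring it up; MyModel is a placeholder for my LightningModule, and mode="min" is my assumption since val_loss should be minimized:

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# my understanding: 'monitor' must match the key passed to self.log("val_loss", ...),
# and with on_epoch=True the epoch-level mean is the value being compared
checkpoint_callback = ModelCheckpoint(monitor="val_loss", mode="min")

model = MyModel()  # placeholder LightningModule
trainer = pl.Trainer(max_epochs=10, callbacks=[checkpoint_callback])
trainer.fit(model)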