How to customize trainer.test

I am working on a classification problem.

Currently, if I run trainer.test, nothing gets printed out, even with verbose == True. I am not sure which functions test calls under the hood.

I want accuracy, mean per-class accuracy, and a confusion matrix for my test set. I know how to implement these functions; however, I am not sure where I should implement them.

Hey, have a look at the metrics API: https://pytorch-lightning.readthedocs.io/en/latest/metrics.html

These results can be logged automatically if you use the self.log function, as explained there! Generating a confusion matrix is also supported: https://pytorch-lightning.readthedocs.io/en/latest/metrics.html#confusion-matrix-func

If implemented in the validation/test step, these are auto-aggregated for you at the end of the epoch. Let me know if this helps!
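For the confusion matrix specifically, the functional interface from that link can be called directly on predictions and targets. A minimal sketch, assuming the pl.metrics-era functional API (the exact signature may vary between Lightning versions):

import torch
from pytorch_lightning.metrics.functional import confusion_matrix

logits = torch.randn(8, 7)                      # fake model outputs for 7 classes
y = torch.randint(0, 7, (8,))                   # fake targets
preds = torch.argmax(logits, dim=1)
cm = confusion_matrix(preds, y, num_classes=7)  # 7x7 matrix of counts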

Many thanks. I am currently using this kind of API, like Accuracy, in my train and validation steps.

However, it is confusing to me: where should I implement them for testing?

Dear xinqi,

Thanks for using PyTorch Lightning!

Would you mind sharing a sample of your code?

I would say you could do something like this:

def __init__(self):
    ...
    self.train_acc = pl.metrics.Accuracy()
    self.valid_acc = pl.metrics.Accuracy()

def training_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    ...
    self.train_acc(logits, y)
    self.log('train_acc', self.train_acc, on_step=True, on_epoch=False)

def validation_step(self, batch, batch_idx):
    x, y = batch
    logits = self(x)
    ...
    self.valid_acc(logits, y)
    self.log('valid_acc', self.valid_acc, on_step=True, on_epoch=True)
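Note that when you log the metric object itself (e.g. self.valid_acc) rather than a computed value, Lightning handles accumulating its state across batches and computes the epoch-level result for you when on_epoch=True.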

Thank you so much.

Please let me re-describe my question another way. If I call trainer.test(model), it will evaluate my model on my test set. I am not sure which functions it calls, or where I should implement the related customized test logic.

Your advice covers train and validation. I will attach the relevant piece of code below. Could you confirm whether test and validation call the exact same functions?

def __init__(self,
             encoder_model,
             batch_size,
             num_samples,
             warmup_epochs=10,
             lr=1e-4,
             opt_weight_decay=1e-6,
             loss_temperature=0.5,
             **kwargs):
   
    super().__init__()
    self.encoder_model = encoder_model
    self.save_hyperparameters()
    self.entropy_loss = nn.CrossEntropyLoss()
    self.nt_xent_loss = nt_xent_loss
    self.encoder = self.init_encoder()

    # h -> || -> z
    self.projection = Projection(output_dim=7)
    #nn.Linear(self.hidden_dim, 7, bias=False)

    # Accuracy
    self.acc = pl.metrics.Accuracy()

def training_step(self, batch, batch_idx):
    loss, acc = self.shared_step(batch, batch_idx)

    self.log('train_loss', loss, on_epoch=True)
    self.log('train_acc', acc, on_epoch=True)
    return loss

def validation_step(self, batch, batch_idx):
    loss, acc = self.shared_step(batch, batch_idx)

    self.log('avg_val_loss', loss)
    self.log('avg_val_acc', acc)
    return loss

def shared_step(self, batch, batch_idx):
    (img1, img2), y = batch

    # ENCODE
    # encode -> representations
    # (b, 3, 32, 32) -> (b, 2048, 2, 2)
    h1 = self.encoder_model(img1)
    #h2 = self.encoder(img2)

    # the bolts resnets return a list of feature maps
    if isinstance(h1, list):
        h1 = h1[-1]
        #h2 = h2[-1]

    # PROJECT
    # img -> E -> h -> || -> z
    # (b, 2048, 2, 2) -> (b, 128)
    z1 = self.projection(h1)

    # Lightning already moves the batch to the module's device,
    # so explicit .cuda() calls are unnecessary (and break CPU runs)
    loss = self.entropy_loss(z1, y)
    acc = self.acc(z1, y)

    return loss, acc

You need to define test_step + test_dataloader for trainer.test() to take effect
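
To make that concrete for the original question (accuracy, mean per-class accuracy, and a confusion matrix), here is a minimal sketch of what those hooks could look like for the model above. It reuses the same forward path as shared_step and assumes the pl.metrics-era API; the exact confusion_matrix signature may differ between Lightning versions:

import torch
from pytorch_lightning.metrics.functional import confusion_matrix

def test_step(self, batch, batch_idx):
    (img1, img2), y = batch
    h1 = self.encoder_model(img1)
    if isinstance(h1, list):
        h1 = h1[-1]
    z1 = self.projection(h1)

    self.log('test_loss', self.entropy_loss(z1, y))
    self.log('test_acc', self.acc(z1, y))

    # return predictions and targets so they can be aggregated below
    return {'preds': torch.argmax(z1, dim=1), 'target': y}

def test_epoch_end(self, outputs):
    preds = torch.cat([out['preds'] for out in outputs])
    target = torch.cat([out['target'] for out in outputs])

    # confusion matrix over the whole test set (7 classes, matching
    # the projection head above)
    cm = confusion_matrix(preds, target, num_classes=7)

    # mean per-class accuracy: average of the per-class recalls on the
    # diagonal; clamp avoids division by zero for empty classes
    per_class_acc = cm.diag() / cm.sum(dim=1).clamp(min=1)
    self.log('test_mean_class_acc', per_class_acc.mean())
    print(cm)

With these defined (plus a test_dataloader on the module, or a dataloader passed to trainer.test), trainer.test(model) runs the test loop and prints the logged results.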

Thanks. This is the spot-on answer.