Custom `trainer.test`

I’m trying to understand the `trainer.test` function in PyTorch Lightning. Right now I can do

trainer.test(model=model, datamodule=datamodule)

image_AUROC               1.0
image_BinaryAccuracy      0.923
...

For a batch of samples in the datamodule, it computes the metric scores over all samples and returns a mean value. But I want to get the score of each sample separately. Is that possible out of the box with the API?

I am thinking of looping over the datamodule, calling the model on each sample, and using the model's metrics to compute the evaluation. But I am not certain how the model is supposed to behave inside the `trainer.test` function. I went through the source code, but it was not easy to figure out. For example, should I call `model.eval()` or wrap the loop in a `torch.no_grad()` context?

model.eval()
with torch.no_grad():
    for batch in datamodule.test_dataloader():
        map2d, logit = model(batch['image'])  # [1, 1, 224, 224]
        eval_metrics = model.image_metrics(batch['gt'], logit)
        ....

Hey @Mohammed_Innat, you can customize the `test_step` method to record the score for each sample.

Here is some pseudo-code (you might want to run this with a batch size of 1) :point_down:

def test_step(self, batch, batch_idx):
    x, y = batch

    out = self(x)

    # calculate accuracy (replace this with your own score definition)
    labels_hat = torch.argmax(out, dim=1)
    test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # store the per-batch score (initialize self.test_scores, e.g. as a dict, in __init__)
    self.test_scores[batch_idx] = test_acc
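
In case it helps, here is a rough end-to-end sketch of how this could be wired up. The `LitClassifier` name and the dummy linear backbone are just placeholders, and `datamodule` is assumed to be your own `LightningDataModule`; `self.test_scores` is a plain dict you create yourself and read back after `trainer.test` returns:

import torch
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):  # placeholder name, use your own module
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(28 * 28, 10)  # dummy backbone, replace with yours
        self.test_scores = {}  # per-batch scores collected here

    def forward(self, x):
        return self.model(x.flatten(1))

    def test_step(self, batch, batch_idx):
        x, y = batch
        out = self(x)
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)
        self.test_scores[batch_idx] = test_acc


# with batch_size=1 in the test dataloader, batch_idx maps 1:1 to a sample
model = LitClassifier()
trainer = pl.Trainer()
trainer.test(model=model, datamodule=datamodule)  # datamodule: your own LightningDataModule
print(model.test_scores)  # e.g. {0: 1.0, 1: 0.0, ...}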