Trainer uses one epoch of the test_dataloader

Hello,
I want to try the test mode with trainer.test(model, data_module, ckpt_path="./checkpoints/best-checkpoint.ckpt"). If I'm not mistaken, with that line of code the model will run over the dataset_test that I created in the LightningDataModule, calculate the loss each epoch, and at the end return the avg_loss (code below).

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class Data(pl.LightningDataModule): #data_module
  def __init__(self, test_df):
    super().__init__()
    self.test_df = test_df
    #...

  def setup(self, stage=None):
    #...
    self.dataset_test = TensorDataset(input_ids_test, attention_masks_test, labels_test)

  def test_dataloader(self):
    return DataLoader(self.dataset_test,
                      batch_size=self.batch_size,
                      num_workers=8)

class ClassModel(pl.LightningModule): #model
  def __init__(self):
    super().__init__()
    self.model = bert_model
    #...

  def configure_optimizers(self):
    #...

  def forward(self, input_ids, attention_mask, labels):
    # assumes bert_model returns a (loss, logits) tuple when labels are passed
    loss, output = self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    return loss, output

  #...

  def test_step(self, batch, batch_idx):
    inputs = {'input_ids':      batch[0],
              'attention_mask': batch[1],
              'labels':         batch[2]}
    outputs = self.model(**inputs)
    loss = outputs[0]
    self.log("Test loss", loss, on_epoch=True)
    return {'loss': loss}

  def test_epoch_end(self, outputs):
    avg_loss = torch.stack([x['loss'] for x in outputs]).mean()
    # log the aggregate; the returned dict is ignored by recent PL versions
    self.log('avg_loss', avg_loss)
    return {'avg_loss': avg_loss}
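
For completeness, here is a minimal sketch of how I wire everything together (assuming test_df is already loaded as a DataFrame):

data_module = Data(test_df)   # the LightningDataModule above
model = ClassModel()          # the LightningModule above

trainer = pl.Trainer()
# .test() runs a single pass over data_module.test_dataloader()
trainer.test(model, data_module, ckpt_path="./checkpoints/best-checkpoint.ckpt")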

The result that I got after running trainer.test(model, data_module) is:

[screenshot: the test progress bar stops at step 63]

We can see here that it only ran for one epoch (63), while the size of my test dataset is 1000 and the batch size is 16.
Can you please tell me if I missed something?
Thank you for your time!

hey @BttMA

max_epochs doesn't have any effect on testing/validation/prediction: the Trainer just iterates over the complete dataset once. Since there is no optimization happening here, running .test multiple times wouldn't add anything. I hope that answers your question.
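
For example (a sketch reusing the model and data_module from your post; max_epochs only affects .fit()):

trainer = pl.Trainer(max_epochs=10)  # max_epochs is ignored by .test()
trainer.test(model, data_module)     # still exactly one pass over the test set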

Also, we have moved the discussions to GitHub Discussions. You might want to check that out instead to get a quick response. The forums will be marked read-only after some time.

Thank you

Hello :slight_smile: Thank you for your answer. It has been a while since I posted this question, and I think I didn't make myself very clear, since I mixed up epochs and batches (sorry!!). What I wanted to test is the whole test dataset, not only one batch of it. I saw in the docs that trainer.test runs over only one epoch, but the batch size isn't mentioned there. The snippet of my code shows that the test ran over only one batch and not over the whole test dataset. I hope this is clearer. Thanks :wink:

This looks good to me. Since your total dataset size is 1000 and the batch_size is 16, total batches = 1000/16 ~= 63, and that's why it shows 63 as the final counter.
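
To spell the arithmetic out (DataLoader keeps the last partial batch by default, so the count rounds up):

import math
math.ceil(1000 / 16)  # 62.5 rounds up to 63 batches in one full pass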

:face_with_peeking_eye: It looks fine to me now :sweat_smile:
Maybe, back then, I misread my data.
Thanks a lot