Synchronize train logging


As mentioned in the picture above, validation and test logging can use `sync_dist=True`.

Is there a way to synchronize training logs as well? For example, in the following code I run on 8 GPUs and want `train_loss` and `train_acc` to be averaged across all 8 GPUs:

    def training_step(self, batch, batch_idx):
        inputs = self.train_inputs(batch)
        loss, logits = self(**inputs)

        # accuracy over non-ignored positions (label 5 is masked out)
        mask = (batch['labels'] != 5).long()
        ntotal = mask.sum()
        ncorrect = ((logits.argmax(dim=-1) == batch['labels']).long() * mask).sum()
        acc = ncorrect / ntotal

        self.log('train_loss', loss, on_step=True, prog_bar=True, sync_dist=True)
        self.log('train_acc', acc, on_step=True, prog_bar=True, sync_dist=True)

        return loss

Hey @Cheng_Young,

The same holds for training as well: passing `sync_dist=True` to `self.log` in `training_step` reduces the logged values across all processes (mean by default).
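
For reference, here is a minimal, self-contained sketch of that pattern. The `ToyModel` module and its synthetic linear head are made up for illustration; the point is only the `self.log(..., sync_dist=True)` calls in `training_step`:

    import torch
    from torch import nn
    import pytorch_lightning as pl

    class ToyModel(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = nn.Linear(32, 6)

        def forward(self, x):
            return self.layer(x)

        def training_step(self, batch, batch_idx):
            x, labels = batch
            logits = self(x)
            loss = nn.functional.cross_entropy(logits, labels)
            acc = (logits.argmax(dim=-1) == labels).float().mean()

            # sync_dist=True reduces the logged value across all DDP processes
            # (mean by default), so the progress bar shows one averaged number
            # instead of each GPU's local value.
            self.log('train_loss', loss, on_step=True, prog_bar=True, sync_dist=True)
            self.log('train_acc', acc, on_step=True, prog_bar=True, sync_dist=True)
            return loss

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)

Depending on your Lightning version, launching on 8 GPUs looks something like `Trainer(accelerator='gpu', devices=8, strategy='ddp')` in newer releases, or `Trainer(gpus=8, accelerator='ddp')` in older 1.x releases.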

Also, we have moved discussions to GitHub Discussions. You might want to post there instead for a quicker response; the forums will be marked read-only after some time.

Thank you