Synchronize train logging

As stated in the picture above, validation and test logging can use `sync_dist=True`.

I wonder whether there is a way to synchronize logging during training as well. With the following code, running on 8 GPUs, I want the train_loss and train_acc from all 8 GPUs to be averaged:

    def training_step(self, batch, batch_idx):
        inputs = self.train_inputs(batch)
        loss, logits = self(**inputs)

        # ignore positions whose label is 5 when computing accuracy
        mask = (batch['labels'] != 5).long()
        ntotal = mask.sum()
        ncorrect = ((logits.argmax(dim=-1) == batch['labels']).long() * mask).sum()
        acc = ncorrect / ntotal

        self.log('train_loss', loss, on_step=True, prog_bar=True, sync_dist=True)
        self.log('train_acc', acc, on_step=True, prog_bar=True, sync_dist=True)

        return loss
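A side note on the averaging itself: because `ntotal` can differ between GPUs when the mask differs, the mean of the per-rank `acc` values is generally not the same as the accuracy over all ranks' samples. Reducing `ncorrect` and `ntotal` across ranks first and dividing afterwards gives the exact global accuracy. A minimal pure-Python sketch with made-up per-rank counts (illustrative numbers only):

```python
# Hypothetical per-rank (ncorrect, ntotal) pairs for 8 GPUs.
per_rank = [(9, 10), (5, 10), (1, 2), (7, 10), (8, 10), (3, 4), (6, 10), (10, 10)]

# Mean of per-rank accuracies (what averaging `acc` across ranks gives):
mean_of_accs = sum(c / t for c, t in per_rank) / len(per_rank)

# Global accuracy from reduced counts (sum ncorrect / sum ntotal):
global_acc = sum(c for c, _ in per_rank) / sum(t for _, t in per_rank)

print(mean_of_accs, global_acc)  # the two values differ when ntotal varies per rank
```

So even with a synchronized/averaged `train_acc`, the logged value is only an approximation of the true accuracy over all samples unless the counts themselves are reduced.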