Correct approach to calculate metrics in DDP setting

In the case of DDP:

  • Should the metrics be calculated in validation_step, or in validation_step_end after gathering the output tensors returned by validation_step?
    • If the metrics are calculated in validation_step, would it be correct to take the mean of the corresponding metrics in validation_step_end, considering that the batch partitions for each device can be uneven?
    • Does calling all_gather on the output tensors inside validation_step_end add an extra dimension before the batch dimension? For example, if my original batch tensor has the shape N x C x H x W and 2 GPUs are in use, will the tensor after all_gather have the shape 2 x M x C x H x W (where 2M = N)? What happens if the batch size (N) is an odd number?
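
To make the uneven-partition concern in the second question concrete, here is a minimal sketch in plain Python (no Lightning or torch calls). The per-device numbers and the helper names mean_of_means and pooled_accuracy are made up for illustration. It assumes an accuracy-style metric and N = 5 samples split 3 / 2 across two devices, and compares averaging the per-device metrics with pooling the raw counts first.

```python
# Hypothetical per-device results as (num_correct, num_samples).
# The shard sizes are deliberately uneven: N = 5 split into 3 and 2.
per_device = [(3, 3), (1, 2)]


def mean_of_means(stats):
    """Average the per-device accuracies, ignoring how many samples each saw."""
    return sum(correct / total for correct, total in stats) / len(stats)


def pooled_accuracy(stats):
    """Sum the counts over all devices first, then divide once."""
    correct = sum(c for c, _ in stats)
    total = sum(n for _, n in stats)
    return correct / total


naive = mean_of_means(per_device)      # (3/3 + 1/2) / 2
pooled = pooled_accuracy(per_device)   # (3 + 1) / (3 + 2)

# With equal shard sizes the two approaches agree.
even = [(3, 4), (1, 4)]
even_naive = mean_of_means(even)
even_pooled = pooled_accuracy(even)

print(naive, pooled, even_naive, even_pooled)
```

In this toy case the two numbers differ only when the shards are uneven, which suggests that reducing sufficient statistics (counts or sums) across devices is safer than averaging already-averaged metrics. Whether that maps cleanly onto validation_step_end depends on the metric and on how the outputs are gathered.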

Just in case someone sees this issue, the discussion has been moved here: Correct approach to calculate metrics in DDP setting · Discussion #12602 · PyTorchLightning/pytorch-lightning · GitHub