Irregular use of result.log(...) crashes reduce_on_epoch_end

I am using EvalResult to report metrics such as accuracy. In addition, I group the examples in each batch by the category they belong to and calculate the accuracy for each grouped subset.

As an example, if my output is out=[0, 1, 1, 0] and if my ground truth is gt=[1, 1, 1, 1] my accuracy across the batch is 50%. If my category vector is cat=[A, B, B, A] my accuracy for category A would be 0% and for category B would be 100%.
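
To make the example concrete, here is a minimal plain-Python sketch of the grouping. The helper name accuracy_by_category is only an illustration and not part of any library; a category with no examples in the batch simply does not appear in the result.

```python
from collections import defaultdict


def accuracy_by_category(out, gt, cat):
    """Return {category: accuracy} for the categories present in the batch."""
    hits = defaultdict(int)    # correct predictions per category
    counts = defaultdict(int)  # number of examples per category
    for o, g, c in zip(out, gt, cat):
        counts[c] += 1
        hits[c] += int(o == g)
    # Categories that never occur in the batch get no entry at all.
    return {c: hits[c] / counts[c] for c in counts}


out = [0, 1, 1, 0]
gt = [1, 1, 1, 1]
cat = ["A", "B", "B", "A"]

overall = sum(int(o == g) for o, g in zip(out, gt)) / len(out)
per_cat = accuracy_by_category(out, gt, cat)
print(overall)  # 0.5
print(per_cat)  # {'A': 0.0, 'B': 1.0}
```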

Sometimes a batch contains no examples of a specific category (e.g. cat=[A, A, A, A]), so there is no accuracy to compute (in this case for category B). In that case I do not log it with result.log("accuracy_groupX", ...). I believe this is better than logging zero or repeating the last value.

But if some of the logged values in the EvalResult object do not have the same number of entries as the others, the weighted_mean operation in line 366 of the reduce_on_epoch_end function in step_result.py crashes.
This seems to happen because the weights for the weighted mean (the batch_sizes variable) contain one entry per batch, so they assume every batch contributed a datapoint for every metric, when in reality some batches contributed none.
So if len(result[k]) is not equal to len(batch_sizes), the weighted_mean function fails and crashes the entire training.
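
The mismatch, and one possible way around it, can be sketched in plain Python without touching EvalResult. This is not the code in step_result.py: GroupedMetricAccumulator is a hypothetical helper, and it assumes I am free to accumulate the per-category values myself and reduce them at the end of the epoch. The idea is to store a weight next to every logged value, so each metric carries weights of its own length.

```python
from collections import defaultdict


class GroupedMetricAccumulator:
    """Hypothetical helper: keeps one weight per logged value, per metric."""

    def __init__(self):
        self._values = defaultdict(list)
        self._weights = defaultdict(list)

    def log(self, name, value, weight):
        # weight = number of examples the value was computed from
        self._values[name].append(float(value))
        self._weights[name].append(float(weight))

    def reduce(self):
        """Weighted mean per metric, using only the batches that logged it."""
        reduced = {}
        for name, values in self._values.items():
            weights = self._weights[name]
            total = sum(weights)
            if total > 0:
                reduced[name] = sum(v * w for v, w in zip(values, weights)) / total
        return reduced


acc = GroupedMetricAccumulator()

# Batch 1: cat=[A, B, B, A], both categories present.
acc.log("accuracy_A", 0.0, weight=2)
acc.log("accuracy_B", 1.0, weight=2)

# Batch 2: cat=[A, A, A, A], nothing is logged for category B.
acc.log("accuracy_A", 0.5, weight=4)

# A shared list of batch sizes would have 2 entries, but accuracy_B has only 1 value.
epoch = acc.reduce()
print(epoch["accuracy_B"])  # 1.0
```

Weighting by the number of examples per category, rather than by the full batch size, also keeps the epoch value equal to the accuracy over all examples of that category.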

Is there a way to navigate around this bug or maybe even to fix it?

Hello, my apologies for the late reply. We are gradually deprecating this forum in favor of the built-in GitHub version… Could we kindly ask you to recreate your question there: Lightning Discussions