I am using `EvalResult` to report metrics like accuracy. In addition, I group the examples in each batch by the category they belong to and also compute the accuracy for each grouped subset.

As an example, if my output is `out=[0, 1, 1, 0]` and my ground truth is `gt=[1, 1, 1, 1]`, my accuracy across the batch is 50%. If my category vector is `cat=[A, B, B, A]`, my accuracy for category `A` would be 0% and for category `B` it would be 100%.
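
For clarity, here is the grouped-accuracy computation spelled out in plain PyTorch (purely illustrative, not my actual training code):

```python
import torch

out = torch.tensor([0, 1, 1, 0])
gt = torch.tensor([1, 1, 1, 1])
cat = ["A", "B", "B", "A"]

correct = (out == gt)
print(correct.float().mean().item())  # 0.5 -> 50% accuracy across the batch

# Per-category accuracy over the subset belonging to each category
for c in sorted(set(cat)):
    mask = torch.tensor([x == c for x in cat])
    print(c, correct[mask].float().mean().item())  # A -> 0.0, B -> 1.0
```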

Sometimes a batch contains no examples of a specific category (e.g. `cat=[A, A, A, A]`), and thus there is no accuracy to compute (in this case for category `B`). Therefore, I do not log it with `result.log("accuracy_groupX", ...)`. I believe skipping the log is the best way to handle this, rather than logging zero or repeating the last value.
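
Roughly, my `validation_step` looks like the sketch below (class and metric names are placeholders, and the data format with a per-example category label is my own convention):

```python
import torch
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    # ... model definition elided ...

    def validation_step(self, batch, batch_idx):
        x, y, cats = batch  # cats: one category label per example
        preds = self(x).argmax(dim=-1)
        correct = (preds == y)

        result = pl.EvalResult()
        result.log("accuracy", correct.float().mean())

        # Log per-category accuracy only when the category occurs in
        # this batch; absent categories are simply not logged.
        for c in ("A", "B"):
            mask = torch.tensor([c_ == c for c_ in cats], device=correct.device)
            if mask.any():
                result.log(f"accuracy_group{c}", correct[mask].float().mean())
        return result
```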

But if some of the logged values in the `EvalResult` object do not have the same number of entries as the others, this crashes the `weighted_mean` operation in line 366 of the `reduce_on_epoch_end` function in `step_result.py`.

This seems to happen because the weights for the weighted mean (the `batch_sizes` variable) contain one entry per batch, so they assume there is a data point for every batch, when in reality there is none for the batches where a category was absent.
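
To illustrate the mechanics, here is a minimal sketch of a weighted mean in this style (simplified; the actual Lightning implementation may differ), where `batch_sizes` supplies one weight per logged value:

```python
import torch

def weighted_mean(values: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    # torch.dot requires both 1-D tensors to have the same length,
    # i.e. exactly one weight per logged value.
    return torch.dot(values.float(), weights.float()) / weights.float().sum()
```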

So if `len(result[k])` is unequal to `len(batch_sizes)`, the `weighted_mean` function fails, crashing the entire training run.
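
With the sketch above, the mismatch is easy to reproduce (the tensor values are made up for illustration):

```python
# Category B was present in only 3 of 4 batches, so only 3 values
# were logged, while batch_sizes has one entry per batch.
acc_group_b = torch.tensor([1.0, 0.5, 1.0])       # 3 logged values
batch_sizes = torch.tensor([4.0, 4.0, 4.0, 4.0])  # 4 batches

weighted_mean(acc_group_b, batch_sizes)
# RuntimeError: torch.dot expects both tensors to have
# the same number of elements
```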

Is there a way to work around this bug, or maybe even to fix it?