Getting different values between `sklearn.metrics` and `torchmetrics`

Hi,

I’m using torchmetrics to calculate metrics for my model, but I noticed that it gives different answers than sklearn.metrics. Here is a small example:

```python
import torch
import torchmetrics

preds = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0])
targets = torch.tensor([1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0])

acc = torchmetrics.Accuracy()
pre = torchmetrics.Precision(num_classes=2)
re = torchmetrics.Recall()
f1 = torchmetrics.F1()

print(acc(preds, targets))
print(re(preds, targets))
print(pre(preds, targets))
print(f1(preds, targets))
```

This prints:

```
tensor(0.6000)
tensor(0.6000)
tensor(0.6000)
tensor(0.6000)
```

While this:

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

print(accuracy_score(targets, preds))
print(recall_score(targets, preds))
print(precision_score(targets, preds))
print(f1_score(targets, preds))
```

prints:

```
0.6
0.5555555555555556
0.7142857142857143
0.6250000000000001
```

Except for accuracy, every metric is different. torchmetrics returns the same value (0.6) for all four metrics, while sklearn does not.

hey @sudarshan85

In sklearn the default `average` reduction is `'binary'` (report the metric for the positive class only), whereas in torchmetrics it is `'micro'` (pool the counts over all classes).
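You can check this on the sklearn side: micro averaging pools the TP/FP/FN counts over both classes, and for single-label predictions that makes precision, recall, and F1 all collapse to plain accuracy. A minimal sketch, reusing the `preds` and `targets` from above:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# with micro averaging, all three metrics reduce to accuracy (0.6 here),
# which matches the torchmetrics defaults above
print(precision_score(targets, preds, average='micro'))
print(recall_score(targets, preds, average='micro'))
print(f1_score(targets, preds, average='micro'))
```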

Conversely, to reproduce sklearn's binary behaviour in torchmetrics, you can do:

```python
print(torchmetrics.Recall(average='none', num_classes=2)(preds, targets)[1])
print(torchmetrics.Precision(average='none', num_classes=2)(preds, targets)[1])
print(torchmetrics.F1Score(average='none', num_classes=2)(preds, targets)[1])
```

With `average='none'`, torchmetrics computes the metric separately for each class ([0, 1] here); indexing with `[1]` then picks out the positive class, just like sklearn's `'binary'` default does.
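If you want to double-check the positive-class numbers without either library, they fall straight out of the confusion-matrix counts. A minimal sketch with plain tensor ops (`tp`/`fp`/`fn` are just local names for the counts):

```python
# confusion-matrix counts for the positive class (label 1)
tp = ((preds == 1) & (targets == 1)).sum().item()  # true positives
fp = ((preds == 1) & (targets == 0)).sum().item()  # false positives
fn = ((preds == 0) & (targets == 1)).sum().item()  # false negatives

precision = tp / (tp + fp)  # what sklearn's precision_score reports by default
recall = tp / (tp + fn)     # what sklearn's recall_score reports by default
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```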

Also, we have moved discussions over to GitHub Discussions; you might want to post there instead to get a quicker response. These forums will be marked read-only soon.

Thank you