Metrics or Callbacks?


I am trying to write some callbacks to compute a few evaluation metrics and to store the predictions of my model. I have already come across the Metrics class, but I want to keep the evaluation code separate from the model code, so I was thinking of writing the metrics as callbacks.

I need some help understanding why we need the Metrics class in the first place: why can't I just use callbacks to compute results? I have written a callback for accuracy, which can be found here, but I have doubts about whether it is correct, and under which circumstances it would not work. I have a hunch that there might be issues in distributed setups, but does it at least work well in single-accelerator environments?
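The distributed hunch is well founded. A common pitfall is averaging each process's accuracy instead of syncing the raw correct/total counts; the two disagree whenever processes see different numbers of samples, which is why metric classes sync counts rather than ratios. A toy illustration with made-up numbers (the `proc_a`/`proc_b` dicts are hypothetical, not Lightning API):

```python
# Two workers in a hypothetical distributed run, with uneven sample counts.
proc_a = {'correct': 9, 'total': 10}    # local accuracy 0.9
proc_b = {'correct': 30, 'total': 100}  # local accuracy 0.3

# Naive callback approach: average the per-process accuracies.
mean_of_accuracies = (proc_a['correct'] / proc_a['total']
                      + proc_b['correct'] / proc_b['total']) / 2   # 0.6

# Correct approach: sum the raw counts across processes, then divide.
global_accuracy = ((proc_a['correct'] + proc_b['correct'])
                   / (proc_a['total'] + proc_b['total']))          # 39/110 ≈ 0.3545

print(mean_of_accuracies, global_accuracy)
```

The two numbers differ substantially, so a callback that only sees its own process's batches can silently report the wrong metric in a multi-GPU run.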

My model returns the following dictionary as the output of its *_step functions.

return {'loss': loss, 'idx': idx, 'pred': pred, 'gt': gt}
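For a single-process setup, a callback that consumes this dict could look roughly like the sketch below. It is a hypothetical, dependency-free mock-up: the hook names mirror the pytorch_lightning.Callback interface, but plain lists stand in for tensors and the class does not actually subclass Callback. The key idea is to accumulate only scalar counts per batch and reduce at epoch end:

```python
class AccuracyCallback:
    """Sketch of an accuracy callback that accumulates correct/total
    counts from the dict returned by validation_step."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def on_validation_epoch_start(self, trainer=None, pl_module=None):
        # Reset counters so results from the previous epoch don't leak in.
        self.correct = 0
        self.total = 0

    def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # 'outputs' is the dict returned by validation_step:
        # {'loss': ..., 'idx': ..., 'pred': ..., 'gt': ...}
        preds, gts = outputs['pred'], outputs['gt']
        self.correct += sum(p == g for p, g in zip(preds, gts))
        self.total += len(gts)

    def on_validation_epoch_end(self, trainer=None, pl_module=None):
        return self.correct / self.total if self.total else 0.0


# Usage with two fake batches:
cb = AccuracyCallback()
cb.on_validation_epoch_start()
cb.on_validation_batch_end(None, None,
    {'loss': 0.3, 'idx': [0, 1], 'pred': [1, 0], 'gt': [1, 1]}, None, 0)
cb.on_validation_batch_end(None, None,
    {'loss': 0.2, 'idx': [2, 3], 'pred': [1, 1], 'gt': [1, 1]}, None, 1)
print(cb.on_validation_epoch_end())  # 3 correct of 4 -> 0.75
```

In a real Lightning run you would subclass pytorch_lightning.Callback and log the result via pl_module instead of returning it; in a distributed run the counts would additionally need an all-reduce before the final division.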

Additionally, is it possible to use a Metrics class inside a callback? I am generally concerned about whether my callback approach will work, since I want to compute metrics beyond those provided as part of PyTorch Lightning. Any help would be appreciated.


I was wondering the same thing.
A related issue that I encounter is this: in callbacks I need the input and the output of the model to compute metrics. But in order to have them there, I need to return them from LightningModule.validation_step(). However, everything LightningModule.validation_step() returns is saved, so my memory fills up pretty fast. Can't we have the option NOT to save the outputs?
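One way around the memory growth, assuming the callback hooks described above, is to never hold full per-batch outputs at all: reduce each batch to scalar running counts as soon as it arrives, so memory stays constant regardless of how many batches the epoch has. A hypothetical side-by-side sketch (plain lists stand in for tensors; both function names are made up):

```python
def naive_accumulate(batches):
    """Store every batch's outputs, reduce only at the end.
    Memory grows linearly with the number of batches."""
    stored = []
    for preds, gts in batches:
        stored.append((preds, gts))  # full outputs kept alive all epoch
    correct = sum(p == g for preds, gts in stored
                  for p, g in zip(preds, gts))
    total = sum(len(gts) for _, gts in stored)
    return correct / total


def reduced_accumulate(batches):
    """Keep only running counts; each batch's outputs can be freed
    immediately after the update. Memory stays constant."""
    correct = total = 0
    for preds, gts in batches:
        correct += sum(p == g for p, g in zip(preds, gts))
        total += len(gts)
    return correct / total


batches = [([1, 0], [1, 1]), ([1, 1], [1, 1])]
print(naive_accumulate(batches), reduced_accumulate(batches))
```

Both yield the same accuracy, which is the argument for doing the reduction inside the hook (or a metric object) instead of returning raw tensors from validation_step. With real tensors you would also want to detach them before any accumulation so the autograd graph is not kept alive.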
Apologies if my reply shouldn't have been posted here.