Computing model output over a dataset

Is it possible to create a script backed by a Trainer object for computing the model output on all items in a Dataset object?

Imagine I have a CNN trained for image classification and I want to save the logits over each example in the validation set.

Would this work? If you gather all your predicted probabilities and targets in your validation step, you could override the epoch end method for validation to aggregate them and save them to file.
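To make the idea concrete, here is a minimal sketch of the gather-then-save pattern in plain Python. It is only an illustration, not Lightning code: `LogitCollector`, `load_records`, the JSON-lines file format, and the sample values are all made up for the example, and the method names merely mirror the validation hooks. With real tensors you would convert each batch first (for example with `.detach().cpu().tolist()`) before buffering it.

```python
import json
import os
import tempfile


class LogitCollector:
    """Hypothetical, framework-free stand-in for the buffering pattern.

    The method names mirror Lightning-style validation hooks, but this
    class does not depend on Lightning or PyTorch.
    """

    def __init__(self):
        self.records = []  # one dict per validation example

    def validation_step(self, indices, logits, targets):
        # Buffer per-example outputs in memory; nothing is written here.
        for idx, row, target in zip(indices, logits, targets):
            self.records.append(
                {"index": idx, "logits": list(row), "target": target}
            )

    def on_validation_epoch_end(self, path):
        # Write the whole buffer in one pass, then clear it so the next
        # epoch starts empty.
        with open(path, "w") as f:
            for record in self.records:
                f.write(json.dumps(record) + "\n")
        written = len(self.records)
        self.records.clear()
        return written


def load_records(path):
    # Read the JSON-lines file back into a list of dicts.
    with open(path) as f:
        return [json.loads(line) for line in f]


# Usage with made-up values standing in for real model outputs.
collector = LogitCollector()
collector.validation_step([0, 1], [[0.1, 0.9], [2.0, -1.0]], [1, 0])
collector.validation_step([2], [[0.5, 0.5]], [0])

out_path = os.path.join(tempfile.mkdtemp(), "val_logits.jsonl")
n_written = collector.on_validation_epoch_end(out_path)
rows = load_records(out_path)
```

Storing the dataset index next to each row keeps the file usable even if the loader order changes between runs.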

You can try EvalResult.write.

Hi @goku
After the update to v1.0, the Result object is deprecated. What is the new way to write results to disk?

self.write_prediction in LightningModule. It’s not documented yet.

I couldn’t find the code of the write_prediction method.

  1. Assuming that it dumps the data to disk on every call, without using an asynchronous process, isn’t that inefficient?

I’m using @will’s code, and I found a hack to get my job done. However, I was wondering, in case I want to accumulate predictions:

  2. Can I gather them into an attribute of the Trainer instance, i.e. self? Are there any caveats with that approach, particularly in a distributed setup?

code for write_prediction: lightning/lightning.py at 71d5cc11f13c1338fbe3f74a8d12e438cc6fddef · Lightning-AI/lightning · GitHub

It won’t dump at every call; it gathers the predictions in the _step methods and saves them in _epoch_end.
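On the distributed question above: under DDP each process has its own copy of the module, so a list stored on self only holds that process's share of the data. One possible workaround, sketched below in plain Python, is to let every rank write its own file and merge the files afterwards. This is not how write_prediction is implemented; `shard_path`, `write_shard`, `merge_shards`, and the file naming are hypothetical, and the rank and world size are passed in explicitly rather than read from the framework. Merging is keyed by dataset index because a distributed sampler may pad the data with repeated samples so that every rank gets the same number of items.

```python
import json
import os
import tempfile


def shard_path(directory, rank):
    # Hypothetical naming scheme: one file per process rank.
    return os.path.join(directory, f"predictions_rank{rank}.jsonl")


def write_shard(directory, rank, records):
    # Each process writes only the records it computed itself.
    with open(shard_path(directory, rank), "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def merge_shards(directory, world_size):
    # Key by dataset index so padded duplicates collapse to one entry.
    merged = {}
    for rank in range(world_size):
        with open(shard_path(directory, rank)) as f:
            for line in f:
                record = json.loads(line)
                merged[record["index"]] = record
    return [merged[i] for i in sorted(merged)]


# Simulate two processes over a 3-item dataset. Rank 1 is assumed to have
# been padded with a repeat of index 0. All values are made up.
out_dir = tempfile.mkdtemp()
write_shard(out_dir, 0, [
    {"index": 0, "logits": [0.1, 0.9]},
    {"index": 2, "logits": [0.3, 0.7]},
])
write_shard(out_dir, 1, [
    {"index": 1, "logits": [0.8, 0.2]},
    {"index": 0, "logits": [0.1, 0.9]},
])
merged = merge_shards(out_dir, world_size=2)
```

The merge step should only run once all ranks have finished writing, for example as a separate script after the validation run.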

BTW there is an open PR regarding prediction you might want to check out: https://github.com/PyTorchLightning/pytorch-lightning/pull/5468

Thanks. That’s very informative.

  • Regarding dumping on _epoch_end: good (sensible) choice.

  • Do you know if that’s supported in version 0.9?
    I’m using code written with that version of Lightning :blush:

not sure… internal API has changed a lot since then. But I guess you don’t need to change too much of your code to make it compatible with the current version.

Thanks, so far it works OK.

Do you know how hard it is to upgrade to >= 1.0.0?
I’m new to pl, so any tips would be welcome.

Not too hard… the API is just simpler now, mostly in the logging section. You can check the examples in the repo. In case you need help, you can always ask questions on the forums or on Slack.