We’ve built a dataloader with a checkpointable state we can resume from, and I’d like to include that state in my Lightning module checkpoint. The data module doesn’t seem to expose hooks that would let me add this, though.
cc @nate
Currently not; we are working on implementing this at the moment. In the meantime, you can actually define train_dataloader() and val_dataloader() within your LightningModule itself, and anything that would be saved to its state_dict will be saved with the checkpoint.
Is there a GitHub issue/PR I can subscribe to for this?
Saving state is, I think, specific to Callbacks. Maybe you can try creating a custom callback and overriding on_save_checkpoint and on_load_checkpoint to make this work. You can check ModelCheckpoint.on_save_checkpoint for reference.
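The callback approach suggested above could look something like the following minimal sketch. It assumes the dataloader exposes state_dict()/load_state_dict() methods (as described in the first message); the Callback base class is stubbed out so the snippet is self-contained, but in a real project DataloaderStateCallback would subclass pytorch_lightning.Callback, and the hook signatures shown here follow the Callback API of the version under discussion.

```python
class DataloaderStateCallback:
    """Sketch of a callback that stashes dataloader state in the checkpoint.

    In real code this would subclass pytorch_lightning.Callback; the class
    name and the "dataloader_state" checkpoint key are illustrative choices,
    not Lightning API.
    """

    def __init__(self, dataloader):
        # Assumed: a dataloader implementing state_dict()/load_state_dict().
        self.dataloader = dataloader

    def on_save_checkpoint(self, trainer, pl_module, checkpoint):
        # Called before the checkpoint dict is written to disk; anything we
        # put into `checkpoint` here is persisted alongside the model weights.
        checkpoint["dataloader_state"] = self.dataloader.state_dict()

    def on_load_checkpoint(self, trainer, pl_module, checkpoint):
        # Called when resuming; restore the dataloader position if present.
        state = checkpoint.get("dataloader_state")
        if state is not None:
            self.dataloader.load_state_dict(state)
```

You would then pass an instance of this callback to the Trainer via its callbacks argument, so the dataloader state rides along with every checkpoint the Trainer writes.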
The LightningModule also has these hooks to augment the checkpoint with more information: https://pytorch-lightning.readthedocs.io/en/latest/lightning_module.html#checkpoint-hooks
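Those LightningModule hooks could be used the same way, keeping the dataloader state inside the module's own checkpoint rather than in a callback. A minimal sketch, again assuming a dataloader with state_dict()/load_state_dict(); the base class is stubbed so the example runs standalone, whereas in practice MyModule would subclass pytorch_lightning.LightningModule:

```python
class _LightningModuleStub:
    """Stand-in for pytorch_lightning.LightningModule, for illustration only."""
    pass


class MyModule(_LightningModuleStub):
    def __init__(self, dataloader):
        # Assumed: a checkpointable dataloader owned by the module, e.g.
        # because train_dataloader() is defined on the module itself.
        self.dataloader = dataloader

    def on_save_checkpoint(self, checkpoint):
        # LightningModule checkpoint hook: the Trainer calls this with the
        # checkpoint dict before saving, so extra entries are persisted.
        # The "dataloader_state" key is an illustrative choice.
        checkpoint["dataloader_state"] = self.dataloader.state_dict()

    def on_load_checkpoint(self, checkpoint):
        # Counterpart hook called on resume.
        if "dataloader_state" in checkpoint:
            self.dataloader.load_state_dict(checkpoint["dataloader_state"])
```

The trade-off versus the callback version is coupling: the module variant keeps everything in one place, while a callback keeps the dataloader-resume logic reusable across modules.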
@ananthsub what do you envision a feature like this looking like? There was an issue about it here, but I haven’t seen/come up with a proposed solution that makes sense yet.
I’ve been working on a solution for argument parsing that’ll give you access to all the args/kwargs used to init the model/dm/trainer, which I’ll push this week. I was planning on using that to solve this problem, but it would force the user to use my new object LightningArgumentParser (which I feel is a bad thing to enforce).
Can we discuss a solution here on what this checkpointing feature will concretely look like?
Yes, I’ll ping you on Slack. I don’t want to be forced to use LightningArgumentParser for this; that seems to go against Lightning’s principle of being config-agnostic.
Totally agree; they need to be completely separate.