Save checkpoints after a specific number of steps instead of epochs

My trainer looks like this:

import pytorch_lightning as pl

# gpus, model and train_dl are defined elsewhere
trainer = pl.Trainer(gpus=gpus, max_steps=25000, precision=16)
trainer.fit(model, train_dl)

I want to save a model checkpoint every 5000 steps (later checkpoints can overwrite earlier ones). Is it possible to do that?
According to the documentation, a checkpoint can be saved with the ModelCheckpoint callback after a specific number of epochs, but I didn't see anything there about saving after a specific number of steps. I am not passing any validation data, so I do not want to save based on validation loss values either.
Is there any way to do this?
Thanks.

You can try this: "Save checkpoint and validate every n steps" (Issue #2534, Lightning-AI/lightning on GitHub)
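
For anyone landing here later, the general idea is a small custom callback that checks the global step and saves when it hits a multiple of N. Below is a minimal sketch of that idea, not the exact code from the issue. The class name `CheckpointEveryNSteps`, its arguments, and the `checkpoints/latest.ckpt` path are my own illustrative choices. It assumes a Lightning version that provides the `on_train_batch_end` hook, `trainer.global_step`, and `trainer.save_checkpoint`.

```python
import os

try:
    from pytorch_lightning import Callback
except ImportError:
    # Fallback so the sketch can be exercised without Lightning installed.
    Callback = object


class CheckpointEveryNSteps(Callback):
    """Hypothetical callback: save a checkpoint every `every_n_steps` training steps.

    The same file is reused on every save, so each checkpoint overwrites the last.
    """

    def __init__(self, every_n_steps=5000, dirpath="checkpoints", filename="latest.ckpt"):
        self.every_n_steps = every_n_steps
        self.dirpath = dirpath
        self.filename = filename

    def on_train_batch_end(self, trainer, pl_module, *args, **kwargs):
        # *args/**kwargs absorb the extra hook arguments, which differ between versions.
        step = trainer.global_step
        if step > 0 and step % self.every_n_steps == 0:
            os.makedirs(self.dirpath, exist_ok=True)
            trainer.save_checkpoint(os.path.join(self.dirpath, self.filename))


# Usage with the trainer from the question (gpus, model, train_dl defined elsewhere):
# trainer = pl.Trainer(gpus=gpus, max_steps=25000, precision=16,
#                      callbacks=[CheckpointEveryNSteps(every_n_steps=5000)])
# trainer.fit(model, train_dl)
```

No validation data is needed, because the save is triggered by the step counter alone. Depending on the Lightning version, `global_step` may already be incremented when the hook runs, so the save can land one step off the exact multiple. Newer releases of Lightning also accept `every_n_train_steps` directly on `ModelCheckpoint`, which may make a custom callback unnecessary.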


Thanks, that worked for me.