Can we train for a set number of iterations without setting `max_epochs`? I now know there are `max_steps` and `limit_train_batches` to control how many training steps are taken overall and per epoch, respectively. However, `max_steps` runs into issues if `max_steps * batch_size > len(train_dataset)`, i.e. when reaching the desired step count requires more than one pass over the data. What would break if `max_epochs` were an optional argument to the trainer? Is there an alternative, recommended way to convert training loops of this style over to Lightning?
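For concreteness, a minimal sketch of how those two existing knobs are used today (the numbers are illustrative, not from the original post):

```python
import pytorch_lightning as pl

# Illustrative values: stop after 500 optimizer steps in total,
# and draw at most 100 batches from the train dataloader per epoch.
trainer = pl.Trainer(max_steps=500, limit_train_batches=100)
```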
A super super hacky solution would be setting `max_epochs` to an outrageously large value and setting `max_steps` to my desired iteration count. Then we'd break from the train loop according to `max_steps`, since we'd hit that first instead of `max_epochs`.
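In code, the workaround amounts to something like this (the particular numbers are illustrative):

```python
import pytorch_lightning as pl

# max_epochs is set absurdly high so it never fires; max_steps becomes
# the effective stopping condition. 10_000 is just an example target.
trainer = pl.Trainer(max_epochs=1_000_000, max_steps=10_000)
```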
As suggested by @awaelchli, maybe the solution is simple (see the sketch after this list):
- make both optional (default `None`)
- if both are unset: set `max_epochs=1000` (the current default)
- if only `max_steps` is set: use that one (keep `max_epochs=None`)
- if both are set: stop based on whichever condition is met first
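A rough sketch of that resolution logic, written outside the Trainer for clarity (function names here are hypothetical, not actual Lightning internals):

```python
from typing import Optional, Tuple

def resolve_stopping_criteria(
    max_epochs: Optional[int],
    max_steps: Optional[int],
) -> Tuple[Optional[int], Optional[int]]:
    """Apply the proposed defaulting rules to the two stopping conditions."""
    if max_epochs is None and max_steps is None:
        # Neither set: fall back to the current default of 1000 epochs.
        max_epochs = 1000
    # If only max_steps is set, max_epochs stays None and max_steps governs;
    # if both are set, both limits stay active.
    return max_epochs, max_steps

def should_stop(
    epoch: int,
    global_step: int,
    max_epochs: Optional[int],
    max_steps: Optional[int],
) -> bool:
    """Stop on whichever limit is hit first; an unset limit never triggers."""
    if max_epochs is not None and epoch >= max_epochs:
        return True
    if max_steps is not None and global_step >= max_steps:
        return True
    return False

# Example: only max_steps is set, so epoch count alone never ends training.
max_epochs, max_steps = resolve_stopping_criteria(None, 10_000)
assert not should_stop(5000, 9_999, max_epochs, max_steps)
assert should_stop(5000, 10_000, max_epochs, max_steps)
```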