| Topic | Replies | Views | Activity |
|---|---:|---:|---|
| How to keep lr fixed at first N epoch, and then use cosineAnnealingLR in the rest of training | 0 | 193 | September 25, 2023 |
| LR Finder MNIST | 2 | 669 | September 18, 2023 |
| Reloading model with trainer.fit(ckpt_path) and overrides callback | 0 | 276 | August 14, 2023 |
| Method `on_train_batch_end` of `LightningModule` happens after callbacks `on_train_batch_end` - is this configurable? | 0 | 240 | August 9, 2023 |
| ModelCheckpoint and EarlyStopping don't seem to work? | 0 | 290 | August 6, 2023 |
| 'tuple' object has no attribute 'trainer' | 2 | 516 | August 2, 2023 |
| How to resume training | 9 | 38026 | July 31, 2023 |
| RuntimeError: Early stopping conditioned on metric `val_loss` which is not available | 1 | 349 | July 24, 2023 |
| How do I convert different LightningModules? | 3 | 227 | July 18, 2023 |
| Is it possible to use a single Trainer to train multiple versions of the same model in parallel? | 0 | 182 | July 17, 2023 |
| Clarification on log_every_n_steps with accumulate_grad_batches | 1 | 302 | July 16, 2023 |
| How do I continue training the model? | 2 | 495 | July 6, 2023 |
| KeyError: 'No action for destination key "trainer.devices" to set its default.' | 1 | 1064 | July 4, 2023 |
| Limit steps per epoch | 10 | 1854 | July 4, 2023 |
| How to suppress trainer from printing directly to console? | 1 | 409 | June 6, 2023 |
| Training stuck on resume | 1 | 825 | May 31, 2023 |
| Confusing # of optimizer steps when using gradient accumulation with DeepSpeed | 0 | 610 | May 25, 2023 |
| Training when data is stored in batches | 2 | 225 | May 21, 2023 |
| Trainer prints every step in validation | 2 | 1510 | May 17, 2023 |
| Weird result in convolutional network | 2 | 462 | May 14, 2023 |
| Retraining a model with new data | 1 | 298 | May 9, 2023 |
| How to use SWA with a cyclic scheduler | 0 | 378 | May 7, 2023 |
| Resume training / load module from DeepSpeed checkpoint | 14 | 3311 | May 6, 2023 |
| Resuming training gives different model result / weights | 0 | 701 | May 4, 2023 |
| Wonder if _update_learning_rates is properly implemented | 0 | 146 | April 19, 2023 |
| Why is the Trainer instance saved inside the DataModule during checkpoint save? | 2 | 315 | April 11, 2023 |
| Trainer.validate/test with ckpt_path does not resume global_step | 3 | 202 | April 7, 2023 |
| Is gradient clipping done before or after gradients accumulation? | 2 | 669 | April 5, 2023 |
| Multiple dataloaders and epoch calculation | 0 | 155 | April 1, 2023 |
| How does `LightningOptimizer.zero_grad()` work? | 2 | 205 | March 31, 2023 |