Different behavior for model checkpoints if last or best | | 0 | 165 | July 25, 2023 |
Getting element 0 error while fine tuning llm | | 3 | 485 | July 17, 2023 |
Combining loss, predictions in multi gpus | | 3 | 945 | July 9, 2023 |
How to save new lr hyperparameter after using LRFinder when using wandb | | 2 | 446 | July 10, 2023 |
Any example to launch multiple nodes distributed training with deepspeed strategy? | | 2 | 1731 | June 28, 2023 |
How to use textbooks for fine-tuning LLM | | 0 | 406 | June 24, 2023 |
Data collate_fn makes training process super slow! | | 0 | 995 | June 22, 2023 |
Using SequentialLR with Step, Epoch and ReduceLROnPlateau | | 0 | 547 | June 2, 2023 |
Finetuning using lit-llama | | 3 | 529 | May 24, 2023 |
Transfer learning | | 0 | 209 | May 23, 2023 |
I am lost on custom batch size definition | | 2 | 532 | May 17, 2023 |
Problem that many symbols are output in val_dataloaders | | 2 | 383 | May 6, 2023 |
Error when predicting from checkpoint | | 1 | 754 | May 6, 2023 |
Does not run validation step after epoch when running with all data | | 5 | 1725 | May 1, 2023 |
Why are my training and validation losses only changing by very little? | | 2 | 826 | April 28, 2023 |
Saving checkpoints and logging models | | 1 | 206 | April 28, 2023 |
Different ways of logging model | | 0 | 137 | April 26, 2023 |
How can we skip a step with NaN loss in the training_step when using Distributed Data Parallel (DDP)? | | 1 | 1141 | April 24, 2023 |
Mac M2 MPS: failed assertion `destination kernel width and filter kernel width mismatch' | | 0 | 582 | April 17, 2023 |
Error on trainer = L.Trainer(max_epochs=2000) | | 0 | 283 | April 4, 2023 |
Custom training - RuntimeError due to unused parameters | | 0 | 1556 | April 3, 2023 |
MLFlowLogger always generates the same run name | | 1 | 515 | April 3, 2023 |
LR Scheduler monitoring multiple metrics | | 2 | 662 | April 3, 2023 |
RAM usage increases quickly over the training step | | 2 | 348 | March 30, 2023 |
Code structuring for text classification with hf bert-uncase | | 2 | 404 | March 23, 2023 |
Use two datasets and distinguish during training | | 0 | 146 | March 22, 2023 |
DeepSpeed: how to execute certain code once? | | 0 | 251 | March 22, 2023 |
How to combine PTL arguments with ArgumentParser | | 2 | 1858 | March 22, 2023 |
Multi GPU - Autolog with multiple runs - lightning2.0 | | 2 | 686 | March 22, 2023 |
Loading saved checkpoint model.model | | 2 | 330 | March 16, 2023 |