| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the DDP/GPU category | 0 | 919 | August 26, 2020 |
| Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! | 0 | 351 | February 6, 2024 |
| How can I collect the test outputs printed on each GPU device across different GPUs? | 0 | 26 | April 9, 2024 |
| `self.all_gather` used in `on_training_epoch_end` reports `RuntimeError` | 0 | 71 | March 21, 2024 |
| Distributed Initialization | 0 | 58 | March 13, 2024 |
| Multi-GPU Training Error: ProcessExitedException: process 0 terminated with signal SIGSEGV | 7 | 2719 | March 4, 2024 |
| How to keep track of training time in a DDP setting? | 6 | 1043 | February 29, 2024 |
| torch.cuda.OutOfMemoryError: CUDA out of memory with mixed precision | 2 | 154 | February 24, 2024 |
| Training freezes at "initializing ddp: GLOBAL_RANK ..." | 3 | 1487 | February 17, 2024 |
| How to use DDP in LightningModule on Apple M1? | 9 | 322 | February 16, 2024 |
| Multiple GPUs run the script twice | 10 | 114 | February 8, 2024 |
| Reproduce one-GPU score/loss using DDP - discrepancy | 1 | 139 | January 28, 2024 |
| Does PyTorch Lightning support Torch Elastic in FSDP? | 1 | 145 | January 21, 2024 |
| RuntimeError: Parameters that were not used in producing the loss returned by training_step | 0 | 551 | January 13, 2024 |
| Despite scaling up batch size and nodes using PyTorch Lightning and DDP, there's no speedup in training | 0 | 155 | January 12, 2024 |
| Behaviour of dropout in a multi-GPU setting | 4 | 190 | December 18, 2023 |
| Get the indices of the DataLoader for multi-GPU training | 0 | 276 | December 1, 2023 |
| DDP strategy only uses the first GPU | 2 | 836 | November 22, 2023 |
| How to move data to CUDA in a customized data collator in DDP mode | 0 | 170 | November 13, 2023 |
| DDP multi-GPU training does not reduce training time | 3 | 1172 | November 8, 2023 |
| Ignore logging on one of the GPUs as it does not have a specific loss | 2 | 206 | October 24, 2023 |
| How to avoid loading the complete in-memory dataset for every process in DDP training | 2 | 3546 | October 17, 2023 |
| Error with DDP when updating from pytorch-lightning 1.6.5 to version 2.0.9 | 0 | 769 | October 4, 2023 |
| Multi-task model in version 2.0.9 with DDP error | 0 | 579 | October 4, 2023 |
| Error with PyTorch Lightning ddp_spawn on SLURM | 0 | 973 | October 1, 2023 |
| Logging metrics when training with "ddp_spawn" | 1 | 464 | September 29, 2023 |
| Does anyone know why this code gets stuck on epoch 3 using DDP? | 0 | 187 | September 24, 2023 |
| Does anyone know why this code using DDP gets stuck on epoch 3? | 0 | 150 | September 24, 2023 |
| How to properly use multiple trainers with DDP in one script? | 1 | 185 | September 10, 2023 |
| DDP error in a hyperparameter optimisation run | 0 | 579 | September 6, 2023 |