Why might speed stay the same when moving from 1 GPU to 8 GPUs (DDP)?

I’m not seeing any speed increase when moving from 1 GPU to 8 GPUs and switching Lightning’s distributed backend to DDP; sometimes training even gets slower. Any ideas why this might happen in general? I have num_workers set to 32 and pin_memory=True in my DataModules/dataloaders.
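
For reference, here’s a minimal sketch of the kind of setup I mean (the model and datamodule below are placeholders, not my actual code, and the Trainer arguments may differ by Lightning version):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class RandomDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        # The dataloader settings mentioned above: many workers + pinned memory.
        dataset = TensorDataset(torch.randn(10_000, 32), torch.randn(10_000, 1))
        return DataLoader(dataset, batch_size=256, num_workers=32, pin_memory=True)


class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


if __name__ == "__main__":
    trainer = pl.Trainer(
        max_epochs=1,
        gpus=8,                     # 1 vs. 8 GPUs is the only thing I change
        distributed_backend="ddp",  # newer Lightning: accelerator="gpu", devices=8, strategy="ddp"
    )
    trainer.fit(TinyModel(), datamodule=RandomDataModule())
```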

Anything I can do in Lightning to diagnose/fix this? (I’m aware of the profiler but not sure how to make it useful here.)
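
For context, this is roughly what I mean by enabling the profiler (a sketch, not my exact code; the available profiler options may differ by Lightning version):

```python
import pytorch_lightning as pl

# "simple" prints a per-hook timing summary at the end of fit
# (e.g. time spent fetching batches vs. running training_step);
# "advanced" gives cProfile-style output instead.
trainer = pl.Trainer(
    gpus=8,
    distributed_backend="ddp",
    profiler="simple",
)
```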

In theory you should see close to an 8x speedup, since the training data is split across the GPUs (a rough illustration of the scaling arithmetic is below). Could you share some code so we can help debug the issue?
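
As a rough sanity check (the numbers here are made up, just to illustrate what "no speedup" means in terms of scaling efficiency):

```python
# Illustrative arithmetic: with ideal data parallelism an epoch's work is split
# 8 ways, so per-epoch time should drop to roughly t_1 / 8 (plus gradient
# all-reduce and other overhead).
t_1_gpu = 800.0                  # seconds per epoch on 1 GPU (made-up value)
n_gpus = 8
ideal_t_8 = t_1_gpu / n_gpus     # ~100 s in the perfect case
observed_t_8 = 790.0             # "no speedup", as described in the question
print(f"scaling efficiency: {ideal_t_8 / observed_t_8:.0%}")  # ~13%
```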

(I’m working on a script that reproduces this so I can share it, and I’ll update here once I have it. Thanks for the response.)