I’m curious how others handle seeding correctly when using
pl.seed_everything() in DDP mode. With a single global seed, every rank ends up with the same RNG state, so random transforms are reapplied in exactly the same way across samples in the same global batch. That's a pretty large reduction in randomness. The alternative is to not seed, or to seed per RANK, but then you can run into some pretty bad bugs with things like random dataset splits (ranks can disagree on which samples go where), and it makes reproducibility harder. Repeatedly changing the seed during training isn't a solution either, since that's also bad for randomness.
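One pattern I've been considering (just a sketch with stdlib `random`, not Lightning-specific; the function names and base seed are hypothetical) is to use the shared base seed for anything that must agree across ranks, like splits, and a rank-offset seed only for stochastic transforms:

```python
import random

BASE_SEED = 42  # assumed base seed; any fixed value works


def make_split(seed, n=10):
    # Deterministic split: every rank seeds identically before splitting,
    # so all ranks agree on which indices are train vs. val.
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:8], idx[8:]


def augmentation_rng(seed, rank):
    # Per-rank RNG for transforms: offsetting the seed by rank decorrelates
    # augmentations across processes without touching the split seed.
    return random.Random(seed + rank)


# Simulate two DDP ranks in a single process (illustrative only).
splits = [make_split(BASE_SEED) for _ in range(2)]
assert splits[0] == splits[1]  # every rank computes the identical split

augs = [augmentation_rng(BASE_SEED, r).random() for r in range(2)]
assert augs[0] != augs[1]  # transform randomness differs per rank
```

The idea is that reproducibility is preserved (everything derives from one base seed), while augmentation streams still differ across ranks. But I'm not sure whether this covers all the failure modes.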
I'm not sure of the correct way to handle this and would love to hear others' thoughts.