Issue with nonzero num_workers

When I use num_workers = 0 for train_dataloader, val_dataloader, and test_dataloader, training finishes one epoch 100% quickly (although I get loss = NaN and have not figured out what the issue is), along with a warning that I should use a larger num_workers; it suggests num_workers = 16.

However, if I use num_workers > 0, it gets stuck at the validation sanity check and does not progress.

Can someone please shed some light on what the issue might be? Thank you.
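For context, here is a minimal sketch of the kind of DataLoader setup being described; the dataset, batch size, and variable names are placeholders I made up, not from my actual code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset standing in for the real one.
dataset = TensorDataset(torch.randn(64, 3), torch.randint(0, 2, (64,)))

# num_workers=0 loads batches in the main process: this is the setting
# that works for me (aside from the NaN loss) but triggers the warning.
# Setting num_workers > 0 spawns worker subprocesses, which is where the
# hang at the validation sanity check appears.
train_loader = DataLoader(dataset, batch_size=8, num_workers=0)

# Note: with num_workers > 0 on platforms that spawn processes
# (e.g. Windows, macOS), the entry point must be guarded with
# `if __name__ == "__main__":` or loading can deadlock.
batch_x, batch_y = next(iter(train_loader))
print(batch_x.shape)  # torch.Size([8, 3])
```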

Hey @mlnow,

Can you share the trainer configuration?

Also, we have moved the discussions to GitHub Discussions. You might want to post there instead to get a quicker response; the forums will be marked read-only soon.

Thank you